Paper Review

Paper Review

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

[Paper Review] ICCV 2021 ABSTRACT To bridge the domain gap between words in text and the pixel resolution of images, the authors propose a hierarchical Transformer whose representation is computed with shifted windows. The shifted-window scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows, while still allowing cross-window connections. This approach has the flexibility to model at various scales and has linear computational complexity with respect to image size. Introduction (a) shows the architecture of the Swin Transformer proposed in the paper. Since self-attention is computed only within each local window (marked in red), the cost is linear in the input image size..
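Below is a minimal sketch, not the authors' implementation, of how window partitioning plus the cyclic shift behind shifted-window attention could look in PyTorch; the feature-map size, channel count, window size, and shift are illustrative assumptions.

```python
import torch

def window_partition(x, window_size):
    """Split a feature map (B, H, W, C) into non-overlapping windows
    of shape (num_windows*B, window_size, window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)

# Illustrative example: 56x56 feature map, 7x7 windows, cyclic shift of 3.
x = torch.randn(1, 56, 56, 96)
shifted = torch.roll(x, shifts=(-3, -3), dims=(1, 2))  # shift introduces cross-window connections
windows = window_partition(shifted, window_size=7)     # self-attention would run per window
print(windows.shape)  # torch.Size([64, 7, 7, 96])
```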

Paper Review

Sparse Token Transformer With Attention Back Tracking

[Paper Review] ICLR 2023 ABSTRACT In this paper the authors focus on token pruning, which can reduce not only the complexity of the Transformer's attention operations but also that of its linear layers. Previous works removed tokens at the feed-forward stage without considering their effect on the attention of later layers. To resolve this issue, the authors propose a method that back-tracks the importance of each attention map from the output to the input, so that tokens with a large influence on the final prediction are preserved. Its efficiency is demonstrated experimentally in both NLP and CV. Introduction Pruning approaches for Transformers have mainly targeted unnecessary model..
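As a rough illustration of the back-tracking idea (a simplified sketch, not the paper's exact algorithm), per-token importance at the output could be propagated backwards through each layer's attention map:

```python
import torch

def backtrack_importance(attn_maps, output_importance):
    """Propagate per-token importance from the output back to the input.
    attn_maps: list of (num_tokens, num_tokens) attention matrices, one per layer.
    output_importance: (num_tokens,) importance of tokens at the output."""
    importance = output_importance
    for attn in reversed(attn_maps):
        # A token is important if important later tokens attend to it.
        importance = attn.t() @ importance
    return importance

# Toy example: 3 layers of random attention over 10 tokens.
layers = [torch.softmax(torch.randn(10, 10), dim=-1) for _ in range(3)]
out_imp = torch.zeros(10)
out_imp[0] = 1.0  # e.g. only the classification token matters at the output
print(backtrack_importance(layers, out_imp))
```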

Paper Review

TOKEN MERGING: YOUR VIT BUT FASTER

[Paper Review] ICLR 2023 notable top 5% ABSTRACT A simple method for increasing the throughput of existing ViT models without retraining. Similar tokens are gradually merged using a general and lightweight algorithm. ToMe (Token Merging) can also be applied easily during training. Introduction A line of work has emerged that prunes Transformer tokens at run time to obtain faster models. Most token pruning methods cannot be used to speed up training. The paper therefore proposes Token Merging, which combines tokens rather than pruning them, a better approach. The custom matching algorithm in this paper..
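A toy sketch of the merging idea, much simplified relative to the paper's bipartite soft matching: find the most similar pair of tokens by cosine similarity and average them into one. Token count and dimension are illustrative.

```python
import torch
import torch.nn.functional as F

def merge_most_similar(tokens):
    """tokens: (N, C). Merge the two most similar tokens (cosine similarity)
    by averaging them, returning (N-1, C)."""
    normed = F.normalize(tokens, dim=-1)
    sim = normed @ normed.t()
    sim.fill_diagonal_(-float('inf'))              # ignore self-similarity
    i, j = divmod(int(sim.argmax()), sim.size(1))  # most similar pair
    merged = (tokens[i] + tokens[j]) / 2
    keep = [k for k in range(tokens.size(0)) if k not in (i, j)]
    return torch.cat([tokens[keep], merged.unsqueeze(0)], dim=0)

x = torch.randn(197, 768)            # e.g. ViT-sized token sequence
print(merge_most_similar(x).shape)   # torch.Size([196, 768])
```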

Paper Review

Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration

[๋…ผ๋ฌธ๋ฆฌ๋ทฐ] ABSTRACT ์ด์ „์˜ ์—ฐ๊ตฌ๋“ค์€ Convolutional neural network์—์„œ "smaller-norm-less-important" ๊ธฐ์ค€์ด prune filter์— ์ ์šฉ๋˜์—ˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ, norm-based ๊ธฐ์ค€์„ ๋ถ„์„ํ•˜๊ณ , ๋‘ ๊ฐ€์ง€ ์š”๊ตฌ์‚ฌํ•ญ์ด ํ•ญ์ƒ ์ถฉ์กฑํ•˜์ง€ ์•Š๋Š”๋‹ค๋Š” ๊ฒƒ์„ ์ง€์ ํ•œ๋‹ค. 1) Filter๋“ค์˜ ํ‘œ์ค€ํŽธ์ฐจ๊ฐ€ ์ปค์•ผํ•œ๋‹ค. 2) Filter์˜ minimum-norm(์ตœ์†Œ ํ‘œ์ค€)์€ ์ž‘์•„์•ผํ•œ๋‹ค. -> 0์— ๊ฐ€๊นŒ์›Œ์•ผ ํ•œ๋‹ค. ์œ„ ๋‘๊ฐ€์ง€ ์š”๊ตฌ ์‚ฌํ•ญ์— ๊ด€๊ณ„์—†์ด ๋ชจ๋ธ์„ ์••์ถ•ํ•˜๋Š” ์ƒˆ๋กœ์šด Filter Pruning ๋ฐฉ๋ฒ•(Filter Pruning via Gemotric Median, FPGM)์„ ์ œ์•ˆํ•œ๋‹ค. FPGM์€ ์ค‘๋ณต ํ•„ํ„ฐ๋ฅผ ๊ธฐ์ค€์œผ๋กœ Pruning์„ ์ง„ํ–‰ํ•œ๋‹ค. ResNet101 ๊ธฐ์ค€, CI..

Paper Review

Convolutional Neural Network Pruning: A Survey

[Paper Review] ABSTRACT Deep convolutional neural networks have enabled progress in a variety of fields over the past few years. However, they remain challenging to deploy because of their large number of parameters and floating-point operations, so interest in pruning convolutional neural networks has been growing. Pruning approaches can be classified along three dimensions: pruning method, training strategy, and estimation criterion. Key Words: Convolutional neural networks, machine intelligence, pruning method, training strategy, estimation criterion..

Paper Review

Understanding the difficulty of training deep feedforward neural networks

[Paper Review] ABSTRACT Why does standard gradient descent with random initialization perform poorly on deep neural networks? The logistic sigmoid activation with random initialization is unsuited to deep networks because of its mean value, which drives the top layers into saturation. The paper introduces a new initialization scheme that brings substantially faster convergence. Deep Neural Networks Deep learning aims to learn a feature hierarchy using extracted features. Extracted features: extracted from higher-level layers formed by composing lower-level features..
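The initialization scheme introduced in this paper is commonly known as Xavier (Glorot) initialization. A quick sketch of its normalized uniform form, assuming a plain linear layer:

```python
import math
import torch
import torch.nn as nn

def xavier_uniform_(weight):
    """Normalized (Xavier) initialization: U(-a, a) with
    a = sqrt(6 / (fan_in + fan_out)), keeping activation and gradient
    variance roughly constant across layers."""
    fan_out, fan_in = weight.shape
    a = math.sqrt(6.0 / (fan_in + fan_out))
    with torch.no_grad():
        weight.uniform_(-a, a)

layer = nn.Linear(256, 128)
xavier_uniform_(layer.weight)   # same effect as nn.init.xavier_uniform_
print(layer.weight.std())       # roughly sqrt(2 / (fan_in + fan_out))
```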

Paper Review

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

[Paper Review] ABSTRACT When training a DNN, the input distribution of each layer changes during training as the parameters of the previous layers change. This slows training, because it requires low learning rates and careful parameter initialization, and it makes training models with non-linearities difficult. The paper solves this problem by normalizing the layer inputs, and performing the normalization for each mini-batch brings many advantages. With batch normalization, much higher learning rates can be used and the model is less sensitive to initialization. Introduction Stochastic gradient descent (SGD) has proven to be an effective way to train deep networks. SGD proceeds in steps, and at each step a mini-batch of size..
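A minimal sketch of the batch-normalization transform described above: each feature is normalized over the mini-batch and then rescaled with a learnable scale and shift. This is a simplified training-mode version without running statistics; shapes are illustrative.

```python
import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    """x: (batch, features). Normalize each feature over the mini-batch,
    then apply the learnable scale gamma and shift beta."""
    mean = x.mean(dim=0)
    var = x.var(dim=0, unbiased=False)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return gamma * x_hat + beta

x = torch.randn(32, 100) * 5 + 3                 # mini-batch with shifted, scaled inputs
y = batch_norm(x, gamma=torch.ones(100), beta=torch.zeros(100))
print(y.mean().item(), y.std().item())           # roughly 0 and 1
```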

Paper Review

Deep Residual Learning for Image Recognition

[๋…ผ๋ฌธ๋ฆฌ๋ทฐ] ABSTRACT ์ด์ „์˜ ํ•™์Šต ๋ฐฉ๋ฒ•๋ณด๋‹ค ๊นŠ์€ ๋„คํŠธ์›Œํฌ์˜ ํ•™์Šต์„ ์ข€ ๋” ์šฉ์ดํ•˜๊ฒŒ ํ•˜๊ธฐ ์œ„ํ•œ ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์ œ์‹œํ•œ๋‹ค. Residual networks๊ฐ€ ์ตœ์ ํ™”ํ•˜๊ธฐ ๋” ์‰ฝ๊ณ , Depth๊ฐ€ ์ฆ๊ฐ€๋œ ๋ชจ๋ธ์—์„œ๋„ ์ƒ๋‹นํžˆ ์ฆ๊ฐ€๋œ ์ •ํ™•๋„๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ์Œ์„ ๋ณด์—ฌ์ค€๋‹ค. INTRODUCTION Is learning better networks as easy as stacking more layers? ๋” ๋‚˜์€ ๋„คํŠธ์›Œํฌ๋ฅผ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์ด ๋” ๋งŽ์€ ๊ณ„์ธต์„ ์Œ“๋Š” ๊ฒƒ๋งŒํผ ์‰ฌ์šด๊ฐ€? ์œ„ ๊ทธ๋ฆผ์—์„œ layer๊ฐ€ ๋” ๊นŠ์€ ๋นจ๊ฐ„์ƒ‰์ด error๊ฐ€ ๋” ๋†’์€ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. layer๊ฐ€ ๊นŠ์–ด์งˆ์ˆ˜๋ก gradient๊ฐ€ vanishing/exploding ํ•˜๋Š” ๋ฌธ์ œ๊ฐ€ ์กด์žฌํ•œ๋‹ค. ์ด ๋ฌธ์ œ๋Š” normalized initialization, b..
