vision transformer

Paper Review

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

[Paper Review] ICCV 2021 ABSTRACT The authors propose a hierarchical Transformer whose representation is computed with shifted windows, addressing the domain gap between words in text and the pixel resolution of images. The shifted-window scheme improves efficiency by limiting self-attention computation to non-overlapping local windows, while still allowing cross-window connections. This design offers the flexibility to model at various scales and has linear computational complexity with respect to image size. Introduction (a) shows the architecture of the Swin Transformer proposed in the paper. Because self-attention is computed only within each local window (marked in red), linear complexity with respect to input image size..
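The window scheme described above can be sketched in a few lines: partition the feature map into non-overlapping windows, then cyclically shift it by half a window size before partitioning again, so the new windows straddle the old window boundaries. This is a minimal NumPy sketch, not the paper's implementation; the array layout and function names are illustrative assumptions.

```python
import numpy as np

def window_partition(x, ws):
    # x: (H, W, C) feature map -> (num_windows, ws*ws, C)
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def shifted_windows(x, ws):
    # Cyclic shift by ws//2 so that the next block's windows straddle
    # the previous block's window boundaries (cross-window connection).
    shifted = np.roll(x, shift=(-(ws // 2), -(ws // 2)), axis=(0, 1))
    return window_partition(shifted, ws)

x = np.arange(8 * 8 * 1, dtype=float).reshape(8, 8, 1)
regular = window_partition(x, 4)   # 4 non-overlapping 4x4 windows
shifted = shifted_windows(x, 4)
print(regular.shape, shifted.shape)  # (4, 16, 1) (4, 16, 1)
```

Self-attention is then computed independently inside each window of 16 tokens, which is what keeps the cost linear in image size: doubling H and W doubles the number of windows, not the attention cost per window.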

Paper Review

Sparse Token Transformer With Attention Back Tracking

[Paper Review] ICLR 2023 ABSTRACT In this paper, the authors focus on token pruning, which can reduce not only the complexity of the Transformer's attention operations but also that of its linear layers. Previous works removed tokens at the feed-forward stage without considering their influence on the attention of later layers. To address this issue, they propose a method that back-tracks the importance of each attention map from the output to the input, in order to preserve the tokens that have a large impact on the final prediction. Its efficiency is demonstrated experimentally in both NLP and CV. Introduction Pruning approaches for Transformers have mainly targeted unnecessary model ..
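The back-tracking idea can be illustrated roughly as propagating an importance vector backwards through each layer's attention matrix: an input token matters if important output tokens attend to it, directly or transitively. The function below is a hypothetical sketch of that reading, not the paper's actual algorithm; all names and the uniform output importance are assumptions for illustration.

```python
import numpy as np

def backtrack_importance(attn_maps, out_importance):
    # attn_maps: list of (N, N) row-stochastic attention matrices,
    # ordered from the first layer to the last.
    # out_importance: (N,) importance of tokens at the output.
    # Walking backwards, importance flows from each query token to
    # the key tokens it attends to (illustrative sketch).
    imp = out_importance
    for A in reversed(attn_maps):
        imp = imp @ A  # redistribute importance to attended-to tokens
    return imp

rng = np.random.default_rng(0)
def rand_attn(n):
    A = rng.random((n, n))
    return A / A.sum(axis=1, keepdims=True)  # rows sum to 1, like softmax

maps = [rand_attn(4) for _ in range(3)]
imp = backtrack_importance(maps, np.ones(4))
keep = np.argsort(imp)[-2:]  # keep the 2 most important input tokens
```

Because each attention matrix is row-stochastic, the total importance mass is preserved as it flows backwards; pruning then drops the input tokens that end up with the least mass.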

Paper Review

TOKEN MERGING: YOUR VIT BUT FASTER

[Paper Review] ICLR 2023 notable top 5% ABSTRACT A simple method to increase the throughput of existing ViT models without any need for training. It gradually merges similar tokens using a general and lightweight algorithm. ToMe (Token Merging) can also be easily applied during training. Introduction A line of work has emerged that prunes Transformer tokens at run time to enable faster models, but most token pruning methods cannot be applied to speed up training. The paper therefore proposes Token Merging, which combines tokens instead of pruning them, a better approach. The paper's custom matching algorithm ..
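A minimal sketch of the merging step in the spirit of ToMe's bipartite matching: split the tokens into two alternating sets, link each token in one set to its most similar partner in the other, and merge the r most similar pairs by averaging. This simplification (plain averaging, no token-size weighting) and all names here are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def merge_tokens(tokens, r):
    # Split tokens into alternating sets A and B, pair each A-token
    # with its most similar B-token (cosine similarity), then merge
    # the r most similar pairs by averaging; keep everything else.
    A, B = tokens[0::2], tokens[1::2]
    An = A / np.linalg.norm(A, axis=1, keepdims=True)
    Bn = B / np.linalg.norm(B, axis=1, keepdims=True)
    sim = An @ Bn.T                      # (|A|, |B|) cosine similarities
    best = sim.argmax(axis=1)            # each A-token's best B partner
    score = sim[np.arange(len(A)), best]
    merged_idx = np.argsort(score)[-r:]  # r most similar A-tokens
    out = list(B)
    for i in range(len(A)):
        if i in merged_idx:
            out[best[i]] = (out[best[i]] + A[i]) / 2  # average the pair
        else:
            out.append(A[i])             # unmerged A-tokens survive
    return np.stack(out)

toks = np.random.default_rng(1).random((8, 16))
merged = merge_tokens(toks, 2)
print(merged.shape)  # (8 - r) tokens remain: (6, 16)
```

Because merging is a fixed, parameter-free operation applied between blocks, it reduces the token count by r per layer without retraining, which is why it can also be used to accelerate training, unlike most pruning schemes.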

velpegor
List of posts tagged 'vision transformer'