arislid/final-project-level3-cv-16

BoostFormer

Main


📰 Contributors

Team CV-16 💡 비전길잡이 (Vision Guides) 💡
NAVER Connect Foundation boostcamp AI Tech 4th

민기 박민지 유영준 장지훈 최동혁
revanZX arislid youngjun04 FIN443 choipp

📰 Links

📰 Objective

image

  • SegFormer: make a Transformer-based semantic segmentation model lightweight enough for embedded and mobile devices
  • Model-driven approach: no advanced training techniques such as hyperparameter tuning; performance gains come purely from modeling
  • No compression methods such as pruning or quantization: the model is made lighter through structural changes such as redesigning blocks and layers

📰 Dataset

| | Tiny-ImageNet | ADE20K |
|---|---|---|
| Purpose | Pre-training | Fine-tuning |
| Num_classes | 200 | 150 |
| Training set | 100,000 images | 20,210 images |
| Validation set | 10,000 images | 2,000 images |

|-- ADEChallengeData2016
|   |-- image
|   |   |-- train
|   |   `-- val
|   `-- mask
|       |-- train
|       `-- val
`-- tiny-imagenet-200
    |-- train
    |-- val
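
Before training, it can help to confirm that the dataset root matches the tree above. The following is a minimal sketch using only the standard library; the function name `missing_dataset_dirs` and the `EXPECTED_DIRS` list are hypothetical helpers written for this illustration and are not part of the repository.

```python
from pathlib import Path

# Sub-directories copied from the tree above, relative to the dataset root.
EXPECTED_DIRS = [
    "ADEChallengeData2016/image/train",
    "ADEChallengeData2016/image/val",
    "ADEChallengeData2016/mask/train",
    "ADEChallengeData2016/mask/val",
    "tiny-imagenet-200/train",
    "tiny-imagenet-200/val",
]


def missing_dataset_dirs(root):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

An empty return value means every directory in the tree was found; otherwise the list names what still needs to be downloaded or moved.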

📰 Base Model

Segformer

| Encoder | Decoder |
|---|---|
| Overlap Patch Embedding | MLP Layer (upsampling) |
| SegFormer Block | Concat |
| Efficient Self-Attention | Linear-Fuse |
| Mix-FFN | Classifier |

📰 BoostFormer(Ours)

boostformer

| Encoder | Decoder |
|---|---|
| Pooling Patch Embedding | MLP Layer (upsampling) |
| PoolFormer Block | Weighted Sum |
| SegFormerV2 Block | Classifier |
| Custom Efficient Self-Attention | - |
| Mix-CFN | - |

📰 Strategy

image

  • Compare the performance of Segformer-B2 and the custom model, and measure Params and Flops (util/get_flops_params.py)

📰 Method

1. Patch Embedding

  • Replace the NxN Conv with Pooling + 1x1 Conv
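
To see why this swap saves parameters, the weight count of a standard convolution can be worked out by hand. The sketch below is illustrative only: the channel sizes (3 in, 64 out) and N = 7 are assumptions chosen for the example, not values read from the repository, and `conv2d_params` is a hypothetical helper.

```python
def conv2d_params(in_ch, out_ch, kernel, bias=True):
    """Parameter count of a standard 2D convolution with a square kernel."""
    return in_ch * out_ch * kernel * kernel + (out_ch if bias else 0)


# Illustrative sizes (assumed): 3 input channels, 64 output channels, N = 7.
nxn = conv2d_params(3, 64, kernel=7)     # N x N conv patch embedding
pooled = conv2d_params(3, 64, kernel=1)  # pooling has no weights, so only the 1x1 conv counts
print(nxn, pooled)  # 9472 256
```

Pooling takes over the spatial downsampling without any learned weights, so the remaining 1x1 Conv only has to change the channel count.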

2. Transformer Block

  • Token Mixer: extract features with Pooling instead of MHSA
    • $\hat {F_0}=\mathrm {LayerScale}(\mathrm {Pooling}(F_{in}))+F_{in}$
    • $\hat {F_1}=\mathrm {LayerScale}(\mathrm {MixCFN}(\hat {F_0}))+\hat {F_0}$
  • Remove the original Self Output module
    • $\hat {F_0}=\mathrm {CSA}(F_{in})+F_{in}$
    • $\hat {F_1}=\mathrm {MixCFN}(\hat {F_0})+\hat {F_0}$
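
The residual structure of the pooling block can be sketched on a toy 1D token sequence. This is a simplified illustration, not the repository's code: tokens are plain floats, LayerScale is reduced to a single scalar, normalization is omitted, and `avg_pool_1d`, `pool_block`, and the `ffn` argument (standing in for MixCFN) are hypothetical names.

```python
def avg_pool_1d(tokens, window=3):
    """Average each token with its neighbours (edges use the neighbours that exist)."""
    half = window // 2
    out = []
    for i in range(len(tokens)):
        nbrs = tokens[max(0, i - half): i + half + 1]
        out.append(sum(nbrs) / len(nbrs))
    return out


def pool_block(f_in, ffn, scale_mix=1.0, scale_ffn=1.0):
    """F0 = scale * Pooling(F_in) + F_in, then F1 = scale * FFN(F0) + F0."""
    f0 = [scale_mix * p + x for p, x in zip(avg_pool_1d(f_in), f_in)]
    f1 = [scale_ffn * y + x for y, x in zip(ffn(f0), f0)]
    return f1


print(avg_pool_1d([1.0, 2.0, 3.0]))  # [1.5, 2.0, 2.5]
```

With both scales set to zero the block reduces to the identity, which is the usual motivation for putting a learnable scale in front of each residual branch.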

3. Attention Layer

  • Reduce the K, V dimensions with Pooling

    • $K, V=\mathrm {Pooling}(F_C)$
  • Remove the 1x1 Convolution

    • $\mathrm {Attention}(Q,K,V)=\mathrm {Softmax}\left({{QK^T}\over {\sqrt {d_{head}}}}\right)V$
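
The following pure-Python sketch shows scaled dot-product attention in which K and V come from average-pooling the input tokens. It is a single-head toy version under stated assumptions: no learned projections, 1D pooling over the token axis, and hypothetical function names. The softmax is applied to the scaled scores first, and the resulting weights then mix the rows of V.

```python
import math


def avg_pool_tokens(tokens, ratio):
    """Average every `ratio` consecutive token vectors into one."""
    pooled = []
    for start in range(0, len(tokens), ratio):
        group = tokens[start:start + ratio]
        pooled.append([sum(col) / len(group) for col in zip(*group)])
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pooled_attention(q, f_c, ratio=2):
    """Softmax(Q K^T / sqrt(d)) V with K = V = Pooling(F_C)."""
    k = v = avg_pool_tokens(f_c, ratio)
    d = len(q[0])
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d)
                  for k_row in k]
        weights = softmax(scores)  # normalise the scores first
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v))
                    for j in range(len(v[0]))])  # then mix the values
    return out
```

With N query tokens and a pooling ratio R, the score matrix has N x (N / R) entries instead of N x N, which is where the saving comes from.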

4. FFN

  • ๊ธฐ์กด์˜ Linear(dense) embedding ์—ฐ์‚ฐ์„ 1x1 Conv๋กœ ๋ณ€๊ฒฝ

    • $\hat {F_C}=\mathrm {Conv}_{1 \times 1}(F_C)$
  • 3x3 DWConv๋ฅผ 3x3๊ณผ 5x5 DWConv๋กœ channel-wise๋กœ ๋‚˜๋ˆ„์–ด ์—ฐ์‚ฐ ํ›„ Concat (Mix-CFN)

    • $\hat {F_C}=\mathrm {Conv}_{1 \times 1}(\mathrm {Concat}(\hat {F_1},\hat {F_2}))$

  • Batch-Normalization ์ถ”๊ฐ€
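
The cost of mixing kernel sizes in the depthwise step can be estimated with simple arithmetic. In this sketch the even channel split and the hidden width of 256 are assumptions made for illustration (the README does not state the split ratio), and both helper functions are hypothetical.

```python
def dwconv_params(channels, kernel):
    """Weights of a depthwise conv: one kernel x kernel filter per channel (bias ignored)."""
    return channels * kernel * kernel


def mixed_dwconv_params(channels, kernels=(3, 5)):
    """Split the channels evenly across the kernel sizes (assumed split)."""
    per_branch = channels // len(kernels)
    return sum(dwconv_params(per_branch, k) for k in kernels)


hidden = 256  # illustrative hidden width, not taken from the repository
print(dwconv_params(hidden, 3))     # 2304
print(mixed_dwconv_params(hidden))  # 4352
```

Under these assumptions the mixed branch adds some depthwise weights in exchange for a larger receptive field on half of the channels. For comparison, a 1x1 Conv mapping 256 channels to 256 channels has 65,536 weights, so the depthwise step stays a small share of the block.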

5. Decode Head

  • Stage Features Upsample

  • Apply Weighted Sum
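
A weighted sum fuses the upsampled stage features element by element, so the channel count stays the same instead of growing as it does with Concat. The sketch below uses flattened toy features and equal weights; how the weights are actually obtained in BoostFormer is not stated in this README, so the values and the `weighted_sum` helper are assumptions.

```python
def weighted_sum(features, weights):
    """Element-wise sum of same-length feature vectors, one scalar weight per stage."""
    if len(features) != len(weights):
        raise ValueError("need one weight per stage feature")
    length = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(length)]


# Four upsampled stage features (flattened toy values) and assumed equal weights.
stages = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
fused = weighted_sum(stages, [0.25, 0.25, 0.25, 0.25])
print(fused)  # [4.0, 5.0]
```

Because the fused tensor keeps the width of a single stage, the Linear-Fuse layer that follows Concat in the base model is no longer needed, which matches the decoder table above.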


📰 Result

result graph

| model | Params | Flops | Acc_val (%) | mIoU_val (%) |
|---|---|---|---|---|
| SegFormer-B2 | 27.462M | 58.576G | 66.48 | 29.84 |
| BoostFormer (Ours) | 17.575M (-36.00%) | 15.826G (-72.98%) | 72.28 (+8.72%) | 34.29 (+14.91%) |

  • Compared with SegFormer-B2: about 36% fewer Params, about 73% fewer FLOPs, and about 15% higher mIoU (relative)
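
The percentages in the table are relative changes against SegFormer-B2, not absolute differences in percentage points. A short check, using only the numbers reported above (the `relative_change` helper is written for this illustration):

```python
def relative_change(baseline, ours):
    """Percentage change of `ours` relative to `baseline`."""
    return (ours - baseline) / baseline * 100.0


rows = {
    "Params (M)": (27.462, 17.575),
    "Flops (G)": (58.576, 15.826),
    "Acc_val (%)": (66.48, 72.28),
    "mIoU_val (%)": (29.84, 34.29),
}
for name, (base, ours) in rows.items():
    print(f"{name}: {relative_change(base, ours):+.2f}%")
# Params (M): -36.00%
# Flops (G): -72.98%
# Acc_val (%): +8.72%
# mIoU_val (%): +14.91%
```

In absolute terms, mIoU rises by 4.45 points (29.84 to 34.29) and accuracy by 5.80 points (66.48 to 72.28).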


📰 Qualitative results on ADE20K

results

📰 Mobile Inference Time Comparison

image

📰 NVIDIA Jetson Nano Time Comparison

image


⚙️ Installation

git clone https://github.com/boostcampaitech4lv23cv3/final-project-level3-cv-16.git

🧰 How to Use

Pretraining (tiny_imagenet)

# {number of GPUs}: how many GPUs to use
# --data-path: the directory name must contain "tiny"
# --batch-size: batch size per GPU (default=128)
bash dist_train.sh {number of GPUs} \
    --data-path {tiny_imagenet path} \
    --output_dir {save dir path} \
    --batch-size {batch size per gpu}

# example
bash dist_train.sh 4 \
    --data-path /workspace/dataset/tiny_imagenet \
    --output_dir result/mod_segformer/ \
    --batch-size 64

ADE20K fine-tuning

# Current directory: /final-project-level3-cv-16
# --device: adjust to your environment
# --pretrain: a .pth file (output of pretraining) or a dir (in the format provided by the HuggingFace model hub)
# --batch_size: default=16
python train.py \
    --data_dir {path to ADE20K} \
    --device 0,1,2,3 \
    --save_path {path to the dir to save into} \
    --pretrain {path to the pretrained model dir or .pth} \
    --batch_size {batch size}

Run evaluation

# Select the val or test set via phase
# Before running, edit the code in eval.py that defines the model
python eval.py \
    --data_dir {path to ADE20K} \
    --pretrain {path to the pretrained model dir}

Check Params and FLOPs

# Before running, edit the code in get_flops_params.py that defines the model
python util/get_flops_params.py \
    --data_dir {path to ADE20K}

📰 Directory Structure

|-- 🗂 appendix          : presentation materials and WrapUpReport
|-- 🗂 segformer         : HuggingFace-based segformer model code
|-- 🗂 boostformer       : lightweight Segformer model code
|-- 🗂 imagenet_pretrain : code used to train the encoder on Tiny-ImageNet
|-- 🗂 util              : collection of tool scripts
|-- Dockerfile
|-- train.py             : ADE20K fine-tuning code
|-- eval.py              : code that outputs model inference results
|-- requirements.txt
`-- README.md
