
Applying ViT-Adapter to Semantic Segmentation

Our segmentation code is developed on top of MMSegmentation v0.20.2.

For details see Vision Transformer Adapter for Dense Predictions.

If you use this code for a paper, please cite:

@article{chen2022vitadapter,
  title={Vision Transformer Adapter for Dense Predictions},
  author={Chen, Zhe and Duan, Yuchen and Wang, Wenhai and He, Junjun and Lu, Tong and Dai, Jifeng and Qiao, Yu},
  journal={arXiv preprint arXiv:2205.08534},
  year={2022}
}

Usage

Install MMSegmentation v0.20.2.

# recommended environment: torch1.9 + cuda11.1
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.4.2 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
pip install timm==0.4.12
pip install mmdet==2.22.0 # for Mask2Former
pip install mmsegmentation==0.20.2
ln -s ../detection/ops ./
cd ops && sh make.sh # compile deformable attention
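
After installing, a quick sanity check helps confirm the environment before compiling the ops or training. This is a minimal sketch; the expected versions follow the pins above:

```python
# Sanity check for the pinned environment (torch 1.9 + CUDA 11.1).
import torch, mmcv, mmseg

print(torch.__version__)          # expect 1.9.0+cu111
print(torch.cuda.is_available())  # expect True
print(mmcv.__version__)           # expect 1.4.2
print(mmseg.__version__)          # expect 0.20.2
```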

Data Preparation

Prepare ADE20K, Cityscapes, COCO-Stuff, and Pascal Context according to the guidelines in MMSegmentation.
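
Once the datasets are downloaded, you can quickly verify that the ADE20K layout matches what the demo commands below expect. This is a hypothetical helper, not part of the repo; the path follows the image demo later in this README:

```python
# Verify the ADE20K directory layout used by the demo commands below.
from pathlib import Path

root = Path("data/ade/ADEChallengeData2016")
for sub in ("images/training", "images/validation",
            "annotations/training", "annotations/validation"):
    p = root / sub
    print(f"{p}: {'ok' if p.is_dir() else 'MISSING'}")
```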

Pre-training Sources

| Name          | Year | Type       | Data         | Repo | Paper |
|---------------|------|------------|--------------|------|-------|
| DeiT          | 2021 | Supervised | ImageNet-1K  | repo | paper |
| AugReg        | 2021 | Supervised | ImageNet-22K | repo | paper |
| BEiT          | 2021 | MIM        | ImageNet-22K | repo | paper |
| Uni-Perceiver | 2022 | Supervised | Multi-Modal  | repo | paper |
| BEiTv2        | 2022 | MIM        | ImageNet-22K | repo | paper |

Results and Models

Note that due to the file size limit of GitHub Releases, some files are provided as .zip packages. Please unzip them before loading them into the model.

ADE20K val

| Method      | Backbone      | Pre-train           | Lr schd | Crop Size | mIoU (SS) | mIoU (MS) | #Param | Config | Download |
|-------------|---------------|---------------------|---------|-----------|-----------|-----------|--------|--------|----------|
| UperNet     | ViT-Adapter-T | DeiT-T              | 160k    | 512       | 42.6      | 43.6      | 36M    | config | model \| log |
| UperNet     | ViT-Adapter-S | DeiT-S              | 160k    | 512       | 46.2      | 47.1      | 58M    | config | model \| log |
| UperNet     | ViT-Adapter-B | DeiT-B              | 160k    | 512       | 48.8      | 49.7      | 134M   | config | model \| log |
| UperNet     | ViT-Adapter-T | AugReg-T            | 160k    | 512       | 43.9      | 44.8      | 36M    | config | model \| log |
| UperNet     | ViT-Adapter-B | AugReg-B            | 160k    | 512       | 51.9      | 52.5      | 134M   | config | model \| log |
| UperNet     | ViT-Adapter-L | AugReg-L            | 160k    | 512       | 53.4      | 54.4      | 364M   | config | model \| log |
| UperNet     | ViT-Adapter-L | Uni-Perceiver-L     | 160k    | 512       | 55.0      | 55.4      | 364M   | config | model \| log |
| UperNet     | ViT-Adapter-L | BEiT-L              | 160k    | 640       | 58.0      | 58.4      | 451M   | config | model \| log |
| Mask2Former | ViT-Adapter-L | BEiT-L              | 160k    | 640       | 58.3      | 59.0      | 568M   | config | model \| log |
| Mask2Former | ViT-Adapter-L | BEiT-L+COCO-Stuff   | 80k     | 896       | 59.4      | 60.5      | 571M   | config | model \| log |
| Mask2Former | ViT-Adapter-L | BEiTv2-L+COCO-Stuff | 80k     | 896       | 61.2      | 61.5      | 571M   | config | model \| log |

Cityscapes val

| Method      | Backbone      | Pre-train | Lr schd | Crop Size | mIoU (SS) | mIoU (MS) | #Param | Config | Download |
|-------------|---------------|-----------|---------|-----------|-----------|-----------|--------|--------|----------|
| Mask2Former | ViT-Adapter-L | Mapillary | 80k     | 896       | 84.9      | 85.8      | 571M   | config | model \| log |

COCO-Stuff-10K

| Method      | Backbone      | Pre-train | Lr schd | Crop Size | mIoU (SS) | mIoU (MS) | #Param | Config | Download |
|-------------|---------------|-----------|---------|-----------|-----------|-----------|--------|--------|----------|
| Mask2Former | ViT-Adapter-B | BEiT-B    | 40k     | 512       | 50.0      | 50.5      | 120M   | config | model \| log |
| UperNet     | ViT-Adapter-L | BEiT-L    | 80k     | 512       | 51.0      | 51.4      | 451M   | config | model \| log |
| Mask2Former | ViT-Adapter-L | BEiT-L    | 40k     | 512       | 53.2      | 54.2      | 568M   | config | model \| log |

COCO-Stuff-164K

| Method      | Backbone      | Pre-train | Lr schd | Crop Size | mIoU (SS) | mIoU (MS) | #Param | Config | Download |
|-------------|---------------|-----------|---------|-----------|-----------|-----------|--------|--------|----------|
| UperNet     | ViT-Adapter-L | BEiT-L    | 80k     | 640       | 50.5      | 50.7      | 451M   | config | model \| log |
| Mask2Former | ViT-Adapter-L | BEiT-L    | 80k     | 896       | 51.7      | 52.0      | 571M   | config | model \| log |
| Mask2Former | ViT-Adapter-L | BEiTv2-L  | 80k     | 896       | 52.3      | -         | 571M   | config | model \| log |

Pascal Context

| Method      | Backbone      | Pre-train | Lr schd | Crop Size | mIoU (SS) | mIoU (MS) | #Param | Config | Download |
|-------------|---------------|-----------|---------|-----------|-----------|-----------|--------|--------|----------|
| Mask2Former | ViT-Adapter-B | BEiT-B    | 40k     | 480       | 64.0      | 64.4      | 120M   | config | model \| log |
| UperNet     | ViT-Adapter-L | BEiT-L    | 80k     | 480       | 67.0      | 67.5      | 451M   | config | model \| log |
| Mask2Former | ViT-Adapter-L | BEiT-L    | 40k     | 480       | 67.8      | 68.2      | 568M   | config | model \| log |

Evaluation

To evaluate ViT-Adapter-L + Mask2Former (896) on the ADE20K val set on a single node with 8 GPUs, run:

sh dist_test.sh configs/ade20k/mask2former_beit_adapter_large_896_80k_ade20k_ss.py /path/to/checkpoint_file 8 --eval mIoU

This should give:

Summary:

+-------+-------+-------+
|  aAcc |  mIoU |  mAcc |
+-------+-------+-------+
| 86.61 | 59.43 | 73.55 |
+-------+-------+-------+
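
For reference, `--eval mIoU` reports mean intersection-over-union averaged over classes, where per-class IoU = TP / (TP + FP + FN). A minimal NumPy sketch of the metric on toy labels (not the repo's implementation):

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Per-class IoU = TP / (TP + FP + FN), averaged over present classes."""
    keep = gt != ignore_index
    pred, gt = pred[keep], gt[keep]
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([[0, 1, 2], [0, 1, 1]])   # toy 2x3 prediction
gt   = np.array([[0, 1, 2], [0, 2, 1]])   # toy 2x3 ground truth
print(mean_iou(pred, gt, num_classes=3))  # ~0.722
```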

Training

To train ViT-Adapter-L + UperNet on ADE20K on a single node with 8 GPUs, run:

sh dist_train.sh configs/ade20k/upernet_beit_adapter_large_640_160k_ade20k_ss.py 8
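
Before launching a long run, it can help to inspect or override the config from Python. A hedged sketch using mmcv's `Config`; field names like `data.samples_per_gpu` follow the standard MMSegmentation v0.20 config schema, so verify them against the actual file:

```python
# Sketch: inspect/adjust a training config before launching dist_train.sh.
from mmcv import Config

cfg = Config.fromfile(
    "configs/ade20k/upernet_beit_adapter_large_640_160k_ade20k_ss.py")
print(cfg.data.samples_per_gpu)  # per-GPU batch size
print(cfg.optimizer.lr)          # base learning rate

cfg.data.samples_per_gpu = 1     # e.g., shrink batches for smaller GPUs;
                                 # consider scaling the LR accordingly
cfg.dump("my_ade20k_config.py")  # then train with the dumped copy
```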

Image Demo

To run inference on a single image:

CUDA_VISIBLE_DEVICES=0 python image_demo.py \
  configs/ade20k/mask2former_beit_adapter_large_896_80k_ade20k_ss.py  \
  released/mask2former_beit_adapter_large_896_80k_ade20k.pth.tar  \
  data/ade/ADEChallengeData2016/images/validation/ADE_val_00000591.jpg \
  --palette ade20k 

The result will be saved to demo/ADE_val_00000591.jpg.
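
If you prefer calling the model from Python, the MMSegmentation v0.20 API below is the standard route. Note that the repo's custom modules (adapter backbone, deformable attention) must be imported first so they register with MMSegmentation; image_demo.py shows the exact imports, which this sketch omits:

```python
# A minimal sketch using the MMSegmentation v0.20 Python API. The repo's
# custom modules must already be imported/registered (see image_demo.py).
from mmseg.apis import init_segmentor, inference_segmentor, show_result_pyplot
from mmseg.core.evaluation import get_palette

config = "configs/ade20k/mask2former_beit_adapter_large_896_80k_ade20k_ss.py"
checkpoint = "released/mask2former_beit_adapter_large_896_80k_ade20k.pth.tar"
img = "data/ade/ADEChallengeData2016/images/validation/ADE_val_00000591.jpg"

model = init_segmentor(config, checkpoint, device="cuda:0")
result = inference_segmentor(model, img)  # list with one HxW label map
show_result_pyplot(model, img, result, palette=get_palette("ade20k"))
```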

Video Demo

To run inference on a single video:

CUDA_VISIBLE_DEVICES=0 python video_demo.py demo.mp4 \
  configs/ade20k/mask2former_beit_adapter_large_896_80k_ade20k_ss.py  \
  released/mask2former_beit_adapter_large_896_80k_ade20k.pth.tar  \
  --output-file results.mp4  \
  --palette ade20k 
