
Applying ViT-Adapter to Semantic Segmentation

Our segmentation code is developed on top of MMSegmentation v0.20.2.

For details see Vision Transformer Adapter for Dense Predictions.

If you use this code for a paper, please cite:

@article{chen2022vitadapter,
  title={Vision Transformer Adapter for Dense Predictions},
  author={Chen, Zhe and Duan, Yuchen and Wang, Wenhai and He, Junjun and Lu, Tong and Dai, Jifeng and Qiao, Yu},
  journal={arXiv preprint arXiv:2205.08534},
  year={2022}
}

Usage

Install MMSegmentation v0.20.2.

# recommended environment: torch1.9 + cuda11.1
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.4.2 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
pip install timm==0.4.12
pip install mmdet==2.22.0 # for Mask2Former
pip install mmsegmentation==0.20.2
ln -s ../detection/ops ./
cd ops && sh make.sh # compile deformable attention
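
After installing, a quick sanity check helps confirm the environment before compiling the ops or training. This is a minimal sketch; the expected versions follow the pins above:

```python
# Sanity check for the pinned environment (torch 1.9 + CUDA 11.1).
import torch, mmcv, mmseg

print(torch.__version__)          # expect 1.9.0+cu111
print(torch.cuda.is_available())  # expect True
print(mmcv.__version__)           # expect 1.4.2
print(mmseg.__version__)          # expect 0.20.2
```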

Data Preparation

Prepare ADE20K, Cityscapes, COCO-Stuff, and Pascal Context according to the guidelines in MMSegmentation.
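
Once the datasets are downloaded, you can quickly verify that the ADE20K layout matches what the demo commands below expect. This is a hypothetical helper, not part of the repo; the path follows the image demo later in this README:

```python
# Verify the ADE20K directory layout used by the demo commands below.
from pathlib import Path

root = Path("data/ade/ADEChallengeData2016")
for sub in ("images/training", "images/validation",
            "annotations/training", "annotations/validation"):
    p = root / sub
    print(f"{p}: {'ok' if p.is_dir() else 'MISSING'}")
```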

Pre-training Sources

| Name          | Year | Type       | Data         | Repo | Paper |
|---------------|------|------------|--------------|------|-------|
| DeiT          | 2021 | Supervised | ImageNet-1K  | repo | paper |
| AugReg        | 2021 | Supervised | ImageNet-22K | repo | paper |
| BEiT          | 2021 | MIM        | ImageNet-22K | repo | paper |
| Uni-Perceiver | 2022 | Supervised | Multi-Modal  | repo | paper |
| BEiTv2        | 2022 | MIM        | ImageNet-22K | repo | paper |

Results and Models

Note that due to the file size limit of GitHub Releases, some files are provided as .zip packages. Please unzip them before loading them into the model.

ADE20K val

| Method      | Backbone      | Pre-train           | Lr schd | Crop Size | mIoU (SS) | mIoU (MS) | #Param | Config | Download |
|-------------|---------------|---------------------|---------|-----------|-----------|-----------|--------|--------|----------|
| UperNet     | ViT-Adapter-T | DeiT-T              | 160k    | 512       | 42.6      | 43.6      | 36M    | config | model \| log |
| UperNet     | ViT-Adapter-S | DeiT-S              | 160k    | 512       | 46.2      | 47.1      | 58M    | config | model \| log |
| UperNet     | ViT-Adapter-B | DeiT-B              | 160k    | 512       | 48.8      | 49.7      | 134M   | config | model \| log |
| UperNet     | ViT-Adapter-T | AugReg-T            | 160k    | 512       | 43.9      | 44.8      | 36M    | config | model \| log |
| UperNet     | ViT-Adapter-B | AugReg-B            | 160k    | 512       | 51.9      | 52.5      | 134M   | config | model \| log |
| UperNet     | ViT-Adapter-L | AugReg-L            | 160k    | 512       | 53.4      | 54.4      | 364M   | config | model \| log |
| UperNet     | ViT-Adapter-L | Uni-Perceiver-L     | 160k    | 512       | 55.0      | 55.4      | 364M   | config | model \| log |
| UperNet     | ViT-Adapter-L | BEiT-L              | 160k    | 640       | 58.0      | 58.4      | 451M   | config | model \| log |
| Mask2Former | ViT-Adapter-L | BEiT-L              | 160k    | 640       | 58.3      | 59.0      | 568M   | config | model \| log |
| Mask2Former | ViT-Adapter-L | BEiT-L+COCO-Stuff   | 80k     | 896       | 59.4      | 60.5      | 571M   | config | model \| log |
| Mask2Former | ViT-Adapter-L | BEiTv2-L+COCO-Stuff | 80k     | 896       | 61.2      | 61.5      | 571M   | config | model \| log |

Cityscapes val

| Method      | Backbone      | Pre-train | Lr schd | Crop Size | mIoU (SS) | mIoU (MS) | #Param | Config | Download |
|-------------|---------------|-----------|---------|-----------|-----------|-----------|--------|--------|----------|
| Mask2Former | ViT-Adapter-L | Mapillary | 80k     | 896       | 84.9      | 85.8      | 571M   | config | model \| log |

COCO-Stuff-10K

| Method      | Backbone      | Pre-train | Lr schd | Crop Size | mIoU (SS) | mIoU (MS) | #Param | Config | Download |
|-------------|---------------|-----------|---------|-----------|-----------|-----------|--------|--------|----------|
| Mask2Former | ViT-Adapter-B | BEiT-B    | 40k     | 512       | 50.0      | 50.5      | 120M   | config | model \| log |
| UperNet     | ViT-Adapter-L | BEiT-L    | 80k     | 512       | 51.0      | 51.4      | 451M   | config | model \| log |
| Mask2Former | ViT-Adapter-L | BEiT-L    | 40k     | 512       | 53.2      | 54.2      | 568M   | config | model \| log |

COCO-Stuff-164K

| Method      | Backbone      | Pre-train | Lr schd | Crop Size | mIoU (SS) | mIoU (MS) | #Param | Config | Download |
|-------------|---------------|-----------|---------|-----------|-----------|-----------|--------|--------|----------|
| UperNet     | ViT-Adapter-L | BEiT-L    | 80k     | 640       | 50.5      | 50.7      | 451M   | config | model \| log |
| Mask2Former | ViT-Adapter-L | BEiT-L    | 80k     | 896       | 51.7      | 52.0      | 571M   | config | model \| log |
| Mask2Former | ViT-Adapter-L | BEiTv2-L  | 80k     | 896       | 52.3      | -         | 571M   | config | model \| log |

Pascal Context

| Method      | Backbone      | Pre-train | Lr schd | Crop Size | mIoU (SS) | mIoU (MS) | #Param | Config | Download |
|-------------|---------------|-----------|---------|-----------|-----------|-----------|--------|--------|----------|
| Mask2Former | ViT-Adapter-B | BEiT-B    | 40k     | 480       | 64.0      | 64.4      | 120M   | config | model \| log |
| UperNet     | ViT-Adapter-L | BEiT-L    | 80k     | 480       | 67.0      | 67.5      | 451M   | config | model \| log |
| Mask2Former | ViT-Adapter-L | BEiT-L    | 40k     | 480       | 67.8      | 68.2      | 568M   | config | model \| log |

Evaluation

To evaluate ViT-Adapter-L + Mask2Former (896) on the ADE20K val set on a single node with 8 GPUs, run:

sh dist_test.sh configs/ade20k/mask2former_beit_adapter_large_896_80k_ade20k_ss.py /path/to/checkpoint_file 8 --eval mIoU

This should give:

Summary:

+-------+-------+-------+
|  aAcc |  mIoU |  mAcc |
+-------+-------+-------+
| 86.61 | 59.43 | 73.55 |
+-------+-------+-------+
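
For reference, `--eval mIoU` reports mean intersection-over-union averaged over classes, where per-class IoU = TP / (TP + FP + FN). A minimal NumPy sketch of the metric on toy labels (not the repo's implementation):

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Per-class IoU = TP / (TP + FP + FN), averaged over present classes."""
    keep = gt != ignore_index
    pred, gt = pred[keep], gt[keep]
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([[0, 1, 2], [0, 1, 1]])   # toy 2x3 prediction
gt   = np.array([[0, 1, 2], [0, 2, 1]])   # toy 2x3 ground truth
print(mean_iou(pred, gt, num_classes=3))  # ~0.722
```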

Training

To train ViT-Adapter-L + UperNet on ADE20K on a single node with 8 GPUs, run:

sh dist_train.sh configs/ade20k/upernet_beit_adapter_large_640_160k_ade20k_ss.py 8
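
Before launching a long run, it can help to inspect or override the config from Python. A hedged sketch using mmcv's `Config`; field names like `data.samples_per_gpu` follow the standard MMSegmentation v0.20 config schema, so verify them against the actual file:

```python
# Sketch: inspect/adjust a training config before launching dist_train.sh.
from mmcv import Config

cfg = Config.fromfile(
    "configs/ade20k/upernet_beit_adapter_large_640_160k_ade20k_ss.py")
print(cfg.data.samples_per_gpu)  # per-GPU batch size
print(cfg.optimizer.lr)          # base learning rate

cfg.data.samples_per_gpu = 1     # e.g., shrink batches for smaller GPUs;
                                 # consider scaling the LR accordingly
cfg.dump("my_ade20k_config.py")  # then train with the dumped copy
```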

Image Demo

To run inference on a single image:

CUDA_VISIBLE_DEVICES=0 python image_demo.py \
  configs/ade20k/mask2former_beit_adapter_large_896_80k_ade20k_ss.py  \
  released/mask2former_beit_adapter_large_896_80k_ade20k.pth.tar  \
  data/ade/ADEChallengeData2016/images/validation/ADE_val_00000591.jpg \
  --palette ade20k 

The result will be saved to demo/ADE_val_00000591.jpg.
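
If you prefer calling the model from Python, the MMSegmentation v0.20 API below is the standard route. Note that the repo's custom modules (adapter backbone, deformable attention) must be imported first so they register with MMSegmentation; image_demo.py shows the exact imports, which this sketch omits:

```python
# A minimal sketch using the MMSegmentation v0.20 Python API. The repo's
# custom modules must already be imported/registered (see image_demo.py).
from mmseg.apis import init_segmentor, inference_segmentor, show_result_pyplot
from mmseg.core.evaluation import get_palette

config = "configs/ade20k/mask2former_beit_adapter_large_896_80k_ade20k_ss.py"
checkpoint = "released/mask2former_beit_adapter_large_896_80k_ade20k.pth.tar"
img = "data/ade/ADEChallengeData2016/images/validation/ADE_val_00000591.jpg"

model = init_segmentor(config, checkpoint, device="cuda:0")
result = inference_segmentor(model, img)  # list with one HxW label map
show_result_pyplot(model, img, result, palette=get_palette("ade20k"))
```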

Video Demo

To run inference on a single video:

CUDA_VISIBLE_DEVICES=0 python video_demo.py demo.mp4 \
  configs/ade20k/mask2former_beit_adapter_large_896_80k_ade20k_ss.py  \
  released/mask2former_beit_adapter_large_896_80k_ade20k.pth.tar  \
  --output-file results.mp4  \
  --palette ade20k 
