Hiring research interns for neural architecture search projects: houwen.peng@microsoft.com

# Rethinking and Improving Relative Position Encoding for Vision Transformer

[Paper]

Image RPE (iRPE for short) is a family of relative position encoding methods dedicated to 2D images. It models directional relative distance as well as the interactions between queries and relative position embeddings in the self-attention mechanism. The proposed iRPE methods are simple and lightweight, and can be easily plugged into transformer blocks. Experiments demonstrate that, with the proposed encoding methods alone, DeiT and DETR obtain up to 1.5% (top-1 Acc) and 1.3% (mAP) stable improvements over their original versions on ImageNet and COCO respectively, without tuning any extra hyperparameters such as learning rate and weight decay. Our ablation and analysis also yield interesting findings, some of which run counter to previous understanding.

*Figure: iRPE overview.*

We provide the implementation of image RPE (iRPE) for image classification and object detection.
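The core idea is compact enough to sketch in code. The snippet below is a minimal, illustrative sketch rather than the repository's implementation: in the contextual mode applied to keys, a query-dependent bias q_i · r_bucket(i,j) is added to the attention logits, where a learnable table holds one embedding per relative-position bucket and is shared across heads. The class name `ContextualRPEAttention` and the `bucket_ids` argument are assumptions for illustration.

```python
# A minimal sketch (not the repository's implementation) of contextual
# shared-head RPE on keys: bias_ij = q_i . r_{bucket(i, j)}.
import torch
import torch.nn as nn


class ContextualRPEAttention(nn.Module):
    def __init__(self, dim, num_heads, num_buckets):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Shared-head table: one embedding per bucket, shared across heads.
        self.rpe_table = nn.Parameter(torch.zeros(num_buckets, self.head_dim))

    def forward(self, x, bucket_ids):
        # x: (B, N, dim); bucket_ids: (N, N) integer bucket index per (i, j)
        B, N, _ = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each (B, H, N, hd)

        attn = (q @ k.transpose(-2, -1)) * self.scale  # content term

        # Contextual RPE on keys: look up one embedding per (i, j) pair,
        # then take its dot product with the query to get the bias.
        r = self.rpe_table[bucket_ids]                 # (N, N, hd)
        bias = torch.einsum('bhid,ijd->bhij', q * self.scale, r)
        attn = (attn + bias).softmax(dim=-1)

        out = (attn @ v).transpose(1, 2).reshape(B, N, -1)
        return self.proj(out)
```

Analogous bias terms can also be attached to queries and values, which is what the RPE-Q/RPE-K/RPE-V columns in the classification table below refer to.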

## How to equip iRPE?

The details are shown in the Tutorial.

## Image Classification

[Code]

We equip DeiT models with contextual product shared-head RPE with 50 buckets, and report their accuracy on the ImageNet-1K validation set.

Resolution: 224 x 224

| Model | RPE-Q | RPE-K | RPE-V | #Params (M) | MACs (M) | Top-1 Acc. (%) | Top-5 Acc. (%) | Link | Log |
|-------|:-----:|:-----:|:-----:|------------:|---------:|---------------:|---------------:|------|-----|
| tiny  |       | ✔     |       | 5.76  | 1284  | 73.7 | 92.0 | link | log, detail |
| small |       | ✔     |       | 22.09 | 4659  | 80.9 | 95.4 | link | log, detail |
| small | ✔     | ✔     |       | 22.13 | 4706  | 81.0 | 95.5 | link | log, detail |
| small | ✔     | ✔     | ✔     | 22.17 | 4885  | 81.2 | 95.5 | link | log, detail |
| base  |       | ✔     |       | 86.61 | 17684 | 82.3 | 95.9 | link | log, detail |
| base  | ✔     | ✔     | ✔     | 86.68 | 18137 | 82.8 | 96.1 | link | log, detail |
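The buckets come from the paper's piecewise relative-position indexing. The sketch below reflects our reading of the paper rather than the released code, and the α, β, γ defaults are illustrative: close distances keep their exact index, farther ones are compressed logarithmically, and the product method pairs the x and y indices into a single 2D bucket. Under this reading, β = 3 gives the 7 × 7 grid reported in the detection table below and β = 4 gives 9 × 9.

```python
# A sketch of the piecewise bucketing and the "product" method, as
# described in the paper; parameter values here are illustrative.
import math


def piecewise_index(d, alpha=1, beta=4, gamma=8):
    """Map a 1-D relative distance d to a bucket index g(d).

    Distances with |d| <= alpha keep their exact index; farther
    distances are compressed logarithmically and clipped at +/- beta.
    """
    if abs(d) <= alpha:
        return d
    sign = 1 if d > 0 else -1
    idx = alpha + math.log(abs(d) / alpha) / math.log(gamma / alpha) * (beta - alpha)
    return sign * min(beta, round(idx))


def product_bucket(dx, dy, beta=4):
    """Product method: pair the x and y indices into one 2-D bucket,
    so horizontal and vertical offsets are modeled jointly."""
    gx = piecewise_index(dx, beta=beta) + beta   # shift into [0, 2*beta]
    gy = piecewise_index(dy, beta=beta) + beta
    return gx * (2 * beta + 1) + gy              # (2*beta + 1)**2 buckets
```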

## Object Detection

[Code]

We equip DETR models with contextual product shared-head RPE, and report their mAP on the MS COCO validation set.

- Absolute position encoding: sinusoidal
- Relative position encoding: iRPE (contextual product shared-head RPE)

| enc_rpe2d | Backbone | #Buckets | Epochs | AP | AP_50 | AP_75 | AP_S | AP_M | AP_L | Link | Log |
|-----------|----------|----------|-------:|------:|------:|------:|------:|------:|------:|------|-----|
| rpe-1.9-product-ctx-1-k | ResNet-50 | 7 × 7 | 150 | 0.409 | 0.614 | 0.429 | 0.195 | 0.443 | 0.605 | link | log, detail (188 MB) |
| rpe-2.0-product-ctx-1-k | ResNet-50 | 9 × 9 | 150 | 0.410 | 0.615 | 0.434 | 0.192 | 0.445 | 0.608 | link | log, detail (188 MB) |
| rpe-2.0-product-ctx-1-k | ResNet-50 | 9 × 9 | 300 | 0.422 | 0.623 | 0.446 | 0.205 | 0.457 | 0.613 | link | log, detail (375 MB) |

`--enc_rpe2d` is an argument that specifies the attributes of the 2D relative position encoding. [detail]
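For illustration only, the rows above suggest the string decomposes as `rpe-{ratio}-{method}-{mode}-{shared_head}-{where}`; the snippet below decodes it under that assumption. The `RPE2DConfig` type and `parse_enc_rpe2d` function are hypothetical, and the authoritative definition is in the linked detail.

```python
# Hypothetical decoding of the --enc_rpe2d string, inferred from the
# table rows above; see the linked [detail] for the real definition.
from typing import NamedTuple


class RPE2DConfig(NamedTuple):
    ratio: float        # controls the bucket count (1.9 -> 7x7, 2.0 -> 9x9 above)
    method: str         # bucketing method, e.g. "product"
    mode: str           # "ctx" = contextual
    shared_head: bool   # share the bucket table across heads
    where: str          # which projections get RPE: "q", "k", and/or "v"


def parse_enc_rpe2d(arg: str) -> RPE2DConfig:
    prefix, ratio, method, mode, shared, where = arg.split('-')
    assert prefix == 'rpe'
    return RPE2DConfig(float(ratio), method, mode, shared == '1', where)


print(parse_enc_rpe2d('rpe-2.0-product-ctx-1-k'))
# RPE2DConfig(ratio=2.0, method='product', mode='ctx', shared_head=True, where='k')
```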

## Citing iRPE

If this project is helpful to you, please cite it. Thank you! :)

```bibtex
@InProceedings{iRPE,
    title     = {Rethinking and Improving Relative Position Encoding for Vision Transformer},
    author    = {Wu, Kan and Peng, Houwen and Chen, Minghao and Fu, Jianlong and Chao, Hongyang},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {10033-10041}
}
```