
Commit

update README.md
ttccxx committed Jul 7, 2023
2 parents 42fd902 + f38376c commit dc397d4
Showing 43 changed files with 18,155 additions and 1 deletion.
8 changes: 8 additions & 0 deletions .gitignore
@@ -0,0 +1,8 @@
exp/
**/__pycache__/
*.pth
batchscript-*
phoenix-slurm-*
.ipynb_checkpoints/
.idea/
.vscode/
400 changes: 400 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

125 changes: 124 additions & 1 deletion README.md
@@ -1 +1,124 @@
# Siamese-Image-Modeling

By [Chenxin Tao](https://scholar.google.com/citations?user=sXHFIBkAAAAJ&hl=zh-CN),
[Xizhou Zhu](https://scholar.google.com/citations?user=02RXI00AAAAJ),
[Weijie Su](https://www.weijiesu.com/),
[Gao Huang](http://www.gaohuang.net/),
[Bin Li](http://staff.ustc.edu.cn/~binli/),
[Jie Zhou](https://scholar.google.com/citations?user=6a79aPwAAAAJ&hl=en),
[Yu Qiao](https://scholar.google.com.hk/citations?user=gFtI-8QAAAAJ&hl=en),
[Xiaogang Wang](http://www.ee.cuhk.edu.hk/~xgwang/),
[Jifeng Dai](https://jifengdai.org/)

This is the official implementation of the CVPR 2023 paper [Siamese Image Modeling for Self-Supervised Vision Representation Learning](https://arxiv.org/pdf/2206.01204.pdf).

![SiameseIM-overview](./figs/overview.png)

## 🏠 Introduction

SiameseIM is a new form of self-supervised learning that can learn both semantic alignment and spatial sensitivity with a single dense loss. We highlight the following key observations:

- Compared with MIM methods, SiameseIM shows that reconstructing another view helps to obtain good semantic alignment.

- Compared with ID methods, SiameseIM shows that dense supervision can be applied by strictly matching the dense correspondence between the two views through their relative positions (see the sketch after this list).

- SiameseIM is able to surpass both MIM and ID methods over a wide range of tasks. SiameseIM obtains more improvements in few-shot, long-tail and robustness-concerned scenarios.
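
To make the second point concrete: because both views are crops of the same image with known geometry, every patch center of one view can be mapped into the other view's coordinate frame. Below is a minimal sketch of this mapping, assuming axis-aligned crop boxes and no flipping; the function name, box format, and grid size are illustrative assumptions, not the actual code in this repository.

```python
import torch

def dense_correspondence(box_a, box_b, grid=14):
    """Map patch-center coordinates of view B into view A's frame.

    box_a, box_b: (x0, y0, w, h) crop boxes of the two views in the
    original image's coordinate system. Illustrative sketch only; the
    actual SiameseIM implementation may differ.
    """
    xa, ya, wa, ha = box_a
    xb, yb, wb, hb = box_b
    # Patch-center coordinates of view B, normalized to [0, 1].
    ticks = (torch.arange(grid) + 0.5) / grid
    ys, xs = torch.meshgrid(ticks, ticks, indexing="ij")
    # Lift them into original-image coordinates ...
    abs_x = xb + xs * wb
    abs_y = yb + ys * hb
    # ... then express them relative to view A's crop box.
    rel_x = (abs_x - xa) / wa
    rel_y = (abs_y - ya) / ha
    return torch.stack([rel_x, rel_y], dim=-1)  # (grid, grid, 2)
```

Patches whose mapped coordinates fall outside [0, 1] have no counterpart in the other view; the in-range ones provide the dense targets over which the single dense loss is computed.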


![SiameseIM-comparison](./figs/comparison.png)


## 📈 Main Results

<table border="1" width="100%">
<tr align="center">
<th></th>
<th colspan="3">ImageNet</th>
<th colspan="2">COCO</th>
<th>ADE20k</th>
<th colspan="4">LVIS</th>
<th colspan="4">Robustness</th>
</tr>
<tr align="center">
<td></td><td>FT</td><td>LIN</td><td>1% FT</td><td>AP box</td><td>AP mask</td><td>mIoU</td><td>AP box</td><td>AP box rare</td><td>AP mask</td><td>AP mask rare</td><td>IN-A top-1</td><td>IN-R top-1</td><td>IN-Sketch top-1</td><td>IN-C 1-mCE</td>
</tr>
<tr align="center">
<td>MoCo-v3 (ID method)</td><td>83.0</td><td>76.7</td><td>63.4</td><td>47.9</td><td>42.7</td><td>47.3</td><td>37.3</td><td>25.5</td><td>35.3</td><td>25.8</td><td>32.4</td><td>49.8</td><td>35.9</td><td>55.4</td>
</tr>
<tr align="center">
<td>MAE (MIM method)</td><td>83.6</td><td>68.0</td><td>51.1</td><td>51.6</td><td>45.9</td><td>48.1</td><td>40.1</td><td>29.3</td><td>38.1</td><td>29.1</td><td>35.9</td><td>48.3</td><td>34.5</td><td>48.3</td>
</tr>
<tr align="center">
<td><b>SiameseIM</b></td><td><b>84.1</b></td><td><b>78.0</b></td><td><b>65.1</b></td><td><b>52.1</b></td><td><b>46.2</b></td><td><b>51.1</b></td><td><b>40.5</b></td><td><b>30.9</b></td><td><b>38.1</b></td><td><b>30.1</b></td><td><b>43.8</b></td><td><b>52.5</b></td><td><b>38.3</b></td><td><b>57.1</b></td>
</tr>
<tr align="center">
<td>Improve w.r.t. MoCo-v3</td><td>+1.1</td><td>+1.3</td><td>+1.7</td><td>+4.2</td><td>+3.5</td><td>+3.8</td><td>+3.2</td><td>+5.4</td><td>+2.8</td><td>+4.3</td><td>+11.4</td><td>+2.7</td><td>+2.4</td><td>+1.7</td>
</tr>
<tr align="center">
<td>Improve w.r.t. MAE</td><td>+0.5</td><td>+10.0</td><td>+14.0</td><td>+0.5</td><td>+0.3</td><td>+3.0</td><td>+0.4</td><td>+1.6</td><td>+0.0</td><td>+1.0</td><td>+7.9</td><td>+4.2</td><td>+3.8</td><td>+8.8</td>
</tr>
</table>


Note (FT: fine-tuning; LIN: linear probing; 1% FT: fine-tuning with 1% of ImageNet labels; for IN-C, 1-mCE is reported, so higher is better):

(1) Compared with MoCo-v3, SiameseIM significantly improves dense prediction tasks (COCO detection, ADE20k segmentation, LVIS detection);

(2) Compared with MAE, SiameseIM significantly improves long-tail, few-shot, and robustness tasks (ImageNet linear evaluation & few-shot classification, ADE20k segmentation, LVIS detection);

(3) Notably, ADE20k segmentation and LVIS detection both contain long-tail classes, which places high demands on semantic alignment; as dense prediction tasks, they also demand good spatial sensitivity. This is why SiameseIM surpasses both MoCo-v3 and MAE by a large margin on these tasks.


## 🛠️ Usage
### Preparation

See [prepare.md](docs/prepare.md)

### Model Checkpoint

See [checkpoints.md](docs/checkpoints.md)

### Pretrain

See [pretrain.md](docs/pretrain.md)

### Finetune

See [finetune.md](docs/finetune.md)

### Linear Evaluation

See [linear_eval.md](docs/linear_eval.md)

### Few-shot Evaluation

See [few_shot.md](docs/few_shot.md)

### COCO & LVIS Detection

We use ViTDet for the COCO and LVIS detection tasks; please refer to [detectron2](https://github.com/facebookresearch/detectron2/tree/main/projects/ViTDet).

### ADE20k Segmentation

Following MAE, we use UPerNet for the ADE20k segmentation task; please refer to [mmsegmentation](https://github.com/open-mmlab/mmsegmentation/tree/main/configs/mae).

### Robustness Evaluation

We evaluate the ImageNet-finetuned model on the [ImageNet-A](https://github.com/hendrycks/natural-adv-examples), [ImageNet-R](https://github.com/hendrycks/imagenet-r), [ImageNet-Sketch](https://github.com/HaohanWang/ImageNet-Sketch), and [ImageNet-C](https://github.com/hendrycks/robustness) datasets.


## 📃 License

This project is released under the [CC-BY-NC 4.0 license](./LICENSE).

## 🖊️ Citing SiameseIM
If you find SiameseIM useful in your research, please consider citing:
```bibtex
@inproceedings{tao2023siamese,
title={Siamese image modeling for self-supervised vision representation learning},
author={Tao, Chenxin and Zhu, Xizhou and Su, Weijie and Huang, Gao and Li, Bin and Zhou, Jie and Qiao, Yu and Wang, Xiaogang and Dai, Jifeng},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={2132--2141},
year={2023}
}
```
26 changes: 26 additions & 0 deletions configs/few-shot/dist_fewshot_sim_base.sh
@@ -0,0 +1,26 @@
set -x

IP=${1}
RANK=${2}
NNODES=${3}
CKPT_PATH=${4}
DATA_PATH=${5}
PORT=${PORT:-28500}
PY_ARGS=${PY_ARGS:-""}
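
# Usage (illustrative paths; replace with your own):
#   bash configs/few-shot/dist_fewshot_sim_base.sh <MASTER_IP> <NODE_RANK> <NNODES> \
#     /path/to/pretrain_checkpoint.pth /path/to/dataset_root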

BASENAME=$(basename ${CKPT_PATH})
EXP_NAME=$(basename $(dirname ${CKPT_PATH}))
DIR=./exp/fewshot/${EXP_NAME}

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${NNODES} --node_rank=${RANK} --master_addr=${IP} --master_port=${PORT} \
main_logistic.py \
--subset-path imagenet_subset1/1percent.txt \
--root-path ${DATA_PATH} \
--image-folder imagenet_full_size/061417/ \
--device cuda:0 \
--pretrained ${CKPT_PATH} \
--fname 'fewshot_1percent.pth' \
--model-name 'vit_base_patch16' \
--penalty l2 \
--lambd 0.1 \
--preload
34 changes: 34 additions & 0 deletions configs/few-shot/slurm_fewshot_sim_base.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
set -x

GPUS=${1}
GPUS_PER_NODE=${2}
QUOTATYPE=${3}
PARTITION=${4}
CKPT_PATH=${5}
DATA_PATH=${6}
CPUS_PER_TASK=${CPUS_PER_TASK:-12}
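
# Usage (illustrative; values are cluster-specific):
#   bash configs/few-shot/slurm_fewshot_sim_base.sh <TOTAL_GPUS> <GPUS_PER_NODE> \
#     <QUOTATYPE> <PARTITION> /path/to/pretrain_checkpoint.pth /path/to/dataset_root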

BASENAME=$(basename ${CKPT_PATH})
EXP_NAME=$(basename $(dirname ${CKPT_PATH}))
DIR=./exp/fewshot/${EXP_NAME}
JOB_NAME=fewshot-${EXP_NAME}

srun --partition=${PARTITION} \
--mpi=pmi2 \
--quotatype=${QUOTATYPE} \
--job-name=${JOB_NAME} \
-n$GPUS \
--gres=gpu:${GPUS_PER_NODE} \
--ntasks-per-node=${GPUS_PER_NODE} \
--cpus-per-task=$CPUS_PER_TASK \
--kill-on-bad-exit=1 \
python -W ignore -u main_logistic.py \
--subset-path imagenet_subset1/1percent.txt \
--root-path ${DATA_PATH} \
--image-folder imagenet_full_size/061417/ \
--device cuda:0 \
--pretrained ${CKPT_PATH} \
--fname 'fewshot_1percent.pth' \
--model-name 'vit_base_patch16' \
--penalty l2 \
--lambd 0.1
31 changes: 31 additions & 0 deletions configs/finetune/dist_finetune_sim_base.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
set -x

IP=${1}
RANK=${2}
NNODES=${3}
CKPT_PATH=${4}
DATA_PATH=${5}
PORT=${PORT:-28500}
PY_ARGS=${PY_ARGS:-""}

TOTAL_BATCH_SIZE=1024
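# Per-GPU batch size: total batch size / (NNODES * 8 GPUs per node).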
let BATCH_SIZE=${TOTAL_BATCH_SIZE}/${NNODES}/8

BASENAME=$(basename ${CKPT_PATH})
EXP_NAME=$(basename $(dirname ${CKPT_PATH}))
DIR=./exp/finetune/${EXP_NAME}

mkdir -p ${DIR}

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${NNODES} --node_rank=${RANK} --master_addr=${IP} --master_port=${PORT} \
main_finetune.py \
--output_dir ${DIR} \
--log_dir ${DIR} \
--batch_size ${BATCH_SIZE} \
--model vit_base_patch16 \
--finetune ${CKPT_PATH} \
--epochs 100 \
--blr 2.5e-4 --layer_decay 0.65 \
--weight_decay 0.05 --drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
--dist_eval --data_path ${DATA_PATH} \
${PY_ARGS} 2>&1 | tee -a ${DIR}/stdout.txt
33 changes: 33 additions & 0 deletions configs/finetune/dist_finetune_sim_base_eval.sh
@@ -0,0 +1,33 @@
set -x

IP=${1}
RANK=${2}
NNODES=${3}
CKPT_PATH=${4}
DATA_PATH=${5}
PORT=${PORT:-28500}
PY_ARGS=${PY_ARGS:-""}

TOTAL_BATCH_SIZE=1024
let BATCH_SIZE=${TOTAL_BATCH_SIZE}/${NNODES}/8

BASENAME=$(basename ${CKPT_PATH})
EXP_NAME=$(basename $(dirname ${CKPT_PATH}))
DIR=./exp/finetune/${EXP_NAME}

mkdir -p ${DIR}
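
# Evaluation-only run: --resume loads the finetuned checkpoint and --eval skips training.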

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${NNODES} --node_rank=${RANK} --master_addr=${IP} --master_port=${PORT} \
main_finetune.py \
--output_dir ${DIR} \
--log_dir ${DIR} \
--batch_size ${BATCH_SIZE} \
--model vit_base_patch16 \
--resume ${CKPT_PATH} \
--epochs 100 \
--blr 2.5e-4 --layer_decay 0.65 \
--weight_decay 0.05 --drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
--dist_eval --data_path ${DATA_PATH} \
--eval \
--use_tcs_dataset \
${PY_ARGS} 2>&1 | tee -a ${DIR}/stdout.txt
45 changes: 45 additions & 0 deletions configs/finetune/slurm_finetune_sim_base.sh
@@ -0,0 +1,45 @@
set -x

GPUS=${1}
GPUS_PER_NODE=${2}
QUOTATYPE=${3}
PARTITION=${4}
CPUS_PER_TASK=${CPUS_PER_TASK:-12}
CKPT_PATH=${5}
DATA_PATH=${6}
SRUN_ARGS=${SRUN_ARGS:-""}
PY_ARGS=${PY_ARGS:-""}


TOTAL_BATCH_SIZE=1024
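# Per-GPU batch size: total batch size divided by the total number of GPUs.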
let BATCH_SIZE=${TOTAL_BATCH_SIZE}/${GPUS}

BASENAME=$(basename ${CKPT_PATH})
EXP_NAME=$(basename $(dirname ${CKPT_PATH}))
DIR=./exp/finetune/${EXP_NAME}
JOB_NAME=ft-${EXP_NAME}

mkdir -p ${DIR}

srun --partition=${PARTITION} \
--mpi=pmi2 \
--quotatype=${QUOTATYPE} \
--job-name=${JOB_NAME} \
-n$GPUS \
--gres=gpu:${GPUS_PER_NODE} \
--ntasks-per-node=${GPUS_PER_NODE} \
--cpus-per-task=$CPUS_PER_TASK \
--kill-on-bad-exit=1 \
${SRUN_ARGS} \
python -u main_finetune.py \
--output_dir ${DIR} \
--log_dir ${DIR} \
--batch_size ${BATCH_SIZE} \
--model vit_base_patch16 \
--finetune ${CKPT_PATH} \
--epochs 100 \
--blr 2.5e-4 --layer_decay 0.65 \
--weight_decay 0.05 --drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
--dist_eval --data_path ${DATA_PATH} \
--use_tcs_dataset \
${PY_ARGS} 2>&1 | tee -a ${DIR}/stdout.txt
46 changes: 46 additions & 0 deletions configs/finetune/slurm_finetune_sim_base_eval.sh
@@ -0,0 +1,46 @@
set -x

GPUS=${1}
GPUS_PER_NODE=${2}
QUOTATYPE=${3}
PARTITION=${4}
CPUS_PER_TASK=${CPUS_PER_TASK:-12}
CKPT_PATH=${5}
DATA_PATH=${6}
SRUN_ARGS=${SRUN_ARGS:-""}
PY_ARGS=${PY_ARGS:-""}


TOTAL_BATCH_SIZE=1024
let BATCH_SIZE=${TOTAL_BATCH_SIZE}/${GPUS}

BASENAME=$(basename ${CKPT_PATH})
EXP_NAME=$(basename $(dirname ${CKPT_PATH}))
DIR=./exp/finetune/${EXP_NAME}
JOB_NAME=ft-${EXP_NAME}

mkdir -p ${DIR}
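
# Evaluation-only run: --resume loads the finetuned checkpoint and --eval skips training.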

srun --partition=${PARTITION} \
--mpi=pmi2 \
--quotatype=${QUOTATYPE} \
--job-name=${JOB_NAME} \
-n$GPUS \
--gres=gpu:${GPUS_PER_NODE} \
--ntasks-per-node=${GPUS_PER_NODE} \
--cpus-per-task=$CPUS_PER_TASK \
--kill-on-bad-exit=1 \
${SRUN_ARGS} \
python -u main_finetune.py \
--output_dir ${DIR} \
--log_dir ${DIR} \
--batch_size ${BATCH_SIZE} \
--model vit_base_patch16 \
--resume ${CKPT_PATH} \
--epochs 100 \
--blr 2.5e-4 --layer_decay 0.65 \
--weight_decay 0.05 --drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
--dist_eval --data_path ${DATA_PATH} \
--eval \
--use_tcs_dataset \
${PY_ARGS} 2>&1 | tee -a ${DIR}/stdout.txt
34 changes: 34 additions & 0 deletions configs/linprobe/dist_linprobe_sim_base.sh
@@ -0,0 +1,34 @@
set -x

IP=${1}
RANK=${2}
NNODES=${3}
CKPT_PATH=${4}
DATA_PATH=${5}
PORT=${PORT:-28500}
PY_ARGS=${PY_ARGS:-""}

TOTAL_BATCH_SIZE=16384
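# Linear probing uses a large effective batch size (16384), split over NNODES * 8 GPUs.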
let BATCH_SIZE=${TOTAL_BATCH_SIZE}/${NNODES}/8

BASENAME=$(basename ${CKPT_PATH})
EXP_NAME=$(basename $(dirname ${CKPT_PATH}))
DIR=./exp/linear/${EXP_NAME}

mkdir -p ${DIR}

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${NNODES} --node_rank=${RANK} --master_addr=${IP} --master_port=${PORT} \
main_linprobe.py \
--batch_size ${BATCH_SIZE} \
--model vit_base_patch16 \
--finetune ${CKPT_PATH} \
--epochs 90 \
--blr 0.1 \
--weight_decay 0.0 \
--dist_eval \
--output_dir ${DIR} \
--log_dir ${DIR} \
--global_pool \
--data_path ${DATA_PATH} \
--use_tcs_dataset \
${PY_ARGS} 2>&1 | tee -a ${DIR}/stdout.txt