This repository contains the code for the CVPR 2021 paper "Delving Deep into Many-to-many Attention for Few-shot Video Object Segmentation".
The architecture of our Domain Agent Network:
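The core operation behind the network is many-to-many attention: every query-frame location attends to every support-frame location, rather than matching frames one-to-one. A toy, stdlib-only sketch of that idea (the feature vectors and helper names are made up for illustration; this is not the paper's implementation):

```python
import math

def softmax(xs):
    # numerically stable softmax over a plain list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def many_to_many_attention(query_feats, support_feats, support_vals):
    """Each query location attends to ALL support locations (pooled across
    every support frame), so the attention map is many-to-many."""
    out = []
    for q in query_feats:
        # dot-product similarity of this query location to every support location
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in support_feats]
        weights = softmax(scores)
        # weighted sum of the values carried by the support locations
        out.append(sum(w * v for w, v in zip(weights, support_vals)))
    return out

# toy example: 2 query locations, 3 support locations
queries = [[1.0, 0.0], [0.0, 1.0]]
supports = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [0.0, 1.0, 0.5]  # e.g. a foreground score at each support location
print(many_to_many_attention(queries, supports, values))
```

Each output entry is a support-weighted aggregate for one query location; in the real network the same idea runs over dense feature maps instead of toy vectors.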
## Environment

```shell
conda create -n FSVOS python=3.6
conda activate FSVOS
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch
conda install opencv cython
pip install easydict imgaug
```
## Data preparation

- Download the 2019 version of the Youtube-VIS dataset.
- Put the dataset in the `./data` folder.
```
data
└─ Youtube-VOS
    └─ train
        ├─ Annotations
        ├─ JPEGImages
        └─ train.json
```
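Before training, it can help to sanity-check that the dataset landed in the expected layout. A small stdlib sketch (the paths mirror the tree above; the function name is ours, not part of the repository):

```python
from pathlib import Path

def check_dataset_layout(root="data"):
    """Return a list of missing paths; an empty list means the
    Youtube-VOS training data sits where the loaders expect it."""
    train = Path(root) / "Youtube-VOS" / "train"
    required = [
        train / "Annotations",
        train / "JPEGImages",
        train / "train.json",
    ]
    return [str(p) for p in required if not p.exists()]
```

Run `check_dataset_layout()` from the repository root; a non-empty result lists exactly what is missing.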
- Install cocoapi for Youtube-VIS.
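The Youtube-VIS annotations use a video extension of the COCO format. One common route is the `youtubevos/cocoapi` fork (an assumption about which cocoapi is meant here; it provides a `pycocotools.ytvos` API for `train.json`):

```shell
git clone https://github.com/youtubevos/cocoapi.git
cd cocoapi/PythonAPI
python setup.py install
```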
- Download the ImageNet pretrained backbone and put it into the `pretrain_model` folder.

```
pretrain_model
└─ resnet50_v2.pth
```
- Update the `root_path` in `config/DAN_config.py`.
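For reference, a hypothetical sketch of how such a root path is typically consumed by the data loaders (the joins below are an assumption for illustration, not the repository's actual config code):

```python
from pathlib import Path

# hypothetical mirror of the root_path setting in config/DAN_config.py:
# point it at the directory that contains the data/ folder from above
root_path = "/home/user/FSVOS"

def dataset_paths(root):
    """Derive the dataset locations a training script would read from."""
    data = Path(root) / "data" / "Youtube-VOS" / "train"
    return {
        "annotations": data / "Annotations",
        "images": data / "JPEGImages",
        "ann_file": data / "train.json",
    }
```

If these derived paths do not exist on disk, `root_path` is the first thing to re-check.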
## Training

```shell
python train_DAN.py --group 1 --batch_size 4
```
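In few-shot segmentation, a `--group` flag usually selects one cross-validation fold of held-out categories. Assuming the common 4-fold split over the 40 Youtube-VIS categories (an assumption about this repository, sketched only to illustrate what the flag controls):

```python
def fold_categories(group, num_categories=40, num_folds=4):
    """Hypothetical 4-fold split: fold `group` holds the unseen test
    categories; the remaining categories are used for training."""
    assert 1 <= group <= num_folds
    per_fold = num_categories // num_folds
    test_ids = list(range((group - 1) * per_fold + 1, group * per_fold + 1))
    train_ids = [c for c in range(1, num_categories + 1) if c not in test_ids]
    return train_ids, test_ids

train_ids, test_ids = fold_categories(group=1)
print(test_ids)  # with --group 1, categories 1..10 are held out
```

Under this scheme, training with `--group 1` and testing with `--group 1` evaluates on categories the model never saw during training.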
## Testing

You can download our pretrained model to test.

```shell
python test_DAN.py --test_best --group 1
```
## Acknowledgements

Part of the code is based upon:
- PMMs: https://github.com/Yang-Bob/PMMs
- PFENet: https://github.com/Jia-Research-Lab/PFENet
- STM-Training: https://github.com/lyxok1/STM-Training