This is the official repository of our paper Open-Vocabulary Multi-Label Classification via Multi-modal Knowledge Transfer.
pip install -r requirements.txt
- Download the pretrained VLP (ViT-B/16) model from OpenAI CLIP.
- Download the images of the NUS-WIDE dataset from NUS-WIDE.
- Download the other files from here.
The dataset directory should be organized as follows.
NUS-WIDE
├── features
├── Flickr
├── Concepts81.txt
├── Concepts925.txt
├── img_names.pkl
├── label_emb.pt
└── test_img_names.pkl
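Before starting training, it can help to confirm that the dataset directory matches the layout above. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` is hypothetical, and treating `features` and `Flickr` as directories and the remaining entries as files is an assumption inferred from their names.

```python
from pathlib import Path

# Entry names are taken from the directory tree above. Whether each entry is
# a directory or a file is an assumption inferred from its name.
EXPECTED_DIRS = ("features", "Flickr")
EXPECTED_FILES = (
    "Concepts81.txt",
    "Concepts925.txt",
    "img_names.pkl",
    "label_emb.pt",
    "test_img_names.pkl",
)


def missing_entries(root):
    """Return the names of expected entries that are absent under root."""
    root = Path(root)
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing


if __name__ == "__main__":
    import sys

    # Default to ./NUS-WIDE when no path is given on the command line.
    dataset_root = sys.argv[1] if len(sys.argv) > 1 else "NUS-WIDE"
    missing = missing_entries(dataset_root)
    print("missing:", ", ".join(missing) if missing else "nothing")
```

If anything is reported as missing, re-check the download steps before passing the directory as `--data-path`.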
python3 train_nus_first_stage.py \
--data-path path_to_dataset \
--clip-path path_to_clip_model
The checkpoint of the first training stage is here.
python3 -m torch.distributed.launch --nproc_per_node=8 train_nus_second_stage.py \
--data-path path_to_dataset \
--clip-path path_to_clip_model \
--ckpt-path path_to_first_stage_ckpt
The checkpoint of the second training stage is here.
python3 train_nus_second_stage.py --eval \
--data-path path_to_dataset \
--clip-path path_to_clip_model \
--ckpt-path path_to_first_stage_ckpt \
--eval-ckpt path_to_second_stage_ckpt
python3 inference.py \
--data-path path_to_dataset \
--clip-path path_to_clip_model \
--img-ckpt path_to_first_stage_ckpt \
--txt-ckpt path_to_second_stage_ckpt \
--image-path figures/test.jpg
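To run inference on several images, the command above can be assembled programmatically. This is a rough sketch under stated assumptions: `build_inference_cmd` and `run_on_images` are hypothetical helpers that are not part of the repository, and the sketch assumes `inference.py` handles one image per call, as in the command above. Only the flags shown in that command are used.

```python
import subprocess


def build_inference_cmd(data_path, clip_path, img_ckpt, txt_ckpt, image_path):
    """Assemble the argument list for a single inference.py call."""
    return [
        "python3", "inference.py",
        "--data-path", str(data_path),
        "--clip-path", str(clip_path),
        "--img-ckpt", str(img_ckpt),
        "--txt-ckpt", str(txt_ckpt),
        "--image-path", str(image_path),
    ]


def run_on_images(image_paths, dry_run=True, **paths):
    """Build one command per image; execute them unless dry_run is True."""
    commands = [
        build_inference_cmd(image_path=image_path, **paths)
        for image_path in image_paths
    ]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if inference.py fails.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_on_images(["figures/test.jpg"], data_path=..., clip_path=..., img_ckpt=..., txt_ckpt=...)` with the default `dry_run=True` only returns the argument lists, which is a convenient way to inspect the commands before launching them.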
We would like to thank BiAM and timm for the codebase.
MKT is MIT-licensed. The license applies to the pre-trained models as well.
Consider citing MKT in your publications if it helps your research.
@article{he2022open,
title={Open-Vocabulary Multi-Label Classification via Multi-modal Knowledge Transfer},
author={He, Sunan and Guo, Taian and Dai, Tao and Qiao, Ruizhi and Ren, Bo and Xia, Shu-Tao},
journal={arXiv preprint arXiv:2207.01887},
year={2022}
}