
Official implementation of "Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer".


sunanhe/MKT


Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer (AAAI 2023 Oral)

[Figure: overview of the MKT framework]

This is the official repository of our paper Open-Vocabulary Multi-Label Classification via Multi-modal Knowledge Transfer.

Setup

pip install -r requirements.txt

Preparation

  1. Download the pretrained VLP (ViT-B/16) model from OpenAI CLIP.

  2. Download the images of the NUS-WIDE dataset from NUS-WIDE.

  3. Download the annotations, following BiAM, from here.

  4. Download the other files from here.

The dataset directory is organized as follows.

NUS-WIDE
  ├── features
  ├── Flickr
  ├── Concepts81.txt
  ├── Concepts925.txt
  ├── img_names.pkl
  ├── label_emb.pt
  └── test_img_names.pkl
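Before training, it can help to confirm that the dataset root matches the layout above. The following is a minimal sketch using only the Python standard library; the helper name `missing_nus_wide_entries` is hypothetical and not part of the MKT codebase, and it only checks that the listed names exist with the right type (directory or file), not that their contents are correct.

```python
from pathlib import Path

# Entries expected directly under the NUS-WIDE root, taken from the tree above.
# Directories and files are listed separately so each can be checked by type.
EXPECTED_DIRS = ("features", "Flickr")
EXPECTED_FILES = (
    "Concepts81.txt",
    "Concepts925.txt",
    "img_names.pkl",
    "label_emb.pt",
    "test_img_names.pkl",
)


def missing_nus_wide_entries(root):
    """Return expected entries that are absent (or of the wrong type) under root.

    Hypothetical helper for illustration; it checks names and types only.
    """
    root = Path(root)
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing
```

For example, `missing_nus_wide_entries("path_to_dataset")` returns an empty list when every entry in the tree is present, and otherwise lists what still needs to be downloaded or moved.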

Training MKT on NUS-WIDE

python3 train_nus_first_stage.py \
        --data-path path_to_dataset \
        --clip-path path_to_clip_model

The checkpoint of the first training stage is here.

python3 -m torch.distributed.launch --nproc_per_node=8 train_nus_second_stage.py \
        --data-path path_to_dataset \
        --clip-path path_to_clip_model \
        --ckpt-path path_to_first_stage_ckpt

The checkpoint of the second training stage is here.

Testing MKT on NUS-WIDE

python3 train_nus_second_stage.py --eval \
        --data-path path_to_dataset \
        --clip-path path_to_clip_model \
        --ckpt-path path_to_first_stage_ckpt \
        --eval-ckpt path_to_second_stage_ckpt
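Because each stage consumes the checkpoint produced by the previous one, it can be convenient to script the whole pipeline. The sketch below assembles argument lists for the training and testing commands above using only the flags shown in this README; the function names `mkt_commands` and `run_all` are hypothetical, and it assumes, as the inference command's `--txt-ckpt` suggests, that `--eval-ckpt` expects the second-stage checkpoint. Passing lists to `subprocess.run` avoids shell line-continuation mistakes such as a stray trailing backslash.

```python
import subprocess
import sys


def mkt_commands(data_path, clip_path, first_ckpt, second_ckpt, nproc=8):
    """Return (name, argv) pairs for the NUS-WIDE train/eval commands.

    Hypothetical helper: the flags are copied from the README commands;
    the paths are whatever the caller passes in.
    """
    common = ["--data-path", data_path, "--clip-path", clip_path]
    return [
        # Stage 1: single-process training.
        ("first_stage", ["python3", "train_nus_first_stage.py", *common]),
        # Stage 2: distributed training initialized from the stage-1 checkpoint.
        ("second_stage", [
            "python3", "-m", "torch.distributed.launch",
            f"--nproc_per_node={nproc}", "train_nus_second_stage.py",
            *common, "--ckpt-path", first_ckpt,
        ]),
        # Evaluation: takes the stage-1 checkpoint plus the checkpoint to evaluate.
        ("eval", [
            "python3", "train_nus_second_stage.py", "--eval",
            *common, "--ckpt-path", first_ckpt, "--eval-ckpt", second_ckpt,
        ]),
    ]


def run_all(commands):
    """Run each command in order, stopping at the first non-zero exit code."""
    for name, argv in commands:
        print(f"[{name}]", " ".join(argv), file=sys.stderr)
        subprocess.run(argv, check=True)
```

A run might look like `run_all(mkt_commands("path_to_dataset", "path_to_clip_model", "path_to_first_stage_ckpt", "path_to_second_stage_ckpt"))`. Where each stage writes its checkpoint is decided by the training scripts, so the checkpoint arguments must point to wherever those scripts actually save them.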

Inference on a Single Image

python3 inference.py \
        --data-path path_to_dataset \
        --clip-path path_to_clip_model \
        --img-ckpt path_to_first_stage_ckpt \
        --txt-ckpt path_to_second_stage_ckpt \
        --image-path figures/test.jpg
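In open-vocabulary multi-label classification, candidate labels are commonly scored by comparing an image embedding against text embeddings of the label names, so a new label can be scored simply by embedding its name. The sketch below illustrates only that generic CLIP-style scoring step with plain Python lists: cosine similarity followed by top-k selection. The function names, the toy 3-dimensional embeddings, and the value of k are hypothetical, and this does not reproduce how `inference.py` computes its scores.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_labels(image_emb, label_embs, k=3):
    """Score every label against the image and return the k best (label, score) pairs.

    label_embs maps a label name to its text embedding; any label that has an
    embedding can be scored, which is what keeps the vocabulary open.
    """
    scores = {label: cosine(image_emb, emb) for label, emb in label_embs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy example with made-up 3-d embeddings (CLIP ViT-B/16 embeddings are 512-d).
labels = {
    "sky": [0.9, 0.1, 0.0],
    "water": [0.7, 0.6, 0.1],
    "dog": [0.0, 0.2, 0.9],
}
print(rank_labels([1.0, 0.3, 0.0], labels, k=2))  # sky ranks above water
```

Returning the top k labels, rather than only the single best one, is what makes the output multi-label; a score threshold is a common alternative when the number of labels per image varies.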

Acknowledgement

We would like to thank BiAM and timm for their codebases.

License

MKT is MIT-licensed. The license applies to the pre-trained models as well.

Citation

Please consider citing MKT in your publications if it helps your research.

@article{he2022open,
  title={Open-Vocabulary Multi-Label Classification via Multi-modal Knowledge Transfer},
  author={He, Sunan and Guo, Taian and Dai, Tao and Qiao, Ruizhi and Ren, Bo and Xia, Shu-Tao},
  journal={arXiv preprint arXiv:2207.01887},
  year={2022}
}
