VadCLIP

This is the official Pytorch implementation of our paper: "VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection" in AAAI 2024.

Peng Wu, Xuerong Zhou, Guansong Pang, Lingru Zhou, Qingsen Yan, Peng Wang, Yanning Zhang

Highlight

We present a novel diagram, i.e., VadCLIP, which involves dual branch to detect video anomaly in visual classification and language-visual alignment manners, respectively. With the benefit of dual branch, VadCLIP achieves both coarse-grained and fine-grained WSVAD. To our knowledge, VadCLIP is the first work to efficiently transfer pre-trained language-visual knowledge to WSVAD.
We propose three non-vital components to address new challenges led by the new diagram. LGT-Adapter is used to capture temporal dependencies from different perspectives; Two prompt mechanisms are devised to effectively adapt the frozen pre-trained model to WSVAD task; MIL-Align realizes the optimization of alignment paradigm under weak supervision, so as to preserve the pre-trained knowledge as much as possible.
We show that strength and effectiveness of VadCLIP on two large-scale popular benchmarks, and VadCLIP achieves state-of-the-art performance, e.g., it gets unprecedented results of 84.51% AP and 88.02% on XD-Violence and UCF-Crime respectively, surpassing current classification based methods by a large margin.

Training

Setup

We extract CLIP features for UCF-Crime and XD-Violence datasets, and release these features and pretrained models as follows:

Benchmark	CLIP[Baidu]	CLIP	Model[Baidu]	Model
UCF-Crime	Code: i5s7	OneDrive	Code: kq5u	OneDrive
XD-Violence	Code: 3ebx	OneDrive	Code: apw6	OneDrive

The following files need to be adapted in order to run the code on your own machine:

Change the file paths to the download datasets above in list/xd_CLIP_rgb.csv and list/xd_CLIP_rgbtest.csv.
Feel free to change the hyperparameters in xd_option.py

Train and Test

After the setup, simply run the following command:

Traing and infer for XD-Violence dataset

python xd_train.py
python xd_test.py

Traing and infer for UCF-Crime dataset

python ucf_train.py
python ucf_test.py

References

We referenced the repos below for the code.

XDVioDet
DeepMIL

Citation

If you find this repo useful for your research, please consider citing our paper:

@article{wu2023vadclip,
  title={Vadclip: Adapting vision-language models for weakly supervised video anomaly detection},
  author={Wu, Peng and Zhou, Xuerong and Pang, Guansong and Zhou, Lingru and Yan, Qingsen and Wang, Peng and Zhang, Yanning},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)},
  year={2024}
}

@article{wu2023open,
  title={Open-Vocabulary Video Anomaly Detection},
  author={Wu, Peng and Zhou, Xuerong and Pang, Guansong and Sun, Yujia and Liu, Jing and Wang, Peng and Zhang, Yanning},
  journal={arXiv preprint arXiv:2311.07042},
  year={2023}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

VadCLIP

Highlight

Training

Setup

Train and Test

References

Citation

Files

README.md

Latest commit

History

README.md

File metadata and controls

VadCLIP

Highlight

Training

Setup

Train and Test

References

Citation