Fa-Ting Hong^, Jia-Chang Feng^, Dan Xu, Ying Shan, and Wei-Shi Zheng. ^Equal Contribution
We propose the CrOss-modal cOnsensus NETwork (CO2-Net), which introduces two identical cross-modal consensus modules (CCMs). Each module uses a cross-modal attention mechanism to filter out task-irrelevant information redundancy, drawing on global information from the main modality and cross-modal local information from the auxiliary modality.
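To make the filtering idea above concrete, here is a toy sketch in plain Python. It is not the implementation in this repository: the function name `cross_modal_gate` is hypothetical, the learned projections a real module would apply are omitted, and both modalities are assumed to share the same number of channels. It only illustrates how a global summary of the main modality and the per-snippet features of the auxiliary modality could be combined into a gate that rescales the main features.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def cross_modal_gate(main, aux):
    """Toy cross-modal filtering (illustrative assumption, not the paper's code).

    main, aux: lists of T snippets, each a list of C channel values.
    Returns the main features rescaled by a gate in (0, 1).
    """
    num_snippets, num_channels = len(main), len(main[0])
    # Global information: per-channel temporal average of the main modality.
    global_ctx = [
        sum(main[t][c] for t in range(num_snippets)) / num_snippets
        for c in range(num_channels)
    ]
    # Cross-modal local information: the auxiliary value at each snippet.
    # The gate combines both and is squashed into (0, 1).
    gate = [
        [sigmoid(global_ctx[c] * aux[t][c]) for c in range(num_channels)]
        for t in range(num_snippets)
    ]
    # Filter the main modality channel by channel.
    return [
        [gate[t][c] * main[t][c] for c in range(num_channels)]
        for t in range(num_snippets)
    ]


# "Two identical modules": apply the same operation with the roles swapped.
modality_a = [[1.0, 2.0], [3.0, 4.0]]
modality_b = [[0.5, -0.5], [1.0, 0.0]]
filtered_a = cross_modal_gate(modality_a, modality_b)
filtered_b = cross_modal_gate(modality_b, modality_a)
```

Swapping the arguments mirrors the idea of using two identical modules, one per modality, each treating the other modality as auxiliary.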
- Create the same Anaconda environment that we used:
conda env create -f environment.yaml
- Python 3.6 and PyTorch 1.3.0 are used. Basic requirements are listed in 'requirements.txt'.
pip install -r requirements.txt
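After either installation route, a quick sanity check can confirm that the interpreter matches the stated Python version and that PyTorch is importable. The helper below is a minimal sketch, not part of this package; the name `check_environment` is hypothetical, and it only inspects the environment without installing or changing anything.

```python
import importlib.util
import sys


def check_environment(expected_python=(3, 6), package="torch"):
    """Report the running Python version and whether `package` can be found."""
    major_minor = tuple(sys.version_info[:2])
    return {
        "python": "{}.{}".format(*major_minor),
        "python_matches": major_minor == tuple(expected_python),
        # find_spec returns None when a top-level package is not installed.
        "package_found": importlib.util.find_spec(package) is not None,
    }


report = check_environment()
for key in sorted(report):
    print(key, report[key])
```

If `package_found` is false, the environment was probably not activated or the install step did not complete.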
- Download the pre-trained checkpoints.
- Create the default folder ./ckpt and put the downloaded pre-trained models in ./ckpt.
- Run the test scripts:
python main.py --max-seqlen 500 --lr 0.00005 --k 7 --dataset-name Thumos14reduced --path-dataset path/to/Thumos14 --num-class 20 --use-model CO2 --max-iter 5000 --dataset SampleDataset --weight_decay 0.001 --model-name CO2_3552 --seed 3552 --AWM BWA_fusion_dropout_feat_v2
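Before running the test command above, it may help to confirm that ./ckpt exists and actually contains the downloaded models. The following is a small sketch under assumptions: the helper name `find_checkpoints` is hypothetical, and the file suffixes it looks for are guesses, since the checkpoint file format is not stated here.

```python
from pathlib import Path


def find_checkpoints(ckpt_dir="./ckpt", suffixes=(".pkl", ".pth", ".pt")):
    """Create ckpt_dir if needed and list candidate checkpoint files in it.

    The suffixes are assumptions; adjust them to the downloaded files.
    """
    folder = Path(ckpt_dir)
    folder.mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in folder.iterdir() if p.suffix in suffixes)
```

Calling `find_checkpoints()` from the repository root returns an empty list when the models have not been copied into place yet.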
The features for the Thumos14 and ActivityNet1.2 datasets can be downloaded here. The annotations are included with this package.
- Run the train scripts:
python main.py --max-seqlen 500 --lr 0.00005 --k 7 --dataset-name Thumos14reduced --num-class 20 --use-model CO2 --max-iter 20000 --dataset SampleDataset --weight_decay 0.001 --model-name CO2 --seed 3552 --AWM BWA_fusion_dropout_feat_v2
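The commands above pass a fairly long list of flags. As a reading aid, the sketch below mirrors those flags with `argparse` and parses the training command. It is a hypothetical reconstruction: the types are inferred from the values shown, no defaults are assumed, and the option definitions shipped with this package remain authoritative.

```python
import argparse
import shlex


def build_parser():
    """Hypothetical mirror of the flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Illustrative flag parser")
    parser.add_argument("--max-seqlen", type=int)
    parser.add_argument("--lr", type=float)
    parser.add_argument("--k", type=int)
    parser.add_argument("--dataset-name")
    parser.add_argument("--path-dataset")
    parser.add_argument("--num-class", type=int)
    parser.add_argument("--use-model")
    parser.add_argument("--max-iter", type=int)
    parser.add_argument("--dataset")
    parser.add_argument("--weight_decay", type=float)
    parser.add_argument("--model-name")
    parser.add_argument("--seed", type=int)
    parser.add_argument("--AWM")
    return parser


# Flags copied from the training command, without the "python main.py" prefix.
train_flags = (
    "--max-seqlen 500 --lr 0.00005 --k 7 --dataset-name Thumos14reduced "
    "--num-class 20 --use-model CO2 --max-iter 20000 --dataset SampleDataset "
    "--weight_decay 0.001 --model-name CO2 --seed 3552 "
    "--AWM BWA_fusion_dropout_feat_v2"
)
args = build_parser().parse_args(shlex.split(train_flags))
print(args.model_name, args.max_iter, args.seed)
```

Note that `argparse` converts dashes in flag names to underscores, so `--max-iter` is read back as `args.max_iter`, while `--weight_decay` and `--AWM` keep their spelling.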
@InProceedings{hong2021cross,
author = {Hong, Fa-Ting and Feng, Jia-Chang and Xu, Dan and Shan, Ying and Zheng, Wei-Shi},
title = {Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization},
booktitle = {ACM International Conference on Multimedia (ACM MM)},
year = {2021}
}