Proposal-based Multiple Instance Learning for Weakly-supervised Temporal Action Localization (CVPR 2023)
Huan Ren, Wenfei Yang, Tianzhu Zhang, Yongdong Zhang (USTC)
- Python 3.8
- PyTorch 1.8.0
- CUDA 11.1
Required packages are listed in requirements.txt. You can install them by running:
conda create -n P-MIL python=3.8
conda activate P-MIL
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip3 install -r requirements.txt
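After installation, a quick sanity check can confirm that the environment matches the versions listed above. The following is a minimal sketch and not part of the repository; the helper names `collect_versions` and `report` are hypothetical, and the script only reads version strings that Python and PyTorch expose.

```python
import importlib
import sys

# Versions stated in the requirements list above.
EXPECTED = {"python": "3.8", "torch": "1.8.0", "cuda": "11.1"}


def collect_versions():
    """Return the versions found in this environment; missing items map to None."""
    found = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": None,
        "cuda": None,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return found  # PyTorch is not installed in this environment
    # Drop any local build suffix such as "+cu111".
    found["torch"] = str(torch.__version__).split("+")[0]
    # torch.version.cuda is None for CPU-only builds.
    found["cuda"] = torch.version.cuda
    return found


def report(found, expected=EXPECTED):
    """Build one human-readable line per requirement."""
    lines = []
    for name, want in expected.items():
        have = found.get(name)
        status = "ok" if have == want else "differs"
        lines.append(f"{name}: expected {want}, found {have} ({status})")
    return lines


if __name__ == "__main__":
    print("\n".join(report(collect_versions())))
```

A "differs" line does not necessarily mean the code will fail; it only flags a deviation from the tested configuration.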
- Prepare the THUMOS14 dataset.
  - We recommend using the features and annotations provided by W-TALC or CO2-Net.
  - You can also download them from Google Drive.
- Prepare the proposals generated by a pre-trained S-MIL model.
  - We recommend using the official code of an existing S-MIL method (such as CO2-Net) to generate proposals.
  - Alternatively, you can download the proposals used in our paper from Google Drive.
- Place the features and annotations inside a data/Thumos14reduced/ folder and the proposals inside a proposals folder. Make sure the data structure is as below.
├── data
│   └── Thumos14reduced
│       ├── Thumos14reduced-I3D-JOINTFeatures.npy
│       └── Thumos14reduced-Annotations
│           ├── Ambiguous_test.txt
│           ├── classlist.npy
│           ├── duration.npy
│           ├── extracted_fps.npy
│           ├── labels_all.npy
│           ├── labels.npy
│           ├── original_fps.npy
│           ├── segments.npy
│           ├── subset.npy
│           └── videoname.npy
├── proposals
│   ├── detection_result_base_test.json
│   └── detection_result_base_train.json
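Before training, it may help to verify that every file in the layout above is in place. The sketch below is not part of the repository; `expected_paths` and `missing_paths` are hypothetical helper names, and the file list is copied from the tree above. It assumes the script runs from the repository root.

```python
from pathlib import Path

# File names taken from the directory tree above.
ANNOTATION_FILES = [
    "Ambiguous_test.txt", "classlist.npy", "duration.npy",
    "extracted_fps.npy", "labels_all.npy", "labels.npy",
    "original_fps.npy", "segments.npy", "subset.npy", "videoname.npy",
]
PROPOSAL_FILES = [
    "detection_result_base_test.json",
    "detection_result_base_train.json",
]


def expected_paths(root="."):
    """List every path the layout above requires, relative to `root`."""
    root = Path(root)
    base = root / "data" / "Thumos14reduced"
    paths = [base / "Thumos14reduced-I3D-JOINTFeatures.npy"]
    paths += [base / "Thumos14reduced-Annotations" / name for name in ANNOTATION_FILES]
    paths += [root / "proposals" / name for name in PROPOSAL_FILES]
    return paths


def missing_paths(root="."):
    """Return the required paths that do not exist under `root`."""
    return [path for path in expected_paths(root) if not path.exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected files are present.")
```

The check only tests for existence; it does not validate the contents of the feature, annotation, or proposal files.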
CUDA_VISIBLE_DEVICES=0 python main.py --run_type train
The pre-trained model can be downloaded from Google Drive and should be placed inside a checkpoints folder. To evaluate it, run:
CUDA_VISIBLE_DEVICES=0 python main.py --run_type test --pretrained_ckpt checkpoints/best_model.pkl
The experimental results on THUMOS14 are as below. Note that the performance of the checkpoint we provide differs slightly from the results reported in the original paper.
Method \ mAP@IoU (%) | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | AVG |
---|---|---|---|---|---|---|---|---|
P-MIL | 70.8 | 66.5 | 57.8 | 48.6 | 39.8 | 27.0 | 14.3 | 46.4 |
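
To make the table easier to read: each column reports mAP at a temporal IoU threshold, and the AVG value matches the arithmetic mean of the seven per-threshold values. The sketch below illustrates both ideas; it is not the repository's evaluation code, and `temporal_iou` and `average_map` are hypothetical helper names. The segment endpoints in the usage lines are made-up numbers for illustration only.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments given in seconds."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def average_map(map_by_iou):
    """Arithmetic mean of the per-threshold mAP values."""
    return sum(map_by_iou.values()) / len(map_by_iou)


# Per-threshold mAP (%) copied from the P-MIL row of the table above.
MAP_BY_IOU = {0.1: 70.8, 0.2: 66.5, 0.3: 57.8, 0.4: 48.6,
              0.5: 39.8, 0.6: 27.0, 0.7: 14.3}

if __name__ == "__main__":
    # Illustrative segments: overlap of 2 s over a union of 6 s.
    iou = temporal_iou((2.0, 6.0), (4.0, 8.0))
    print(f"tIoU = {iou:.3f}")  # 0.333: a match at threshold 0.3, not at 0.4
    print(f"AVG = {average_map(MAP_BY_IOU):.1f}")  # 46.4, as in the table
```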
@InProceedings{Ren_2023_CVPR,
author = {Ren, Huan and Yang, Wenfei and Zhang, Tianzhu and Zhang, Yongdong},
title = {Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2023},
pages = {2394-2404}
}
We referenced the repos below for the code.