PyTorch implementation of the paper "Learning actionness from action/background discrimination" for action localization on the CrossTask dataset. Tested with Python 3.8.13, PyTorch 1.11.0, NumPy 1.22.4, and ffmpeg-python 0.2.0.
MIL-NCE is used for video and text feature extraction. To extract the features:
- First, follow the link in MIL-NCE and download the word2vec matrix and dictionary. Set the --word2vec_path and --dict_path arguments in args.py accordingly.
- Then, download the pretrained S3D weights "s3d_howto100m.pth" from S3D and set --net_weights_path to their location.
- Follow the instructions given in CrossTask to download the videos. Set --videos_path and --annotations_path accordingly.
- Set --video_features_path and --text_features_path to the locations where the extracted features should be written.
- Run extract_features.py. To use a different set of videos, update featextract/data/video_list.csv.
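The path arguments from the steps above could be laid out in args.py roughly as follows. This is only a minimal sketch: the flag names come from the steps above, but the function name `build_parser`, the help strings, and every default path are hypothetical placeholders, and the actual args.py in the repo may be organized differently.

```python
import argparse


def build_parser():
    """Build a parser exposing the path arguments named in the steps above."""
    parser = argparse.ArgumentParser(description="Paths for feature extraction (illustrative)")
    # Every default below is a hypothetical placeholder, not the repo's value.
    parser.add_argument("--word2vec_path", default="data/word2vec.pth",
                        help="word2vec matrix downloaded via MIL-NCE")
    parser.add_argument("--dict_path", default="data/dict.npy",
                        help="dictionary downloaded via MIL-NCE")
    parser.add_argument("--net_weights_path", default="data/s3d_howto100m.pth",
                        help="pretrained S3D weights")
    parser.add_argument("--videos_path", default="data/crosstask/videos",
                        help="downloaded CrossTask videos")
    parser.add_argument("--annotations_path", default="data/crosstask/annotations",
                        help="CrossTask annotations")
    parser.add_argument("--video_features_path", default="features/video",
                        help="where video features are stored")
    parser.add_argument("--text_features_path", default="features/text",
                        help="where text features are stored")
    return parser


# Override a single path and keep the placeholder defaults for the rest.
args = build_parser().parse_args(["--videos_path", "/data/crosstask/videos"])
print(args.videos_path)
```

With a layout like this, each path can also be overridden on the command line instead of editing the defaults in the file.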
This part replicates the action localization result reported in MIL-NCE, using the previously extracted features.
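As a rough illustration of how localization can be scored from extracted features, the sketch below picks, for each step description, the clip whose video feature has the highest dot-product similarity with the step's text feature, and counts a hit when that clip falls inside the annotated interval. This is a simplified assumption rather than the repo's evaluation code: the function names, the toy features, and the intervals are all hypothetical, and the actual protocol may add constraints (such as step ordering) that are omitted here. Plain Python lists are used to keep the sketch dependency-free.

```python
def dot(u, v):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def localize_steps(video_feats, text_feats):
    """For each step's text feature, return the index of the best-matching clip."""
    predictions = []
    for text_vec in text_feats:
        scores = [dot(clip_vec, text_vec) for clip_vec in video_feats]
        predictions.append(max(range(len(scores)), key=scores.__getitem__))
    return predictions


def step_recall(predictions, gt_intervals):
    """Fraction of steps whose predicted clip lies inside its (start, end) interval."""
    hits = sum(1 for p, (start, end) in zip(predictions, gt_intervals) if start <= p <= end)
    return hits / len(gt_intervals)


# Toy data (hypothetical): three clips, two steps, 2-dimensional features.
video_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
text_feats = [[1.0, 0.0], [0.0, 1.0]]
gt_intervals = [(0, 0), (2, 2)]  # inclusive clip-index ranges per step

predictions = localize_steps(video_feats, text_feats)
print(predictions)                             # [0, 1]
print(step_recall(predictions, gt_intervals))  # 0.5
```

In practice the same computation would be done on the saved feature arrays with a single matrix product per video.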
The repo will be updated regularly with further parts of the paper's implementation.