PAN: Persistent Appearance Network

PyTorch implementation of the paper:

PAN: Towards Fast Action Recognition via Learning Persistence of Appearance

Can Zhang, Yuexian Zou*, Guang Chen and Lei Gan.

[ArXiv]

Updates

[12 Aug 2020] We have released the codebase and models of PAN.

Main Contribution

Efficiently modeling dynamic motion information in videos is crucial for the action recognition task. Most state-of-the-art methods rely heavily on dense optical flow as the motion representation. Although combining optical flow with RGB frames as input can achieve excellent recognition performance, optical flow extraction is very time-consuming, which undoubtedly hinders real-time action recognition. In this paper, we shed light on fast action recognition by lifting the reliance on optical flow. We design a novel motion cue called Persistence of Appearance (PA) that focuses more on distilling the motion information at boundaries. Extensive experiments show that PA is over 1000x faster (8196fps vs. 8fps) than conventional optical flow in terms of motion modeling speed.

Content

Dependencies
Data Preparation
Core Codes
- PA Module
- VAP Module
Pretrained Models
- Something-Something-V1
- Something-Something-V2
Testing
Training
Other Info

Dependencies

Please make sure the following libraries are installed successfully:

Data Preparation

Following common practice, we first extract videos into frames for fast reading. Please refer to the TSN repo for a detailed guide to data pre-processing. We have successfully trained on the Kinetics, UCF101, HMDB51, Something-Something-V1 and V2, and Jester datasets with this codebase. The processing of video data can be summarized in 3 steps:

1. Extract frames from videos:
   - For the Something-Something-V2 dataset, please use tools/vid2img_sthv2.py
   - For the Kinetics dataset, please use tools/vid2img_kinetics.py
2. Generate the file lists needed for the dataloader:
   - Each line of the list file contains a tuple of (extracted video frame folder name, number of video frames, video ground-truth class). A list file looks like this:
     ```
     video_frame_folder 100 10
     video_2_frame_folder 150 31
     ...
     ```
   - Or you can use off-the-shelf tools provided by other repos:
     - For the Something-Something-V1 & V2 datasets, please use tools/gen_label_sthv1.py & tools/gen_label_sthv2.py
     - For the Kinetics dataset, please use tools/gen_label_kinetics.py
3. Add the information to ops/dataset_configs.py
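
The list-file format described above can be read and written with a few lines of Python. This is a minimal sketch, not code from this repo: `VideoRecord`, `format_list_line`, and `parse_list_line` are hypothetical names, and the sketch assumes the three fields are separated by single spaces as in the sample lines.

```python
from typing import NamedTuple


class VideoRecord(NamedTuple):
    """One line of a list file (hypothetical helper, not part of this repo)."""
    folder: str      # name of the extracted frame folder
    num_frames: int  # number of frames in that folder
    label: int       # ground-truth class index


def format_list_line(record: VideoRecord) -> str:
    """Serialize a record as 'folder num_frames label'."""
    return f"{record.folder} {record.num_frames} {record.label}"


def parse_list_line(line: str) -> VideoRecord:
    """Parse one list-file line, splitting from the right so that
    the last two fields are always the frame count and the label."""
    folder, num_frames, label = line.strip().rsplit(" ", 2)
    return VideoRecord(folder, int(num_frames), int(label))


record = parse_list_line("video_frame_folder 100 10")
print(record.folder, record.num_frames, record.label)
```

Splitting from the right is a design choice made here so that a folder name containing spaces would still parse; whether the repo's dataloader tolerates such names is not something this sketch verifies.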

Core Codes

PA Module

The PA module aims to speed up the motion modeling procedure. It can simply be injected at the bottom of the network to lift the reliance on optical flow.

```python
import torch

from ops.PAN_modules import PA

# n_length=4: four adjacent frames are sampled to compute PA
PA_module = PA(n_length=4)
# shape of x: [N*T*m, 3, H, W], here with N=5, T=8, m=4
x = torch.randn(5*8*4, 3, 224, 224)
# shape of PA_out: [N*T, m-1, H, W]
PA_out = PA_module(x)  # torch.Size([40, 3, 224, 224])
```
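
The shape comments in the snippet above imply a simple rule: the first dimension is divided by `n_length` (m), and the channel dimension becomes m-1, which is consistent with one PA map per pair of adjacent frames. The helper below is only a sketch of that bookkeeping in plain Python; `pa_output_shape` is a hypothetical name, it does not call the PA module, and its input checks are assumptions rather than the module's documented behavior.

```python
from typing import Tuple

Shape4D = Tuple[int, int, int, int]


def pa_output_shape(input_shape: Shape4D, n_length: int) -> Shape4D:
    """Map an input shape [N*T*m, 3, H, W] to the PA output shape [N*T, m-1, H, W]."""
    batch, _channels, height, width = input_shape
    if n_length < 2:
        # Assumption: at least two frames are needed to form an adjacent pair.
        raise ValueError("n_length must be at least 2")
    if batch % n_length != 0:
        raise ValueError("first dimension must be a multiple of n_length")
    return (batch // n_length, n_length - 1, height, width)


# Same numbers as the snippet above: N=5, T=8, m=4
print(pa_output_shape((5 * 8 * 4, 3, 224, 224), n_length=4))  # (40, 3, 224, 224)
```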

VAP Module

The VAP module aims to adaptively emphasize expressive features and suppress less informative ones by observing global information across various timescales. It is adopted at the top of the network to achieve long-term temporal modeling.

```python
import torch

from ops.PAN_modules import VAP

VAP_module = VAP(n_segment=8, feature_dim=2048, num_class=174, dropout_ratio=0.5)
# shape of x: [N*T, D], here with N=5, T=8, D=2048
x = torch.randn(5*8, 2048)
# shape of VAP_out: [N, num_class]
VAP_out = VAP_module(x)  # torch.Size([5, 174])
```
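
Going from `[N*T, D]` to `[N, num_class]` means the T frame-level feature vectors of each video are aggregated into one prediction. The sketch below only illustrates the grouping step in plain Python. It assumes the rows are stored in video-major order (the T segments of one video are consecutive), which is an assumption about the layout and not something stated above; `group_by_video` is a hypothetical helper and does not reproduce how VAP weights the timescales.

```python
from typing import List, Sequence


def group_by_video(frame_features: Sequence[Sequence[float]],
                   n_segment: int) -> List[List[Sequence[float]]]:
    """Split N*T frame-level feature vectors into N groups of T (video-major order assumed)."""
    if len(frame_features) % n_segment != 0:
        raise ValueError("number of rows must be a multiple of n_segment")
    return [list(frame_features[start:start + n_segment])
            for start in range(0, len(frame_features), n_segment)]


# 5 videos x 8 segments, each with a toy 2-dimensional feature
features = [[float(i), 0.0] for i in range(5 * 8)]
videos = group_by_video(features, n_segment=8)
print(len(videos), len(videos[0]))  # 5 8
```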

Pretrained Models

Here, we provide pretrained PAN models on the Something-Something-V1 & V2 datasets. Recognizing actions in these datasets requires strong temporal modeling ability, as many action classes are symmetrical. PAN achieves state-of-the-art performance on these datasets. Notably, our method even surpasses optical flow based methods while using only RGB frames as input.

Something-Something-V1

| Model | Backbone | FLOPs * views | Val Top1 | Val Top5 | Checkpoints |
| --- | --- | --- | --- | --- | --- |
| PAN_Lite | ResNet-50 | 35.7G * 1 | 48.0 | 76.1 | [Google Drive] or [Weiyun] |
| PAN_Full | ResNet-50 | 67.7G * 1 | 50.5 | 79.2 | |
| PAN_En | ResNet-50 | (46.6G+88.4G) * 2 | 53.4 | 81.1 | |
| PAN_En | ResNet-101 | (85.6G+166.1G) * 2 | 55.3 | 82.8 | [Google Drive] or [Weiyun] |

Something-Something-V2

| Model | Backbone | FLOPs * views | Val Top1 | Val Top5 | Checkpoints |
| --- | --- | --- | --- | --- | --- |
| PAN_Lite | ResNet-50 | 35.7G * 1 | 60.8 | 86.7 | [Google Drive] or [Weiyun] |
| PAN_Full | ResNet-50 | 67.7G * 1 | 63.8 | 88.6 | |
| PAN_En | ResNet-50 | (46.6G+88.4G) * 2 | 66.2 | 90.1 | |
| PAN_En | ResNet-101 | (85.6G+166.1G) * 2 | 66.5 | 90.6 | [Google Drive] or [Weiyun] |
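
For a quick comparison of total inference cost, the "FLOPs * views" entries can be multiplied out. The sketch below reads the column literally, as (sum of the per-view terms) times the number of views; that reading is an assumption, and `total_gflops` is a hypothetical helper. The input numbers are copied from the tables above.

```python
from typing import Sequence


def total_gflops(per_view_gflops: Sequence[float], views: int) -> float:
    """Total cost in GFLOPs: sum the per-view terms, then multiply by the number of views."""
    return round(sum(per_view_gflops) * views, 1)


print(total_gflops([35.7], 1))         # PAN_Lite, ResNet-50
print(total_gflops([67.7], 1))         # PAN_Full, ResNet-50
print(total_gflops([46.6, 88.4], 2))   # PAN_En, ResNet-50
print(total_gflops([85.6, 166.1], 2))  # PAN_En, ResNet-101
```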

Testing

For example, to test the PAN models on Something-Something-V1, you can first put the downloaded .pth.tar files into the "pretrained" folder and then run:

```bash
# test PAN_Lite
bash scripts/test/sthv1/Lite.sh

# test PAN_Full
bash scripts/test/sthv1/Full.sh

# test PAN_En
bash scripts/test/sthv1/En.sh
```

Training

We provide several scripts to train PAN with this repo; please refer to the "scripts" folder for more details. For example, to train PAN on Something-Something-V1, you can run:

```bash
# train PAN_Lite
bash scripts/train/sthv1/Lite.sh

# train PAN_Full RGB branch
bash scripts/train/sthv1/Full_RGB.sh

# train PAN_Full PA branch
bash scripts/train/sthv1/Full_PA.sh
```

Note that you should scale the learning rate with the batch size. For example, if you use a batch size of 256, you should set the learning rate to 0.04.
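
One way to apply this is a linear scaling rule anchored at the stated reference point (learning rate 0.04 at batch size 256). The text above only says the learning rate should scale with batch size, so the linear form is an assumption, and `scaled_lr` is a hypothetical helper rather than an option of the training scripts.

```python
def scaled_lr(batch_size: int, ref_lr: float = 0.04, ref_batch_size: int = 256) -> float:
    """Linearly scale the learning rate with batch size (assumed rule),
    anchored at ref_lr for ref_batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return ref_lr * batch_size / ref_batch_size


for batch_size in (64, 128, 256):
    print(batch_size, scaled_lr(batch_size))
```

Under this assumption, halving the batch size halves the learning rate; whether that holds well for very small or very large batches is not covered here.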

Other Info

References

This repository is built upon the following baseline implementations for the action recognition task.

TSM
TSN

Citation

Please [★star] this repo and [cite] the following arXiv paper if you find PAN useful for your research:

```
@misc{zhang2020pan,
    title={PAN: Towards Fast Action Recognition via Learning Persistence of Appearance},
    author={Can Zhang and Yuexian Zou and Guang Chen and Lei Gan},
    year={2020},
    eprint={2008.03462},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```

Or, if you prefer a "publication", you can cite our preliminary work at ACM MM 2019:

```
@inproceedings{zhang2019pan,
  title={PAN: Persistent Appearance Network with an Efficient Motion Cue for Fast Action Recognition},
  author={Zhang, Can and Zou, Yuexian and Chen, Guang and Gan, Lei},
  booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
  pages={500--509},
  year={2019}
}
```

Contact

For any questions, please feel free to open an issue or contact:

Can Zhang: zhang.can.pku@gmail.com
