- Clone and build caffe from here. This caffe version is based on Limin Wang's fork [1] and contains the `merge_batch` and `weighted_sum` layers. In addition, it exposes some protected caffe functions in the matlab interface to emulate `iter_size` in matlab.
- Modify `caffe_mex.m` to point to the corresponding caffe matlab interface directory.
- Extract optical flow with Limin's flow extractor.
- We extracted bounding boxes of 118 objects in all video frames using Faster-RCNN [2] (retraining is required) and obtained filtered bounding boxes by taking temporal coherency and motion saliency into account.
- The extracted and processed bounding boxes for ucf-101 can be downloaded here. Place the downloaded mat files under `imdb/cache`.
- If you wish to extract the bounding boxes yourself, you need to be able to run Ren Shaoqing's Faster-RCNN (most of its code has been migrated into this repository with minor modifications and more comments).
  - First generate raw object detections using `faster_rcnn_{dataset}.m`.
  - Then use `action/prepare_rois_context.m` to process the bounding boxes as described in the paper.
- Create `dataset.mat` using `imdb/get_{name}_dataset.m` (directories may need to be adjusted!).
An example of generated ucf_dataset.mat
- `models/srcnn/{stream}` contains the model prototxt files.
- Model weights can be downloaded from the following links:

  | Stream | person+scene (the final proposed model in the paper) |
  | --- | --- |
  | spatial | split1 split2 split3 |
  | flow | split1 split2 split3 |

- The two-stream results reported in the paper are obtained by summing the spatial and temporal classification scores with weights 1 : 3.
- Other models mentioned in the paper's experiments can be provided if there is sufficient demand.
In matlab:

```matlab
% test spatial
test_spatial('model_path', path_to_weights, 'split', 1)
% test flow
test_flow('model_path', path_to_weights, 'split', 1)
```
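
The 1 : 3 fusion of spatial and temporal (flow) scores used for the reported two-stream results can be sketched as below. This is a minimal illustration, not the repository's evaluation code: `spatial_scores` and `flow_scores` are hypothetical videos-by-classes matrices standing in for whatever per-video class scores the two test runs produce, and the numbers are made up for the example.

```matlab
% Hypothetical per-video class scores (rows: videos, columns: classes).
% In practice these would come from the spatial and flow test runs.
spatial_scores = [0.6 0.3 0.1;
                  0.2 0.5 0.3];
flow_scores    = [0.2 0.7 0.1;
                  0.1 0.3 0.6];

% Fusion weights spatial : flow = 1 : 3, as stated for the paper's results.
w_spatial = 1;
w_flow    = 3;

% Weighted sum of the two streams' classification scores.
fused_scores = w_spatial * spatial_scores + w_flow * flow_scores;

% The predicted class of each video is the column with the highest fused score.
[~, predicted] = max(fused_scores, [], 2);
```

With these toy values the spatial stream alone would pick classes 1 and 2, while the fused scores pick classes 2 and 3, which shows how the heavier flow weight can change the final decision.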
[1] Wang, L., Xiong, Y., Wang, Z., & Qiao, Y. (2015). Towards good practices for very deep two-stream convnets. arXiv preprint arXiv:1507.02159.

[2] Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (pp. 91-99).

Please cite the following if you find the code useful.
@inproceedings{wang2016two,
title={Two-Stream SR-CNNs for Action Recognition in Videos},
author={Wang, Yifan and Song, Jie and Wang, Limin and Van Gool, Luc and Hilliges, Otmar},
year={2016},
organization={BMVC}
}
Yifan Wang: yifan.wang@inf.ethz.ch