Codebase for the CVPR 2020 paper "A Local-to-Global Approach to Multi-modal Movie Scene Segmentation".
From a video to segmented scenes: two steps are needed, holistic feature extraction and temporal scene segmentation.
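As a rough illustration of the second step, temporal scene segmentation can be viewed as grouping consecutive shots into a scene whenever adjacent shot features remain similar. The sketch below uses cosine similarity between hypothetical precomputed shot features; the function name and thresholding heuristic are illustrative assumptions, not the repo's actual model:

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def segment_scenes(shot_feats: List[List[float]], thr: float = 0.5) -> List[List[int]]:
    """Group consecutive shots into scenes (illustrative heuristic, not the
    paper's learned model): start a new scene whenever the similarity
    between adjacent shot features drops below `thr`."""
    scenes = [[0]]
    for i in range(1, len(shot_feats)):
        if cosine(shot_feats[i - 1], shot_feats[i]) < thr:
            scenes.append([i])
        else:
            scenes[-1].append(i)
    return scenes


# Four shots: the first two look alike, the last two look alike.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(segment_scenes(feats))  # → [[0, 1], [2, 3]]
```

In the actual codebase the similarity heuristic is replaced by a learned boundary classifier, but the shot-grouping structure is the same.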
A single-stage temporal scene segmentation is also provided in the demo. It is intended as an easy-to-use tool for plot/story understanding, with the scene as a semantic unit. Currently, it only supports image input.
😬 The scene segmentation dataset has been promoted to the MovieNet project, with 318 movies and an easy-to-use toolkit. We encourage you to use it in the future.
- Basic video processing tools are provided, including shot detection and its parallel version.
- Holistic semantic video feature extractors covering place, audio, human, action, and speech are planned; if you would like one included, leave a message in the issues. Place and audio are supported now in `pre`. The full version is located at movienet-tools.
- All-in-one scene segmentation tool with all multi-modal, multi-semantic elements.
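For the all-in-one multi-modal tool, one simple way to combine modalities is a weighted late fusion of per-modality boundary scores. The helper below is an illustrative sketch under that assumption; the paper's model learns the fusion rather than using fixed weights:

```python
from typing import Dict, List, Optional


def fuse_boundary_scores(modal_scores: Dict[str, List[float]],
                         weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Weighted average of per-modality scene-boundary scores.

    modal_scores maps a modality name (e.g. "place", "audio") to one score
    per shot boundary. Equal weights are used when none are given. This is
    a generic late-fusion sketch, not the repo's actual fusion module.
    """
    names = list(modal_scores)
    if weights is None:
        weights = {m: 1.0 for m in names}
    total = sum(weights[m] for m in names)
    n_boundaries = len(modal_scores[names[0]])
    return [
        sum(weights[m] * modal_scores[m][i] for m in names) / total
        for i in range(n_boundaries)
    ]


# Two shot boundaries scored by two modalities; fused scores are roughly
# [0.8, 0.3], so only the first boundary would pass a 0.5 threshold.
scores = {"place": [0.9, 0.2], "audio": [0.7, 0.4]}
print(fuse_boundary_scores(scores))
```

A usage pattern would be thresholding the fused scores to decide which shot boundaries are scene boundaries, then cutting the shot sequence there.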
😅 Since some enthusiastic researchers have requested the code, and we plan to organize the codebase in an easy-to-use fashion (e.g. movienet-tools), we release a work-in-progress version here.
Please refer to INSTALL.md for installation and dataset preparation. Pretrained models and the dataset are also explained there.
🥳 Please see GETTING_STARTED.md for the basic usage.
@inproceedings{rao2020local,
  title={A Local-to-Global Approach to Multi-modal Movie Scene Segmentation},
  author={Rao, Anyi and Xu, Linning and Xiong, Yu and Xu, Guodong and Huang, Qingqiu and Zhou, Bolei and Lin, Dahua},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}