Jinheng Xie¹, Jiajun Feng¹*, Zhaoxu Tian¹*, Kevin Qinghong Lin¹, Yawen Huang², Xi Xia¹, Nanxu Gong¹, Xu Zuo¹, Jiaqi Yang¹, Yefeng Zheng², Mike Zheng Shou¹
¹ National University of Singapore  ² Jarvis Research Center, Tencent Youtu Lab
- [2024-04-24] The dataset is released.
The annotation can be downloaded from here and is structured as follows:
# each storyboard in train.json and test.json has the following elements
{
'flag': 'train' ('val', 'testA', or 'testB'),
'global_id': ,
'movie_id': ,
'key_frames': ,
'resolution': ,
'title': ,
'genre': ,
'emotion': ,
'scene': ,
'summary': ,
'cast': ,
'main characters': ,
'#characters': ,
'synopses': ,
# a list of N (#frames of the current storyboard) sub-lists, each containing M bounding boxes formatted as [x1, y1, x2, y2],
# where each coordinate is normalized to [0, 1] by dividing it by the long side of the frame (max(H, W)).
'bboxes_person': ,
'bboxes_object': ,
'keypoints': ,
}
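To illustrate the normalization described in the comments above, the sketch below maps a normalized box back to pixel coordinates by multiplying every coordinate by max(H, W). It assumes that train.json / test.json decode to a list of storyboard dicts and that you already know each frame's height and width. The helper names (`load_storyboards`, `denormalize_box`, `iter_frame_boxes`) are hypothetical. The layout of the `resolution` field is not documented here, so the sketch does not parse it.

```python
import json
from typing import Iterator, List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def load_storyboards(path: str) -> list:
    """Load train.json or test.json (assumed to decode to a list of storyboard dicts)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def denormalize_box(box: Sequence[float], height: int, width: int) -> Box:
    """Map an [x1, y1, x2, y2] box normalized by max(H, W) back to pixel coordinates."""
    scale = max(height, width)  # every coordinate was divided by the long side
    x1, y1, x2, y2 = box
    return (x1 * scale, y1 * scale, x2 * scale, y2 * scale)


def iter_frame_boxes(
    storyboard: dict, key: str, height: int, width: int
) -> Iterator[Tuple[int, List[Box]]]:
    """Yield (frame_index, pixel_boxes) for each of the N frames of one storyboard.

    `key` is 'bboxes_person' or 'bboxes_object'; each entry is a list of M boxes.
    """
    for frame_idx, frame_boxes in enumerate(storyboard[key]):
        yield frame_idx, [denormalize_box(b, height, width) for b in frame_boxes]


if __name__ == "__main__":
    # Hypothetical 1280x720 frame: max(H, W) = 1280.
    print(denormalize_box([0.25, 0.125, 0.5, 0.5], height=720, width=1280))
    # -> (320.0, 160.0, 640.0, 640.0)
```

Because both x and y are divided by the same long side, the aspect ratio of each box is preserved. On non-square frames, however, the coordinates along the short side never reach 1.0.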
Visualize storyboards
python data_preprocess.py --input-path path/to/json/file --num-instructions 1 --vis-save-dir outputs/debug --instruct --save-flag instruct --vis-storyboard --max-frames 11 --noise
or
python data_preprocess.py --input-path path/to/json/file --num-instructions 1 --vis-save-dir outputs/debug --instruct --save-flag instruct --vis-storyboard --max-frames 11 --noise --data-root path/to/storyboard20k/frames/[train, test]
The visualized storyboards will be stored at outputs/debug. The --noise flag means that no random noise is added to each sample for augmentation. To visualize storyboards with the source frames, specify the frame root with --data-root. When --vis-storyboard is specified, no text sequences are saved. To save the processed text sequences for training and testing, run the following command without --vis-storyboard:
python data_preprocess.py --input-path path/to/json/file --num-instructions 1 --instruct --save-flag instruct
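The visualization and text-saving modes differ only in which flags are passed to data_preprocess.py. The sketch below assembles both argument lists using only the flags that appear in the commands above. The resulting list could be passed to subprocess.run. The helper `build_preprocess_cmd` and its parameters are hypothetical, and the default paths are the placeholders from this README.

```python
import sys
from typing import List, Optional


def build_preprocess_cmd(
    input_path: str,
    visualize: bool,
    vis_save_dir: str = "outputs/debug",
    data_root: Optional[str] = None,
    max_frames: int = 11,
) -> List[str]:
    """Assemble a data_preprocess.py command from the flags shown in this README."""
    cmd = [
        sys.executable, "data_preprocess.py",
        "--input-path", input_path,
        "--num-instructions", "1",
        "--instruct",
        "--save-flag", "instruct",
    ]
    if visualize:
        # Visualization mode: storyboards are rendered and no text sequences are saved.
        cmd += ["--vis-save-dir", vis_save_dir, "--vis-storyboard",
                "--max-frames", str(max_frames), "--noise"]
        if data_root is not None:
            # Optional: render storyboards together with the source movie frames.
            cmd += ["--data-root", data_root]
    return cmd


if __name__ == "__main__":
    # Text-saving mode (no --vis-storyboard); run from the repository root,
    # e.g. via subprocess.run(cmd, check=True).
    print(" ".join(build_preprocess_cmd("path/to/json/file", visualize=False)))
```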
Source Movie Frames
Please first request access to the MPII Movie Description dataset (MPII-MD) and cc the approval email to sierkinhane@gmail.com. Once your access has been approved, I will send you the link to download the pre-processed movie frames of Storyboard20K.
├── storyboard20k/
|   ├── frames
|   |   ├── train
|   |   |   ├── tt0167260
|   |   |   ├── ...
|   |   └── test
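Given the layout above, you can locate a movie's frames by joining the split folder and the movie folder name. The sketch below makes several assumptions. It assumes that the folder names under train/ and test/ (e.g. tt0167260) match the `movie_id` field of the annotations. It also assumes that frames are stored as image files somewhere under each movie folder. The helper names and the set of image extensions are hypothetical and are not part of the released code.

```python
from pathlib import Path
from typing import List

# Assumed image extensions; adjust to match the downloaded frames.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def movie_frame_dir(root: str, split: str, movie_id: str) -> Path:
    """Return <root>/frames/<split>/<movie_id>, e.g. storyboard20k/frames/train/tt0167260."""
    if split not in ("train", "test"):
        raise ValueError(f"split must be 'train' or 'test', got {split!r}")
    return Path(root) / "frames" / split / movie_id


def list_frames(root: str, split: str, movie_id: str) -> List[Path]:
    """Recursively list image files under one movie folder, sorted by path."""
    frame_dir = movie_frame_dir(root, split, movie_id)
    return sorted(
        p for p in frame_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )
```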
If you find our work inspiring or use our dataset or codebase in your research, please consider giving a star ⭐ and a citation.
@article{xie2024learning,
title={Learning Long-form Video Prior via Generative Pre-Training},
author={Xie, Jinheng and Feng, Jiajun and Tian, Zhaoxu and Lin, Kevin Qinghong and Huang, Yawen and Xia, Xi and Gong, Nanxu and Zuo, Xu and Yang, Jiaqi and Zheng, Yefeng and others},
journal={arXiv preprint arXiv:2404.15909},
year={2024}
}