Assessing action quality is both imperative and challenging due to its significant impact on the quality of AI-generated videos
Jiarui Wang1, Ru Huang2, Xiongkuo Min1, Guangtao Zhai1*, Wenjun Zhang1
Chinese version: Zhihu
Motivation: 1. Action quality has a significant impact on the quality of AI-generated videos. 2. Current action quality assessment (AQA) studies predominantly focus on domain-specific actions from real videos and collect coarse-grained, expert-only human ratings on limited dimensions.
- [2024/9/26] 🔥🔥🔥 GAIA has been accepted to the NeurIPS 2024 D&B track as a Spotlight paper. We will update the arXiv version soon.
- [2024/6/18] 🔥 The proposed GAIA dataset is online! Download it from OneDrive or Baidu Netdisk (code: ks51).
- [2024/6/17] 🔥 We uploaded the action prompts used, in `prompts_all.csv`, along with their corresponding categories (`action_label.xlsx`).
- [2024/6/11] We are preparing the GAIA data and meta information.
- [2024/6/6] GitHub repo for GAIA is online.
Download the GAIA dataset (9,180 videos) from the released links (OneDrive or Baidu Netdisk, code: ks51).
Video naming rule: `(model name)_(action keyword).mp4`. The `(action keyword)` also serves as the index for looking up the corresponding action prompt in `prompts_all.csv`.
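The naming rule can be applied programmatically. The following is a minimal sketch, not part of the released code: the helper name `parse_video_name` is hypothetical, and it assumes the model name itself contains no underscore, so the first underscore separates the two parts.

```python
from pathlib import Path


def parse_video_name(filename: str) -> tuple[str, str]:
    """Split '(model name)_(action keyword).mp4' into (model, action keyword).

    Assumption: the model name contains no underscore, so the FIRST
    underscore in the file stem is the separator.
    """
    stem = Path(filename).stem  # drops the '.mp4' extension
    model, sep, action = stem.partition("_")
    if not sep or not model or not action:
        raise ValueError(f"unexpected video name: {filename!r}")
    return model, action


model, action = parse_video_name("Anmidiff_Abseiling.mp4")
print(model, action)
```

The returned action keyword can then be used to look up the matching row in `prompts_all.csv`; the column layout of that file is not described here, so that lookup is left out.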
GAIA
|
|--videos
| |-- Anmidiff_Abseiling.mp4
| |-- Anmidiff_Admiration.mp4
| |-- ...
| |-- zeroScope_Zumba.mp4
|
|-- MOS.csv
| filename | final action subject | final action completeness | final action interaction |
| Anmidiff_Abseiling.mp4 | 49.0098 | 46.9289 | 52.1406 |
| ... | ... | ... | ... |
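As a rough illustration of reading the scores, the sketch below parses `MOS.csv` with Python's standard `csv` module. It assumes the file is comma-separated and that its header matches the column names displayed above exactly; the real file may differ, and `load_mos` is a hypothetical helper. The sample row reuses the values shown in the listing.

```python
import csv
import io

# Assumed column names, copied from the listing above; adjust if the real
# header in MOS.csv is spelled differently.
SCORE_COLUMNS = (
    "final action subject",
    "final action completeness",
    "final action interaction",
)


def load_mos(handle):
    """Return {filename: {dimension: MOS}} from an open CSV file handle."""
    table = {}
    for row in csv.DictReader(handle):
        table[row["filename"]] = {col: float(row[col]) for col in SCORE_COLUMNS}
    return table


# In-memory stand-in for MOS.csv so the example is self-contained.
sample = io.StringIO(
    "filename,final action subject,final action completeness,final action interaction\n"
    "Anmidiff_Abseiling.mp4,49.0098,46.9289,52.1406\n"
)
mos = load_mos(sample)
print(mos["Anmidiff_Abseiling.mp4"])
```

With the downloaded dataset, the same function could be called on `open("GAIA/MOS.csv", newline="")`, assuming the directory layout shown above.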
In this work, we collect annotations from a novel perspective based on the syllogism of causal reasoning. We decompose an action process into three parts: 1) action subject as the major premise, 2) action completeness as the minor premise, and 3) action-scene interaction as the conclusion. The rationales for this strategy are as follows: (a) As the visually salient content in action-oriented videos, the action subject's rendering quality can profoundly affect the visibility of the action, and humans excel at perceiving such generation artifacts. (b) Moreover, unlike parallel-form feedback, the order of these three parts in the action syllogism inherently aligns with the human reasoning process.
As a result, a total of 971,244 ratings across 9,180 video-action pairs were collected.
We evaluate 18 popular text-to-video (T2V) models on their ability to generate visually rational actions, revealing their pros and cons on different categories of actions.
Among open-source lab studies, VideoCrafter2 ranks first. Among large-scale commercial applications, Morph Studio and Stable Video rank first and second, respectively.
Existing T2V models struggle to render actions with drastic motion changes, where atypical body postures are more likely to be involved. Additionally, within the local hand action categories, actions that contain subtle movements receive significantly lower MOSs than others, showing an inferior capacity for generating fine-grained actions.
All-Combined indicates that we sum the MOSs of the three dimensions and rescale the sum to $[0,100]$ as the overall action quality score.
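The exact rescaling is not spelled out here, so the following is only one possible reading: sum the three MOSs per video, then min-max rescale the sums over the whole set to $[0,100]$. The function name `all_combined` and the input triples are hypothetical and purely illustrative; they are not GAIA data.

```python
def all_combined(rows):
    """Sum three per-dimension MOSs per video and min-max rescale to [0, 100].

    rows: list of (subject, completeness, interaction) MOS triples.
    Assumption: rescaling is min-max over the given set of videos; the
    source does not state which rescaling is actually used.
    """
    totals = [sum(triple) for triple in rows]
    lo, hi = min(totals), max(totals)
    if hi == lo:
        # Degenerate case: all sums are equal, so no spread to rescale.
        return [0.0 for _ in totals]
    return [100.0 * (t - lo) / (hi - lo) for t in totals]


# Made-up triples, used only to show the arithmetic.
scores = all_combined([(10, 20, 30), (40, 50, 60), (70, 80, 90)])
print(scores)
```

If each dimension is already on a 0 to 100 scale, dividing the sum by 3 would be another plausible reading of "rescale"; the two give different values, so results should be compared under the same convention.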
Model | Code/Project Link |
---|---|
Text2Video-Zero | https://github.com/Picsart-AI-Research/Text2Video-Zero |
ModelScope | https://modelscope.cn/models/iic/text-to-video-synthesis/summary |
ZeroScope | https://huggingface.co/cerspense/zeroscope_v2_576w |
LaVie | https://github.com/Vchitect/LaVie |
Show-1 | https://github.com/showlab/Show-1 |
Hotshot-XL | https://github.com/hotshotco/Hotshot-XL |
AnimateDiff | https://github.com/guoyww/AnimateDiff |
VideoCrafter1-512 / VideoCrafter1-1024 / VideoCrafter2 | https://github.com/AILab-CVC/VideoCrafter |
Mora | https://github.com/lichao-sun/Mora |
Gen-2 | https://research.runwayml.com/gen2 |
Genmo | https://www.genmo.ai |
Pika | https://pika.art/home |
NeverEnds | https://neverends.life |
MoonValley | https://moonvalley.ai |
Morph Studio | https://www.morphstudio.com |
Stable Video | https://www.stablevideo.com/welcome |
Please contact the first author of this paper for queries.
- Zijian Chen, zijian.chen@sjtu.edu.cn
If you find our work interesting, please feel free to cite our paper:
@article{chen2024gaia,
title={GAIA: Rethinking Action Quality Assessment for AI-Generated Videos},
author={Chen, Zijian and Sun, Wei and Tian, Yuan and Jia, Jun and Zhang, Zicheng and Wang, Jiarui and Huang, Ru and Min, Xiongkuo and Zhai, Guangtao and Zhang, Wenjun},
journal={arXiv preprint arXiv:2406.06087},
year={2024}
}