Official PyTorch implementation for the paper:
MoStGAN-V: Video Generation with Temporal Motion Styles, CVPR 2023.
Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny
```bash
conda env create -f environment.yaml
```
Also make sure StyleGAN2-ADA is runnable in your environment, since this code builds on it.
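Then activate the environment. The environment name below is an assumption; check `environment.yaml` for the actual name:

```bash
# Assumed environment name; the real one is defined in environment.yaml.
conda activate mostgan-v
```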
Training requires 4 32GB V100 GPUs and takes approximately 2 days.
We follow the same procedure as StyleGAN-V to preprocess all datasets:

```bash
python src/scripts/convert_videos_to_frames.py -s /path/to/source -t /path/to/target --video_ext mp4 --target_size 256
```
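After conversion, each video is stored as a directory of frames. A rough sketch of the expected layout, following StyleGAN-V's convention (directory and file names are illustrative):

```
/path/to/target/
├── video_0001/
│   ├── 000000.jpg
│   ├── 000001.jpg
│   └── ...
└── video_0002/
    └── ...
```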
FaceForensics was preprocessed with `src/scripts/preprocess_ffs.py` to extract face crops (the resulting crops can be a little unstable).
- training

```bash
python src/infra/launch.py hydra.run.dir=. exp_suffix=my_experiment_name env=local dataset=ffs dataset.resolution=256 num_gpus=4
```
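The same launcher can be reused for quick sanity runs by overriding the values above. The overrides below are illustrative and assume you have preprocessed frames at the matching resolution:

```bash
# Same launcher with smaller values for a quick debug run (illustrative;
# keys are unchanged from the command above, only the values differ).
python src/infra/launch.py hydra.run.dir=. exp_suffix=debug_run env=local dataset=ffs dataset.resolution=128 num_gpus=1
```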
- evaluation
Metrics are computed with `src/scripts/calc_metrics.py`.
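A hedged invocation sketch: the flag names below follow the StyleGAN2-ADA `calc_metrics.py` interface this repo builds on, and `fvd2048_16f` is the FVD metric name used by StyleGAN-V; verify the actual options with `--help`:

```bash
# Assumed flags (StyleGAN2-ADA-style interface); check --help for the
# options this repo actually exposes.
python src/scripts/calc_metrics.py \
    --network /path/to/network-snapshot.pkl \
    --data /path/to/dataset \
    --metrics fvd2048_16f \
    --gpus 4
```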
- generation
```bash
python src/scripts/generate.py --network_pkl /path/to/network-snapshot.pkl --num_videos 25 --as_grids true --save_as_mp4 true --fps 25 --video_len 128 --batch_size 25 --outdir /path/to/output/dir --truncation_psi 0.9
```
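To sample individual clips instead of a grid, the same flags can be reused with different values (values below are illustrative):

```bash
# Variation of the command above: individual mp4 clips instead of a grid.
# Flag names are taken verbatim from the documented command; only the
# values differ.
python src/scripts/generate.py --network_pkl /path/to/network-snapshot.pkl --num_videos 4 --as_grids false --save_as_mp4 true --fps 25 --video_len 128 --batch_size 4 --outdir /path/to/output/dir --truncation_psi 0.9
```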
You can find the checkpoints here.
This code is mainly built upon the StyleGAN2-ADA and StyleGAN-V repositories. Baseline code comes from MoCoGAN-HD, VideoGPT, DIGAN, and StyleGAN-V.
```bibtex
@article{shen2023mostganv,
    author  = {Xiaoqian Shen and Xiang Li and Mohamed Elhoseiny},
    title   = {MoStGAN-V: Video Generation with Temporal Motion Styles},
    journal = {arXiv preprint arXiv:2304.02777},
    year    = {2023},
}
```