I4VGen: Image as Stepping Stone for Text-to-Video Generation
Official PyTorch implementation of the arXiv 2024 paper: https://arxiv.org/abs/2406.02230


I4VGen: Image as Stepping Stone for Text-to-Video Generation
Xiefan Guo, Jinlin Liu, Miaomiao Cui, Di Huang
https://xiefan-guo.github.io/i4vgen

Abstract: I4VGen is a training-free, plug-and-play video diffusion inference framework that decomposes text-to-video generation into two stages: anchor image synthesis and anchor image-guided video synthesis. It employs a generation-selection strategy for the anchor image, synthesizing candidate images and selecting the most appropriate one via a reward mechanism to ensure close alignment with the text prompt. Subsequently, a novel Noise-Invariant Video Score Distillation Sampling is developed to animate the image into a video, followed by a video regeneration process to refine it, thereby significantly improving video quality.

Requirements

  • Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.
  • All experiments are conducted on a single NVIDIA V100 GPU (32 GB).

AnimateDiff

Python libraries: See environments/animatediff_environment.yaml for exact library dependencies. You can use the following commands to create and activate your AnimateDiff Python environment:

# Create conda environment
conda env create -f environments/animatediff_environment.yaml
# Activate conda environment
conda activate animatediff_env

Inference setup: Please refer to the official repo of AnimateDiff for setup instructions. mm-sd-v15-v2 and stable-diffusion-v1-5 are used in our experiments.

| Name | HuggingFace | Type |
| --- | --- | --- |
| mm-sd-v15-v2 | Link | Motion module |
| stable-diffusion-v1-5 | Link | Base T2I diffusion model |

Generating videos: Make sure the required Python environment is set up and the corresponding checkpoints are downloaded, then run the following command to generate videos.

python -m scripts.animate_animatediff --config configs/animatediff_configs/i4vgen_animatediff.yaml

Inference arguments, set in configs/animatediff_configs/i4vgen_animatediff.yaml or via the ArgumentParser:

  • motion_module: path to motion module, i.e., mm-sd-v15-v2 motion module
  • pretrained_model_path: path to base T2I diffusion model, i.e., stable-diffusion-v1-5
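For orientation, a minimal sketch of what such a config might look like. Only motion_module and pretrained_model_path are documented above; the checkpoint paths and the remaining keys are illustrative assumptions, so consult the shipped configs/animatediff_configs/i4vgen_animatediff.yaml for the authoritative fields:

```yaml
# Hypothetical sketch of i4vgen_animatediff.yaml; only the two keys
# named in the README are documented, everything else is an assumption.
motion_module: "models/Motion_Module/mm_sd_v15_v2.ckpt"          # mm-sd-v15-v2 motion module (path assumed)
pretrained_model_path: "models/StableDiffusion/stable-diffusion-v1-5"  # base T2I diffusion model (path assumed)

# Illustrative generation settings (key names assumed, not from the repo):
prompt: "a corgi running on the beach"
seed: 42
```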

LaVie

Python libraries: See environments/lavie_environment.yaml for exact library dependencies. You can use the following commands to create and activate your LaVie Python environment:

# Create LaVie conda environment
conda env create -f environments/lavie_environment.yaml
# Activate LaVie conda environment
conda activate lavie_env

Inference setup: Please refer to the official repo of LaVie. The base version is used in our experiments. Download the pre-trained lavie_base and stable-diffusion-v1-4 checkpoints.

| Name | HuggingFace | Type |
| --- | --- | --- |
| lavie_base | Link | LaVie model |
| stable-diffusion-v1-4 | Link | Base T2I diffusion model |

Generating videos: Make sure the required Python environment is set up and the corresponding checkpoints are downloaded, then run the following command to generate videos.

python scripts/animate_lavie.py --config configs/lavie_configs/i4vgen_lavie.yaml

Inference arguments, set in configs/lavie_configs/i4vgen_lavie.yaml or via the ArgumentParser:

  • ckpt_path: path to LaVie model, i.e., lavie_base
  • sd_path: path to base T2I diffusion model, i.e., stable-diffusion-v1-4
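As with the AnimateDiff backbone, a hedged sketch of the two documented keys may help; the paths shown are assumptions, and any further fields live in the shipped configs/lavie_configs/i4vgen_lavie.yaml:

```yaml
# Hypothetical sketch of i4vgen_lavie.yaml; only ckpt_path and sd_path
# are named in the README, and both paths below are assumptions.
ckpt_path: "checkpoints/lavie_base.pt"            # pre-trained LaVie base model
sd_path: "checkpoints/stable-diffusion-v1-4"      # base T2I diffusion model
```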

Citation

@article{guo2024i4vgen,
    title   = {I4VGen: Image as Stepping Stone for Text-to-Video Generation},
    author  = {Guo, Xiefan and Liu, Jinlin and Cui, Miaomiao and Huang, Di},
    journal = {arXiv preprint arXiv:2406.02230},
    year    = {2024}
}

Acknowledgments

The code is built upon AnimateDiff and LaVie; we thank all the contributors for open-sourcing their work.
