I4VGen: Image as Stepping Stone for Text-to-Video Generation
Official PyTorch implementation of the arXiv 2024 paper: https://arxiv.org/abs/2406.02230


I4VGen: Image as Stepping Stone for Text-to-Video Generation
Xiefan Guo, Jinlin Liu, Miaomiao Cui, Di Huang
https://xiefan-guo.github.io/i4vgen

Abstract: I4VGen is a training-free, plug-and-play video diffusion inference framework that decomposes text-to-video generation into two stages: anchor image synthesis and anchor image-guided video synthesis. It employs a generation-selection strategy for the anchor image, synthesizing candidate images and selecting the most appropriate one via a reward mechanism to ensure close alignment with the text prompt. Subsequently, a novel Noise-Invariant Video Score Distillation Sampling is developed to animate the image into a video, followed by a video regeneration process to refine it, thereby significantly improving video quality.

Requirements

  • Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.
  • All experiments are conducted on a single NVIDIA V100 GPU (32 GB).

AnimateDiff

Python libraries: See environments/animatediff_environment.yaml for exact library dependencies. You can use the following commands to create and activate your AnimateDiff Python environment:

# Create conda environment
conda env create -f environments/animatediff_environment.yaml
# Activate conda environment
conda activate animatediff_env

Inference setup: Please refer to the official repo of AnimateDiff for setup instructions. mm-sd-v15-v2 and stable-diffusion-v1-5 are used in our experiments.

| Name | HuggingFace | Type |
| --- | --- | --- |
| mm-sd-v15-v2 | Link | Motion module |
| stable-diffusion-v1-5 | Link | Base T2I diffusion model |

Generating videos: Make sure the required Python environment is set up and the corresponding checkpoints are downloaded, then run the following command to generate videos.

python -m scripts.animate_animatediff --config configs/animatediff_configs/i4vgen_animatediff.yaml

Inference arguments, set in configs/animatediff_configs/i4vgen_animatediff.yaml or via the ArgumentParser:

  • motion_module: path to motion module, i.e., mm-sd-v15-v2 motion module
  • pretrained_model_path: path to base T2I diffusion model, i.e., stable-diffusion-v1-5
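For orientation, a minimal sketch of what such a config might look like. Only motion_module and pretrained_model_path are documented above; the checkpoint paths and the remaining keys are illustrative assumptions, so consult the shipped configs/animatediff_configs/i4vgen_animatediff.yaml for the authoritative fields:

```yaml
# Hypothetical sketch of i4vgen_animatediff.yaml; only the two keys
# named in the README are documented, everything else is an assumption.
motion_module: "models/Motion_Module/mm_sd_v15_v2.ckpt"          # mm-sd-v15-v2 motion module (path assumed)
pretrained_model_path: "models/StableDiffusion/stable-diffusion-v1-5"  # base T2I diffusion model (path assumed)

# Illustrative generation settings (key names assumed, not from the repo):
prompt: "a corgi running on the beach"
seed: 42
```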

LaVie

Python libraries: See environments/lavie_environment.yaml for exact library dependencies. You can use the following commands to create and activate your LaVie Python environment:

# Create LaVie conda environment
conda env create -f environments/lavie_environment.yaml
# Activate LaVie conda environment
conda activate lavie_env

Inference setup: Please refer to the official repo of LaVie. The base version is used in our experiments. Download the pre-trained lavie_base and stable-diffusion-v1-4 checkpoints.

| Name | HuggingFace | Type |
| --- | --- | --- |
| lavie_base | Link | LaVie model |
| stable-diffusion-v1-4 | Link | Base T2I diffusion model |

Generating videos: Make sure the required Python environment is set up and the corresponding checkpoints are downloaded, then run the following command to generate videos.

python scripts/animate_lavie.py --config configs/lavie_configs/i4vgen_lavie.yaml

Inference arguments, set in configs/lavie_configs/i4vgen_lavie.yaml or via the ArgumentParser:

  • ckpt_path: path to LaVie model, i.e., lavie_base
  • sd_path: path to base T2I diffusion model, i.e., stable-diffusion-v1-4
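As with the AnimateDiff backbone, a hedged sketch of the two documented keys may help; the paths shown are assumptions, and any further fields live in the shipped configs/lavie_configs/i4vgen_lavie.yaml:

```yaml
# Hypothetical sketch of i4vgen_lavie.yaml; only ckpt_path and sd_path
# are named in the README, and both paths below are assumptions.
ckpt_path: "checkpoints/lavie_base.pt"            # pre-trained LaVie base model
sd_path: "checkpoints/stable-diffusion-v1-4"      # base T2I diffusion model
```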

Citation

@article{guo2024i4vgen,
    title   = {I4VGen: Image as Stepping Stone for Text-to-Video Generation},
    author  = {Guo, Xiefan and Liu, Jinlin and Cui, Miaomiao and Huang, Di},
    journal = {arXiv preprint arXiv:2406.02230},
    year    = {2024}
}

Acknowledgments

The code is built upon AnimateDiff and LaVie; we thank all the contributors for open-sourcing their work.
