Official PyTorch implementation of "VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models"
(Project teaser video: project_teaser.mp4)
VideoElevator aims to elevate the quality of generated videos with text-to-image diffusion models. It is training-free and plug-and-play, supporting the cooperation of various text-to-video and text-to-image diffusion models.
- [04/07/2024] We release the code of VideoElevator, including three example scripts.
Bottom: VideoElevator explicitly decomposes each sampling step into temporal motion refining and spatial quality elevating, where the former encapsulates T2V to enhance temporal consistency and the latter harnesses T2I to provide more faithful details, e.g., dressed in suit. Empirically, applying T2V at only a few timesteps is enough to ensure temporal consistency.
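Below is a minimal, schematic sketch of this per-step decomposition, assuming generic callables for the T2V denoiser, the T2I denoiser, and a re-noising function; the names (`t2v_step`, `t2i_step`, `renoise`) are illustrative and omit details of the actual implementation (e.g., how noise levels are matched between the two models).

```python
# Schematic sketch of one VideoElevator denoising step (not the repo's actual API).
import torch

def elevate_one_step(latents, t, t2v_step, t2i_step, renoise,
                     apply_motion_refining: bool, stable_num: int = 3):
    """One denoising step on video latents of shape (frames, C, H, W)."""
    if apply_motion_refining:
        # Temporal motion refining: run a few T2V denoising steps to restore
        # temporal consistency, then re-noise back to the current level t.
        video = latents
        for _ in range(stable_num):
            video = t2v_step(video, t)
        latents = renoise(video, t)

    # Spatial quality elevating: denoise each frame with the T2I model to add
    # more faithful, higher-quality spatial details.
    frames = [t2i_step(frame.unsqueeze(0), t) for frame in latents]
    return torch.cat(frames, dim=0)
```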
All pre-trained weights should be downloaded to the `checkpoints/` directory, including the weights of both text-to-video and text-to-image diffusion models. Users can download the weights they need (a minimal download sketch follows the list):
- Text-to-video diffusion models: LaVie, ZeroScope, AnimateLCM.
- Text-to-image diffusion models: StableDiffusion v1.5, StableDiffusion v2.1-base.
- [Optional] LoRA from Civitai: RCNZ Cartoon, RealisticVision, Lyriel, ToonYou.
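A minimal download sketch using `huggingface_hub`; the repository IDs and target sub-directories below are examples, not a mandated layout.

```python
# Example: fetch two of the backbones into checkpoints/ (IDs are illustrative).
from huggingface_hub import snapshot_download

# Text-to-image backbone (e.g., Stable Diffusion v2.1-base)
snapshot_download("stabilityai/stable-diffusion-2-1-base",
                  local_dir="checkpoints/stable-diffusion-2-1-base")

# Text-to-video backbone (e.g., ZeroScope)
snapshot_download("cerspense/zeroscope_v2_576w",
                  local_dir="checkpoints/zeroscope_v2_576w")
```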
```bash
conda create -n videoelevator python=3.10
conda activate videoelevator
pip install -r requirements.txt
```
We provide three example scripts of VideoElevator in the `example_scripts/` directory, and recommend running `example_scripts/sd_animatelcm.py`. To perform improved text-to-video generation, directly run `python example_scripts/sd_animatelcm.py`.
Notably, all scripts can run with less than 11 GB of VRAM (e.g., on a 2080 Ti GPU).
[Optional] Hyper-parameters
You can adjust the following hyper-parameters and check their effects in the Ablation studies section of the project page (a configuration sketch follows the list):
- stable_steps: the choice of timesteps at which temporal motion refining is applied.
- stable_num: the number of steps used in T2V denoising.
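A hypothetical configuration snippet, assuming the example script exposes these as plain Python variables near the top of the file; the values shown are purely illustrative.

```python
# Illustrative hyper-parameter settings (not the defaults shipped with the repo).
stable_steps = [701, 601, 501]  # timesteps at which temporal motion refining is applied
stable_num = 3                  # number of T2V denoising steps per refinement
```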
```bibtex
@article{zhang2024videoelevator,
  title={VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models},
  author={Zhang, Yabo and Wei, Yuxiang and Lin, Xianhui and Hui, Zheng and Ren, Peiran and Xie, Xuansong and Ji, Xiangyang and Zuo, Wangmeng},
  journal={arXiv preprint arXiv:2403.05438},
  year={2024}
}
```
This repository borrows code from Diffusers, LaVie, AnimateLCM, and FreeInit. Thanks for their contributions!