GitHub - GuoLanqing/Self-Cascade: [ECCV2024] Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

Lanqing Guo*, Yingqing He*, Haoxin Chen, Menghan Xia, Xiaodong Cun, Yufei Wang, Siyu Huang,
Yong Zhang^#, Xintao Wang, Qifeng Chen, Ying Shan and Bihan Wen^#

(* co-first authors, # corresponding authors)

🥳 Demo

More demo videos are available on the project page.

🔆 Abstract

TL;DR: 🤗🤗🤗 The self-cascade diffusion model is a lightweight and efficient scale adaptation approach for higher-resolution image and video generation.

Diffusion models have proven to be highly effective in image and video generation; however, they still face composition challenges when generating images of varying sizes due to single-scale training data. Adapting large pre-trained diffusion models for higher resolution demands substantial computational and optimization resources, yet achieving a generation capability comparable to low-resolution models remains elusive. This paper proposes a novel self-cascade diffusion model that leverages the rich knowledge gained from a well-trained low-resolution model for rapid adaptation to higher-resolution image and video generation, employing either tuning-free or cheap upsampler tuning paradigms. Integrating a sequence of multi-scale upsampler modules, the self-cascade diffusion model can efficiently adapt to a higher resolution, preserving the original composition and generation capabilities. We further propose a pivot-guided noise re-schedule strategy to speed up the inference process and improve local structural details. Compared to full fine-tuning, our approach achieves a 5X training speed-up and requires only an additional 0.002M tuning parameters. Extensive experiments demonstrate that our approach can quickly adapt to higher resolution image and video synthesis by fine-tuning for just 10k steps, with virtually no additional inference time.
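The abstract does not spell out how the pivot-guided noise re-schedule works. The sketch below shows one plausible reading and is not the repository's implementation: the low-resolution result is upsampled to serve as a "pivot", then noised only to an intermediate step K using the standard forward-diffusion formula, so high-resolution denoising can start from K rather than from pure noise. The function names, the nearest-neighbour upsampling, and the value `alpha_bar_k=0.5` are all assumptions made for illustration.

```python
import math
import random


def upsample_nearest(latent, factor):
    """Nearest-neighbour upsampling of a 2D list by an integer factor."""
    return [
        [value for value in row for _ in range(factor)]
        for row in latent
        for _ in range(factor)
    ]


def noise_to_step(latent, alpha_bar_k, rng):
    """Standard forward-diffusion noising: sqrt(a) * z0 + sqrt(1 - a) * eps.

    alpha_bar_k is the cumulative noise-schedule product at step K
    (1.0 means no noise, 0.0 means pure noise).
    """
    signal = math.sqrt(alpha_bar_k)
    sigma = math.sqrt(1.0 - alpha_bar_k)
    return [
        [signal * value + sigma * rng.gauss(0.0, 1.0) for value in row]
        for row in latent
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    low_res = [[0.1, 0.2], [0.3, 0.4]]       # stand-in for a low-resolution latent
    pivot = upsample_nearest(low_res, 2)     # 2x2 -> 4x4 pivot
    start = noise_to_step(pivot, 0.5, rng)   # hypothetical alpha_bar at step K
    print(len(start), len(start[0]))
```

Starting from a partially noised pivot means fewer denoising steps are needed at the higher resolution, which is consistent with the speed-up the abstract describes.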

🔥 Update

2024.10.25 - 💥 Released training and testing code for SDXL. Sorry for the late release 😢.
2024.7.3 - 💥 Accepted by ECCV 2024!

🔎 Main Requirements

This repository has been tested with:

Python==3.8
torch>=1.13.1
diffusers>=0.25.0
transformers
accelerate
xformers
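Before running the scripts, it can help to confirm that the installed packages meet the minimums listed above. The following is a minimal sketch that uses only the standard library; the helper names (`version_tuple`, `find_problems`, `installed_versions`) are made up for this example and are not part of the repository.

```python
import re
from importlib import metadata

# Minimum versions taken from the list above; None means no minimum is stated.
REQUIREMENTS = {
    "torch": "1.13.1",
    "diffusers": "0.25.0",
    "transformers": None,
    "accelerate": None,
    "xformers": None,
}


def version_tuple(version):
    """Turn '2.1.0+cu118' into (2, 1, 0), ignoring local and dev suffixes."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def find_problems(installed, requirements=REQUIREMENTS):
    """Return human-readable problems; an empty list means everything is fine."""
    problems = []
    for name, minimum in requirements.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"{name} is not installed")
        elif minimum is not None and version_tuple(found) < version_tuple(minimum):
            problems.append(f"{name} {found} is older than {minimum}")
    return problems


def installed_versions(names):
    """Look up the installed version of each package, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for problem in find_problems(installed_versions(REQUIREMENTS)):
        print(problem)
```

The comparison logic is kept separate from the package lookup so it can be checked without any of the listed packages installed.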

💫 Inference

Text-to-image higher-resolution generation with diffusers script

stable-diffusion xl v1.0 base

For the tuning version, the pretrained upsampler weights can be downloaded from: https://huggingface.co/NothingSpecialSiri/SelfCascade-SDXL/tree/main

# 2048x2048 (4x) generation
python sdxl_inference_1000.py \
  --trained_checkpoint_path /path/to/upsampler_only.pth \
  --prompt_folder /path/to/captions \
  --output_folder /path/to/output \
  --file_list_path "/path/to/test file names"
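The same invocation can be scripted from Python, which sidesteps shell quoting problems when a path contains spaces. This is a minimal sketch: the script name and the four flags come from the command above, while `build_command` and the placeholder paths are hypothetical and should be replaced with real locations.

```python
import subprocess
import sys


def build_command(checkpoint, prompt_folder, output_folder, file_list):
    """Build the argument list for sdxl_inference_1000.py.

    Passing a list (rather than one shell string) means each path is
    handed over as a single argument, even if it contains spaces.
    """
    return [
        sys.executable,
        "sdxl_inference_1000.py",
        "--trained_checkpoint_path", str(checkpoint),
        "--prompt_folder", str(prompt_folder),
        "--output_folder", str(output_folder),
        "--file_list_path", str(file_list),
    ]


if __name__ == "__main__":
    # Placeholder paths; these are not real files.
    command = build_command(
        "/path/to/upsampler_only.pth",
        "/path/to/captions",
        "/path/to/output",
        "/path/to/test file names",
    )
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(command, check=True)
```

Run it from the repository root so that `sdxl_inference_1000.py` resolves relative to the working directory.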

😉 Citation

@article{guo2024make,
  title={Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation},
  author={Guo, Lanqing and He, Yingqing and Chen, Haoxin and Xia, Menghan and Cun, Xiaodong and Wang, Yufei and Huang, Siyu and Zhang, Yong and Wang, Xintao and Chen, Qifeng and others},
  journal={arXiv preprint arXiv:2402.10491},
  year={2024}
}

📭 Contact

If you have any comments or questions, feel free to contact Lanqing Guo or Yingqing He.
