Text-to-Image Rectified Flow as Plug-and-Play Priors

by Xiaofeng Yang, Cheng Chen, Xulei Yang, Fayao Liu, Guosheng Lin.

Large-scale diffusion models have achieved remarkable performance in generative tasks. Beyond their original training applications, these models have proven able to function as versatile plug-and-play priors. For instance, 2D diffusion models can serve as loss functions to optimize 3D implicit models. Rectified flow, a newer class of generative models, enforces a linear progression from the source to the target distribution and has demonstrated superior performance across various domains. Compared to diffusion-based methods, rectified flow approaches offer better generation quality and efficiency, requiring fewer inference steps. In this work, we present theoretical and experimental evidence that rectified flow based methods offer functionality similar to diffusion models: they can also serve as effective priors. Beyond the generative capabilities of diffusion priors, and motivated by the unique time-symmetry properties of rectified flow models, a variant of our method can additionally perform image inversion. Experimentally, our rectified flow-based priors outperform their diffusion counterparts, the SDS and VSD losses, in text-to-3D generation. Our method also displays competitive performance in image inversion and editing.
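To make the "prior as a loss function" idea concrete, here is a minimal toy sketch in pure Python. It is not the repository's RFDS implementation or the paper's exact objective. It assumes the convention x_t = (1 - t) * noise + t * image, a weighting w(t) = 1 - t, and a hypothetical "pretrained" velocity model whose data distribution is a single point `target` (for which the ideal rectified-flow velocity has a closed form). The names `ideal_velocity` and `distill` are illustrative only.

```python
import random


def ideal_velocity(x_t, t, target):
    """Exact velocity field when all data mass sits at `target` (toy stand-in for a network)."""
    return [(g - x) / (1.0 - t) for x, g in zip(x_t, target)]


def distill(theta, target, steps=200, lr=0.1, seed=0):
    """Optimize the parameters `theta` (a toy "image") using the velocity model as a prior."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        t = rng.uniform(0.02, 0.98)                      # random time, kept away from t = 1
        eps = [rng.gauss(0.0, 1.0) for _ in theta]       # fresh Gaussian noise
        # Point on the straight line between the noise and the current parameters.
        x_t = [(1.0 - t) * e + t * p for e, p in zip(eps, theta)]
        v_pred = ideal_velocity(x_t, t, target)
        # Residual between the straight-line velocity (theta - eps) and the model's velocity.
        w = 1.0 - t                                      # assumed weighting, not from the paper
        grad = [w * ((p - e) - v) for p, e, v in zip(theta, eps, v_pred)]
        theta = [p - lr * g for p, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    print(distill([5.0, -3.0], target=[1.0, 2.0]))
```

In this toy setting the residual simplifies to `theta - target`, so each step pulls the parameters toward the model's data distribution. In text-to-3D the parameters would instead be a NeRF rendered to an image, and the velocity would come from a text-conditioned model such as InstaFlow or SD3.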

Updates

2024/06/05: Code release.
2024/06/21: Add support for Stable Diffusion 3 (June, Medium version).
2024/10/08: We extended the paper with the Stochastic Interpolants framework. In addition to rectified flow models, the new theory also applies to other flow-matching based methods and to diffusion models expressed as a PF-ODE. An updated version has been uploaded to arXiv. We also cleaned up the code to make it easier to follow.

ToDo

Code release. The base text-to-image model is based on InstaFlow.
Add support for Stable Diffusion 3 after the model is released.
Support Flux, the SOTA text-to-image model
Stability AI will release "a much improved version" of SD3 soon (refer to here). We'll add support for the new version ASAP.

Installation

Our code is based on the ThreeStudio implementation. Please follow the instructions in ThreeStudio to install the dependencies.

To use SD3: please follow the instructions here to log in to Hugging Face and update diffusers. When you run our code, the models will be downloaded automatically.
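Before launching an SD3 run, it can help to confirm that the installed diffusers is recent enough. The sketch below is only a convenience check and is not part of this repository. It assumes that SD3 support first shipped in diffusers 0.29.0 (please verify against the diffusers release notes), and the helper names are hypothetical.

```python
from importlib import metadata

MIN_DIFFUSERS = "0.29.0"  # assumption: first diffusers release with SD3 support


def parse_version(version):
    """Turn '0.30.0.dev0' into (0, 30, 0), ignoring any non-numeric suffix."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_sd3(installed, minimum=MIN_DIFFUSERS):
    """Return True if the installed version is at least the assumed minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_diffusers_version():
    """Return the installed diffusers version string, or None if it is missing."""
    try:
        return metadata.version("diffusers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_diffusers_version()
    if found is None:
        print("diffusers is not installed")
    elif supports_sd3(found):
        print(f"diffusers {found} should be recent enough for SD3")
    else:
        print(f"diffusers {found} looks too old; try upgrading it")
```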

Quickstart

Using Stable Diffusion 3 as the base rectified-flow model.

2D Playground (SD3)

# run RFDS in 2D space for image generation
python 2dplayground_RFDS_sd3.py

# run RFDS-Rev in 2D space for image generation
python 2dplayground_RFDS_Rev_sd3.py

# run iRFDS in 2D space for image editing (requires 20 GB of GPU memory)
python 2dplayground_iRFDS_sd3.py

Text-to-3D with RFDS (SD3) (requires 46 GB of GPU memory)

python launch.py --config configs/rfds_sd3.yaml --train --gpu 0 system.prompt_processor.prompt="A DSLR photo of a hamburger"

Text-to-3D with RFDS-Rev (SD3) (requires more than 46 GB of GPU memory)

python launch.py --config configs/rfds-rev_sd3.yaml --train --gpu 0 system.prompt_processor.prompt="A DSLR photo of a hamburger"

Text-to-3D with RFDS-Rev, reduced memory usage (SD3) (runs on 46 GB GPUs)

python launch.py --config configs/rfds-rev_sd3_low_memory.yaml --train --gpu 0 system.prompt_processor.prompt="A DSLR photo of a hamburger"
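To run several prompts against the same config, the launch commands above can be generated from a small script. This is a hedged sketch rather than a tool shipped with the repository: `build_command` and `run_prompts` are hypothetical helpers that only reproduce the command-line shape shown above, and `dry_run=True` prints the commands instead of starting training.

```python
import shlex
import subprocess


def build_command(config, prompt, gpu=0):
    """Build the argument list for one launch.py training run."""
    return [
        "python", "launch.py",
        "--config", config,
        "--train",
        "--gpu", str(gpu),
        # Passed as a single argument, so no extra quoting is needed for spaces.
        f"system.prompt_processor.prompt={prompt}",
    ]


def run_prompts(config, prompts, gpu=0, dry_run=True):
    """Run (or just print) one training command per prompt, sequentially."""
    commands = [build_command(config, prompt, gpu) for prompt in prompts]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))           # shell-safe rendering of the command
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing run
    return commands


if __name__ == "__main__":
    run_prompts(
        "configs/rfds_sd3.yaml",
        [
            "A DSLR photo of a hamburger",
            "A 3d model of an adorable cottage with a thatched roof",
        ],
    )
```

Swapping the config path for `configs/rfds-rev_sd3.yaml` or `configs/rfds-rev_sd3_low_memory.yaml` selects the other variants listed above.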

Results

Optimization in 2D space (SD3)

Caption: A DSLR image of a hamburger

RFDS

RFDS-Rev

Text-to-3D with RFDS (NeRF backbone, SD3)

A DSLR image of a hamburger

A 3d model of an adorable cottage with a thatched roof

Text-to-3D with RFDS-Rev (NeRF backbone, SD3)

A DSLR image of a hamburger

A 3d model of an adorable cottage with a thatched roof

Text guided editing with iRFDS (SD3)

Remarks for SD3

With SD3, the RFDS baseline already delivers strong results. If your GPU memory is limited, we recommend using the RFDS baseline.
SD3 is not trained with reflow (see the InstaFlow paper for details), so we found image inversion with iRFDS harder on SD3. Additionally, the transformer backbone makes it difficult to replace objects through text control without using prompt-to-prompt.
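One plausible intuition for why reflow matters for inversion: reflow straightens the ODE trajectories, and straight trajectories can be followed backward (image to noise) and forward again with very few Euler steps. The toy sketch below illustrates this with plain ODE inversion on two hand-written 1D velocity fields. It is not iRFDS itself, and it assumes the convention that t = 0 is noise and t = 1 is the image; both fields and all function names are made up for illustration.

```python
def integrate(velocity, x, t_start, t_end, steps):
    """Explicit Euler from t_start to t_end; t_end < t_start runs the ODE backward."""
    h = (t_end - t_start) / steps
    t = t_start
    for _ in range(steps):
        x = x + h * velocity(x, t)
        t += h
    return x


def straight_field(x, t):
    # Trajectories x_t = x0 + t * (x0 + 1): the velocity is constant along each path.
    return (x + 1.0) / (1.0 + t)


def curved_field(x, t):
    # Trajectories x_t = x0 * exp(t): the velocity changes along each path.
    return x


def round_trip_error(velocity, x_image, steps):
    """Invert an "image" to noise, regenerate it, and report the reconstruction error."""
    noise = integrate(velocity, x_image, 1.0, 0.0, steps)  # image -> noise
    recon = integrate(velocity, noise, 0.0, 1.0, steps)    # noise -> image
    return abs(recon - x_image)


if __name__ == "__main__":
    for steps in (2, 20, 200):
        print(
            steps,
            round_trip_error(straight_field, 3.0, steps),
            round_trip_error(curved_field, 3.0, steps),
        )
```

With the straight field the round trip is exact up to floating-point rounding even with two steps, while the curved field needs many more steps before the reconstruction error becomes small.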

Using InstaFlow as the base rectified-flow model (uses less GPU memory).

2D Playground (InstaFlow)

# run RFDS in 2D space for image generation
python 2dplayground_RFDS.py

# run RFDS-Rev in 2D space for image generation
python 2dplayground_RFDS_Rev.py

# run iRFDS in 2D space for image editing
python 2dplayground_iRFDS.py

Text-to-3D with RFDS (InstaFlow)

python launch.py --config configs/rfds.yaml --train --gpu 0 system.prompt_processor.prompt="A DSLR photo of a hamburger"

Text-to-3D with RFDS-Rev (InstaFlow)

python launch.py --config configs/rfds-rev.yaml --train --gpu 0 system.prompt_processor.prompt="A DSLR photo of a hamburger"

Results

Optimization in 2D space (InstaFlow)

Caption: an astronaut is riding a horse

RFDS

RFDS-Rev

Text-to-3D with RFDS-Rev (NeRF backbone, InstaFlow)

A DSLR image of a hamburger

An intricate ceramic vase with peonies painted on it

Text guided editing with iRFDS (InstaFlow)

Credits

RFDS is built on the following open-source projects:

ThreeStudio Main Framework
InstaFlow Large-scale text-to-image Rectified Flow model

Citation

@article{yang2024rfds,
  title={Text-to-Image Rectified Flow as Plug-and-Play Priors},
  author={Xiaofeng Yang and Cheng Chen and Xulei Yang and Fayao Liu and Guosheng Lin},
  journal={arXiv-2406.03293},
  year={2024}
}
