FreeDoM 🕊️ (ICCV 2023)

FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model

Jiwen Yu¹, Yinhuai Wang¹, Chen Zhao², Bernard Ghanem², Jian Zhang¹

¹ Peking University, ² KAUST

arXiv | Camera Ready Paper

News

  • News (2023-10-06): We presented our work in Paris. Thank you to everyone who came to discuss it with us! 😎
  • News (2023-08-17): We have released the main code. The ControlNet-related code is in ./CN, and the human face and guided-diffusion code is in ./Face-GD.
  • News (2023-07-16): We have released the code for FreeDoM-SD-Style. See the ./SD_style directory for details.
  • News (2023-07-14): 🎉🎉🎉 FreeDoM has been accepted by ICCV 2023! Our open-source release is in progress, so stay tuned for updates!

Todo

  • release the camera-ready version of the paper and supplementary materials
  • release the code for human face diffusion models and guided diffusion with various training-free guidances
  • release the code for ControlNet with training-free face ID guidance and style guidance
  • release the code for Stable Diffusion with training-free style guidance

Introduction

FreeDoM is a simple yet effective training-free method for conditional generation with unconditional diffusion models. Specifically, we use off-the-shelf pre-trained networks to construct a time-independent energy function that measures the distance between the given condition and the intermediate generated image. We then compute the gradient of this energy and use it to guide the generation process. FreeDoM supports a variety of conditions, including texts, segmentation maps, sketches, landmarks, face IDs, and style images, and it applies to different data domains, including human faces, images from ImageNet, and latent codes.
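
The guidance loop described above can be illustrated with a toy, pure-Python sketch. It is not the repository's code: the scalar "image", the closed-form noise predictor `toy_eps_model`, the squared-distance energy, the constant step size `rho`, and the function names are all assumptions made for illustration. In practice the energy would come from a pre-trained network and its gradient from automatic differentiation.

```python
import math
import random


def make_schedule(num_steps=200, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and its cumulative products alpha_bar (DDPM-style)."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def toy_eps_model(x_t, alpha_bar):
    """Hypothetical stand-in for a pre-trained unconditional noise predictor.

    For scalar data drawn from N(0, 1), the optimal prediction has the
    closed form E[eps | x_t] = sqrt(1 - alpha_bar) * x_t.
    """
    return math.sqrt(1.0 - alpha_bar) * x_t


def energy_grad(x_t, alpha_bar, condition):
    """Gradient w.r.t. x_t of the toy energy E = 0.5 * (x0_hat - condition)**2.

    x0_hat is the clean intermediate estimate x_{0|t}. With the toy model
    above, x0_hat = sqrt(alpha_bar) * x_t, so the chain-rule factor is
    sqrt(alpha_bar). A real energy network would need autograd here.
    """
    eps = toy_eps_model(x_t, alpha_bar)
    x0_hat = (x_t - math.sqrt(1.0 - alpha_bar) * eps) / math.sqrt(alpha_bar)
    d_x0_hat_d_x_t = math.sqrt(alpha_bar)
    return (x0_hat - condition) * d_x0_hat_d_x_t


def sample(condition, rho, seed, num_steps=200):
    """Ancestral sampling with an extra energy-gradient correction per step.

    rho = 0.0 recovers plain unconditional sampling.
    """
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    x = rng.gauss(0.0, 1.0)  # start from Gaussian noise
    for t in reversed(range(num_steps)):
        beta, alpha_bar = betas[t], alpha_bars[t]
        eps = toy_eps_model(x, alpha_bar)
        grad = energy_grad(x, alpha_bar, condition)
        # Standard DDPM posterior mean computed from the predicted noise.
        mean = (x - beta / math.sqrt(1.0 - alpha_bar) * eps) / math.sqrt(1.0 - beta)
        noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0  # no noise on the last step
        # Unconditional step, then move against the energy gradient.
        x = mean + math.sqrt(beta) * noise - rho * grad
    return x


if __name__ == "__main__":
    guided = [sample(condition=3.0, rho=0.1, seed=s) for s in range(20)]
    unguided = [sample(condition=3.0, rho=0.0, seed=s) for s in range(20)]
    print("guided mean:  ", sum(guided) / len(guided))
    print("unguided mean:", sum(unguided) / len(unguided))
```

With `rho = 0.0` the sampler ignores the condition and draws from the toy data distribution, while a positive `rho` pulls samples toward the condition without retraining the noise predictor. The constant `rho` is a simplification chosen for this sketch.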

Overall Experimental Configurations

| Model Source | Data Domain | Resolution | Original Conditions | Additional Training-free Conditions | Sampling Time* (s/image) |
| --- | --- | --- | --- | --- | --- |
| SDEdit | aligned human face | $256\times256$ | None | parsing maps, sketches, landmarks, face IDs, texts | ≈20 |
| guided-diffusion | ImageNet | $256\times256$ | None | texts, style images | ≈140 |
| guided-diffusion | ImageNet | $256\times256$ | class label | style images | ≈50 |
| Stable Diffusion | general images | $512\times512$ (standard) | texts | style images | ≈84 |
| ControlNet | general images | $512\times512$ (standard) | human poses, scribbles, texts | face IDs, style images | ≈120 |

*Sampling time was measured on a single GeForce RTX 3090 GPU.

Results

  • Training-free style guidance + Stable Diffusion
  • Training-free style guidance + Scribble ControlNet
  • Training-free face ID guidance + Human-pose ControlNet
  • Training-free text guidance on human faces
  • Training-free segmentation guidance on human faces
  • Training-free sketch guidance on human faces
  • Training-free landmarks guidance on human faces
  • Training-free face ID guidance on human faces
  • Training-free face ID guidance + landmarks guidance on human faces
  • Training-free text guidance + segmentation guidance on human faces
  • Training-free style transfer guidance + Stable Diffusion
  • Training-free text-guided face editing

Acknowledgments

Our work stands on the shoulders of giants. We thank the contributors of the following projects, on which our code is based:

We also point to some recent works that share the similar idea of updating the clean intermediate result $\mathbf{x}_{0|t}$:

Citation

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{yu2023freedom,
  title={FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model},
  author={Yu, Jiwen and Wang, Yinhuai and Zhao, Chen and Ghanem, Bernard and Zhang, Jian},
  journal={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2023}
}