Yunxiang Zhang, Nan Wu, Connor Lin, Gordon Wetzstein, Qi Sun
Published in ACM Transactions on Applied Perception 2024
Presented at ACM Symposium on Applied Perception 2024 (Best Paper Award and Best Presentation Award)
[Paper] [Project Page] [Video]
Diffusion models offer unprecedented image generation power given just a text prompt. While emerging approaches for controlling diffusion models have enabled users to specify the desired spatial layouts of the generated content, they cannot predict or control where viewers will pay more attention due to the complexity of human vision. Recognizing the significance of attention-controllable image generation in practical applications, we present a saliency-guided framework to incorporate the data priors of human visual attention mechanisms into the generation process. Given a user-specified viewer attention distribution, our control module conditions a diffusion model to generate images that attract viewers’ attention toward the desired regions. To assess the efficacy of our approach, we performed an eye-tracked user study and a large-scale model-based saliency analysis. The results evidence that both the cross-user eye gaze distributions and the saliency models’ predictions align with the desired attention distributions. Lastly, we outline several applications, including interactive design of saliency guidance, attention suppression in unwanted regions, and adaptive generation for varied display/viewing conditions.
- Create a dedicated Conda environment: `conda env create -f environment.yaml; conda activate gazefusion`;
- Download the trained GazeFusion model from OneDrive and place it under the `models/` folder;
- Place your custom saliency map files under the `smaps/` folder (or use a provided one);
- Generate a few image samples with saliency guidance: `python generate.py --smap your_smap --prompt your_prompt`.
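A custom saliency map has to exist as an image file before it can be placed under `smaps/`. The sketch below is one hypothetical way to create a simple one: a single Gaussian blob of attention written as a grayscale PNG using only the Python standard library. The expected file format, resolution, and value range are not stated in this README, so the 512x512 size, the 8-bit grayscale PNG encoding, the helper names, and the file name `example_smap.png` are all assumptions. Compare against the provided maps under `smaps/` and adjust to match them.

```python
import math
import struct
import zlib


def gaussian_saliency_map(width, height, center_x, center_y, sigma):
    """Return a list of rows (bytes), each holding 8-bit intensities.

    The map has a single Gaussian peak of value 255 at (center_x, center_y).
    """
    rows = []
    for y in range(height):
        row = bytearray(width)
        for x in range(width):
            d2 = (x - center_x) ** 2 + (y - center_y) ** 2
            row[x] = round(255 * math.exp(-d2 / (2 * sigma ** 2)))
        rows.append(bytes(row))
    return rows


def _png_chunk(tag, data):
    """Build one PNG chunk: length, tag, data, CRC over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def write_grayscale_png(path, rows):
    """Write rows of 8-bit intensities as a non-interlaced grayscale PNG."""
    height = len(rows)
    width = len(rows[0])
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, interlace 0.
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + row for row in rows)
    with open(path, "wb") as f:
        f.write(b"\x89PNG\r\n\x1a\n")
        f.write(_png_chunk(b"IHDR", header))
        f.write(_png_chunk(b"IDAT", zlib.compress(raw)))
        f.write(_png_chunk(b"IEND", b""))


if __name__ == "__main__":
    # Assumed settings: 512x512 map with attention focused left of center.
    smap = gaussian_saliency_map(512, 512, center_x=160, center_y=256, sigma=60.0)
    write_grayscale_png("example_smap.png", smap)
```

After copying the resulting file into `smaps/`, it can be referenced through `--smap`. Whether that flag expects a bare name, a file name with extension, or a path is not specified here, so check `generate.py` before relying on any one form.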
The code and data for training GazeFusion will be released soon. Please stay tuned!
We would like to thank Saining Xie, Anyi Rao, and Zoya Bylinskii for fruitful early discussions, and the authors of Stable Diffusion, ControlNet, BLIP-2, EML-Net, and Text2Video-Zero for their great work, on which GazeFusion builds.
If you find this work useful for your research, please consider citing it with the following BibTeX entry:
@article{zhang2024gazefusion,
title={GazeFusion: Saliency-guided Image Generation},
author={Zhang, Yunxiang and Wu, Nan and Lin, Connor Z and Wetzstein, Gordon and Sun, Qi},
journal={ACM Transactions on Applied Perception},
year={2024},
publisher={ACM New York, NY}
}