GitHub - NYU-ICL/saliency-guided-image-generation: GazeFusion: Saliency-guided Image Generation (ACM Transactions on Applied Perception 2024)

GazeFusion: Saliency-guided Image Generation

Yunxiang Zhang, Nan Wu, Connor Lin, Gordon Wetzstein, Qi Sun
Published in ACM Transactions on Applied Perception 2024
Presented at ACM Symposium on Applied Perception 2024 (Best Paper Award and Best Presentation Award)
[Paper] [Project Page] [Video]

Diffusion models offer unprecedented image generation power given just a text prompt. While emerging approaches for controlling diffusion models have enabled users to specify the desired spatial layouts of the generated content, they cannot predict or control where viewers will pay more attention due to the complexity of human vision. Recognizing the significance of attention-controllable image generation in practical applications, we present a saliency-guided framework to incorporate the data priors of human visual attention mechanisms into the generation process. Given a user-specified viewer attention distribution, our control module conditions a diffusion model to generate images that attract viewers’ attention toward the desired regions. To assess the efficacy of our approach, we performed an eye-tracked user study and a large-scale model-based saliency analysis. The results evidence that both the cross-user eye gaze distributions and the saliency models’ predictions align with the desired attention distributions. Lastly, we outline several applications, including interactive design of saliency guidance, attention suppression in unwanted regions, and adaptive generation for varied display/viewing conditions.

Inference

1. Create a dedicated Conda environment: conda env create -f environment.yaml, then conda activate gazefusion
2. Download the trained GazeFusion model from OneDrive and place it under the models/ folder.
3. Place your custom saliency map files under the smaps/ folder (or use one of the provided maps).
4. Generate a few image samples with saliency guidance: python generate.py --smap your_smap --prompt your_prompt
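
The custom saliency map mentioned above can be produced in many ways. As a minimal sketch, the Python snippet below draws a single Gaussian attention blob and writes it as an 8-bit grayscale PNG using only the standard library. Everything specific here is an assumption rather than something this README states: the 512x512 resolution, the PNG format, the convention that brighter pixels mean more desired attention, the file name right_center.png, and the helper names gaussian_saliency_map and write_gray_png. Check the provided files in smaps/ and the argument handling in generate.py for the format the model actually expects.

```python
import math
import struct
import zlib
from pathlib import Path


def gaussian_saliency_map(width, height, cx, cy, sigma):
    """Return image rows (bytes) with a Gaussian peak of value 255 at pixel (cx, cy)."""
    rows = []
    for y in range(height):
        row = bytearray(width)
        for x in range(width):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            row[x] = round(255 * math.exp(-d2 / (2 * sigma ** 2)))
        rows.append(bytes(row))
    return rows


def write_gray_png(path, rows):
    """Write rows of 8-bit grayscale pixels as a minimal, non-interlaced PNG."""
    height, width = len(rows), len(rows[0])

    def chunk(tag, data):
        # A PNG chunk is: data length, 4-byte type, data, CRC over type + data.
        body = tag + data
        return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body))

    # IHDR fields: width, height, bit depth 8, color type 0 (grayscale),
    # then compression, filter, and interlace methods all set to 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + row for row in rows)
    png = (
        b"\x89PNG\r\n\x1a\n"
        + chunk(b"IHDR", ihdr)
        + chunk(b"IDAT", zlib.compress(raw))
        + chunk(b"IEND", b"")
    )
    Path(path).write_bytes(png)


if __name__ == "__main__":
    # Assumed resolution and file name; adjust to match the provided maps.
    rows = gaussian_saliency_map(512, 512, cx=384, cy=256, sigma=60)
    Path("smaps").mkdir(exist_ok=True)
    write_gray_png("smaps/right_center.png", rows)
```

Running the script from the repository root would leave a map in smaps/ that steers attention toward the right-center of the image. Whether --smap takes a bare file name or a path is not stated here, so consult generate.py before passing it.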

Training

The code and data for training GazeFusion will be released soon. Please stay tuned!

Acknowledgements

We would like to thank Saining Xie, Anyi Rao, and Zoya Bylinskii for fruitful early discussions, and the authors of Stable Diffusion, ControlNet, BLIP-2, EML-Net, and Text2Video-Zero for their great work, on which GazeFusion builds.

Citation

If you find this work useful for your research, please consider citing it with the following BibTeX entry:

@article{zhang2024gazefusion,
  title={GazeFusion: Saliency-guided Image Generation},
  author={Zhang, Yunxiang and Wu, Nan and Lin, Connor Z and Wetzstein, Gordon and Sun, Qi},
  journal={ACM Transactions on Applied Perception},
  year={2024},
  publisher={ACM New York, NY}
}
