[NeurIPS 2024] Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis

hutaiHang/ToMe

🌟 [NeurIPS 2024] Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis

📑 Introduction

Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis

Taihang Hu, Linxuan Li, Joost van de Weijer, Hongcheng Gao, Fahad Khan, Jian Yang, Ming-Ming Cheng, Kai Wang, Yaxing Wang

📚arXiv

This paper defines semantic binding as the task of associating an object with its attribute (attribute binding) or linking it to related sub-objects (object binding). We propose a novel method called Token Merging (ToMe), which enhances semantic binding by aggregating relevant tokens into a single composite token, aligning the object, its attributes, and sub-objects in the same cross-attention map.

For technical details, please refer to our paper.
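As a rough illustration of the merging idea described above, the sketch below combines the embeddings of an object token and a related token into one composite vector. The element-wise sum, the helper name `merge_tokens`, the placement of the composite at the first selected position, and the toy 3-dimensional embeddings are all assumptions made for illustration; the paper describes the actual aggregation and the other components of ToMe.

```python
from typing import List, Sequence

Vector = List[float]


def merge_tokens(embeddings: Sequence[Vector], indices: Sequence[int]) -> List[Vector]:
    """Replace the tokens at `indices` with one composite token.

    Hypothetical helper: the composite is the element-wise sum of the
    selected embeddings. It is placed at the first selected position and
    the remaining selected positions are dropped.
    """
    if not indices:
        return [list(vec) for vec in embeddings]

    selected = sorted(set(indices))
    dim = len(embeddings[selected[0]])
    composite = [sum(embeddings[i][d] for i in selected) for d in range(dim)]

    merged: List[Vector] = []
    for pos, vec in enumerate(embeddings):
        if pos == selected[0]:
            merged.append(composite)      # composite token replaces the first selected token
        elif pos not in selected:
            merged.append(list(vec))      # unrelated tokens are kept unchanged
    return merged


# Toy embeddings standing in for the tokens "a", "dog", "wearing", "hat".
tokens = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
]

# Merge "dog" (index 1) with "hat" (index 3) into a single composite token.
merged = merge_tokens(tokens, indices=[1, 3])
print(merged)
```

In this toy setup the sequence shrinks from four tokens to three, and the object and its sub-object share one position, which is the property that lets them share a cross-attention map.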

🚀 Usage

  1. Environment Setup

    Create and activate the Conda virtual environment:

    conda env create -f environment.yaml
    conda activate tome

    Alternatively, install dependencies via pip:

    pip install -r requirements.txt

    Additionally, download the spaCy model used for syntax parsing:

    python -m spacy download en_core_web_trf

  2. Configure Parameters

    Modify the configs/demo_config.py file to adjust runtime parameters as needed. This file includes two example configuration classes: RunConfig1 for object binding and RunConfig2 for attribute binding. Key parameters are as follows:

    • prompt: Text prompt for guiding image generation.
    • model_path: Path to the Stable Diffusion model; set to None to download the pretrained model automatically.
    • use_nlp: Whether to use an NLP model for token parsing.
    • token_indices: Indices of tokens to merge.
    • prompt_anchor: The text prompt split into segments.
    • prompt_merged: Text prompt after token merging.

    For further parameter details, please refer to the comments in the configuration file and our paper.

  3. Run the Example

    Execute the main script run_demo.py:

    python run_demo.py

    The generated images will be saved in the demo directory.
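The key parameters listed in step 2 can be pictured as fields of a configuration class. The sketch below is only an illustration: the class name `RunConfig3`, the field types, and every default value are assumptions, not contents of `configs/demo_config.py`. Check that file's comments for the real types and defaults before copying anything.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RunConfig3:
    """Hypothetical custom configuration; names mirror the documented parameters."""

    # Text prompt for guiding image generation.
    prompt: str = "a cat wearing sunglasses and a dog wearing a hat"
    # Path to the Stable Diffusion model; None triggers an automatic download.
    model_path: Optional[str] = None
    # Whether to use an NLP model for token parsing.
    use_nlp: bool = False
    # Indices of tokens to merge (nested-list layout and values are assumed;
    # real indices must match the tokenizer output for the prompt).
    token_indices: List[List[int]] = field(
        default_factory=lambda: [[2, 4], [7, 10]]
    )
    # The prompt split into segments (assumed to be a list of strings).
    prompt_anchor: List[str] = field(
        default_factory=lambda: [
            "a cat wearing sunglasses",
            "and a dog wearing a hat",
        ]
    )
    # Text prompt after token merging.
    prompt_merged: str = "a cat and a dog"


config = RunConfig3()
print(config.prompt)
print(config.model_path)
```

Using `field(default_factory=...)` for the list-valued fields avoids sharing one mutable default across instances, which `dataclass` would otherwise reject.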

📸 Example Outputs

If everything is set up correctly, RunConfig1 and RunConfig2 should produce the left and right images below, respectively:

⚠️ Notes

  • Custom Configurations: To use custom text prompts and parameters, add a new configuration class in configs/demo_config.py and make the necessary adjustments in run_demo.py.
  • Parameter Sensitivity: Like other inference-based optimization techniques, this method is sensitive to hyperparameters: the generated results depend strongly on their settings, so careful tuning may be required to achieve good results.
  • NLP Models: When using an NLP library such as spaCy for token parsing, ensure the correct language model is installed.
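One way to act on the last note is to check for the language model before running the demo. The sketch below relies on the fact that spaCy models fetched with `python -m spacy download` are installed as importable Python packages; the helper name `spacy_model_installed` is hypothetical and is not part of this repository.

```python
import importlib.util


def spacy_model_installed(model_name: str = "en_core_web_trf") -> bool:
    """Return True if `model_name` can be found as an importable package."""
    return importlib.util.find_spec(model_name) is not None


if not spacy_model_installed():
    print("spaCy model missing; run: python -m spacy download en_core_web_trf")
```

The check only confirms that the package is present; it does not load the model or verify that its version matches the installed spaCy release.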

🙏 Acknowledgments

This project builds upon valuable work and resources from the following repositories:

We extend our sincere thanks to the creators of these projects for their contributions to the field and for making their code available. 🙌

BibTeX

@inproceedings{hu2024token,
  title={Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis},
  author={Taihang Hu and Linxuan Li and Joost van de Weijer and Hongcheng Gao and Fahad Khan and Jian Yang and Ming-Ming Cheng and Kai Wang and Yaxing Wang},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=tRRWoa9e80}
}
