
Mitigating Object Hallucination via Concentric Causal Attention

arXiv

This is the official repository of the following paper and a project that studies positional perception in LVLMs.

Mitigating Object Hallucination via Concentric Causal Attention
NeurIPS 2024
Yun Xing*, Yiheng Li*, Ivan Laptev, Shijian Lu†

🎉 News

We will include more findings in the coming weeks. Stay tuned if you are interested. 🙏🙏
  • [2024/10/22] Paper is available on arXiv.
  • [2024/10/21] CCA-LLaVA supports evaluation on multiple benchmarks, including pope, chair, and amber for hallucination, and mmstar, gqa, seed, vizwiz_vqa, and scienceqa for general LVLM multiple-choice questions. Please refer to this doc for details.
  • [2024/09/27] CCA is accepted to NeurIPS 2024🎉.

🕹️ Approach

  • We reveal that object hallucination is closely tied to Rotary Position Encoding (RoPE), a widely adopted positional dependency modeling design in existing LVLMs. Due to the long-term decay in RoPE, LVLMs suffer from recency bias and tend to hallucinate more when relevant visual cues are distant from instruction tokens (the user query) in the multimodal input sequence.
  • Motivated by this, we propose Concentric Causal Attention (CCA), a simple yet effective positional alignment strategy that mitigates the impact of RoPE long-term decay in LVLMs by placing critical visual cues closer to user instructions, thereby alleviating object hallucination (see the sketch after this list).
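
For intuition, here is a minimal Python sketch of one way to assign concentric position indices on a square grid of visual tokens: tokens on the outermost ring receive the smallest index and tokens near the image center the largest, so central content ends up positionally closest to the instruction tokens that follow. The function name and the exact indexing rule are illustrative assumptions for exposition only; the actual implementation lives in the llava/cca_utils folder.

# Illustrative sketch only, not the repository's implementation:
# assign each visual token in an n x n grid a ring index,
# 0 for the outermost ring, growing toward the image center.
def concentric_position_ids(n: int):
    return [[min(i, j, n - 1 - i, n - 1 - j) for j in range(n)] for i in range(n)]

for row in concentric_position_ids(6):
    print(row)
# [0, 0, 0, 0, 0, 0]
# [0, 1, 1, 1, 1, 0]
# [0, 1, 2, 2, 1, 0]
# [0, 1, 2, 2, 1, 0]
# [0, 1, 1, 1, 1, 0]
# [0, 0, 0, 0, 0, 0]

Under such an ordering, the positions of central visual tokens sit closest to those of the user query, which counteracts the RoPE long-term decay described above.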

🔥 Spatial Position Probing

  • To further verify the effectiveness of our approach, we craft a large-scale object hallucination evaluation set comprising over 2,000,000 test samples that are diverse in object spatial positions and object sizes. Our model consistently surpasses LLaVA-1.5 across diverse spatial positions and object scales.

🛠️ Install

conda create -n cca-llava python=3.10 -y
conda activate cca-llava
pip install --upgrade pip  # enable PEP 660 support
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121
pip install -e .
pip install -e ".[train]"
pip install triton==2.1.0 pynvml==11.5.0 --upgrade
pip install flash-attn==2.5.8 --no-build-isolation --no-cache-dir

🤗 Model

📜 Data

Please refer to Data.md for preparation of training data.

🌟 Train

The CCA-LLaVA training pipeline follows LLaVA-1.5 and consists of two stages:

  • Step 1, pretraining. Train a projector on a CC3M subset of ∼558K image-text pairs to connect a frozen pretrained vision encoder and a frozen LLM.
    bash scripts/v1_5/pretrain.cca-llava-1.5-7b.sh
    
  • Step 2, instruction tuning. Fine-tune the projector and the LLM on ~665K multimodal instruction-following samples.
    bash scripts/v1_5/finetune.cca-llava-1.5-7b.sh
    

🔍 Eval

Please refer to Eval.md for details.

🕹️ Usage

The two core modifications, concentric positions and concentric causal masking, can be found in the llava/cca_utils folder. To replace the default causal scheme with our proposed CCA, prepend the following code to your training or evaluation code, subject to your own use case.

import transformers
from llava.cca_utils.cca import llamaforcausallm_forward, cca_forward

# Monkey-patch the LLaMA forward passes so that any LLaMA backbone built
# afterwards uses concentric positions and concentric causal masking.
transformers.models.llama.LlamaForCausalLM.forward = llamaforcausallm_forward
transformers.models.llama.LlamaModel.forward = cca_forward
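
Because the patch replaces the stock LLaMA forward functions globally, it must run before the model is instantiated. Below is a minimal evaluation-side sketch of that ordering; the load_pretrained_model entry point and the checkpoint path are assumptions borrowed from the LLaVA codebase and should be adapted to your own script.

import transformers
from llava.cca_utils.cca import llamaforcausallm_forward, cca_forward
from llava.model.builder import load_pretrained_model  # assumed LLaVA-style loader

# Patch first, so any LLaMA backbone created afterwards uses CCA.
transformers.models.llama.LlamaForCausalLM.forward = llamaforcausallm_forward
transformers.models.llama.LlamaModel.forward = cca_forward

# Hypothetical checkpoint path; replace with your own CCA-LLaVA weights.
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="path/to/cca-llava-1.5-7b",
    model_base=None,
    model_name="cca-llava-1.5-7b",
)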

✒️ Citation

@article{xing2024mitigating,
  title={Mitigating Object Hallucination via Concentric Causal Attention},
  author={Xing, Yun and Li, Yiheng and Laptev, Ivan and Lu, Shijian},
  journal={arXiv preprint arXiv:2410.15926},
  year={2024}
}

❤️ Acknowledgement

Thanks to the following projects for their wonderful work!

  • LLaVA: the codebase we use to implement CCA.
  • RoFormer: the codebase where RoPE was originally proposed.
  • OPERA: an excellent approach that mitigates object hallucination; the codebase we use to implement CHAIR evaluation.
  • POPE: a widely adopted object hallucination benchmark.
  • AMBER: a recent comprehensive hallucination benchmark covering object, attribute, and relation hallucination.
  • lmms-eval: a comprehensive evaluation toolkit for LVLMs; the codebase we use to implement general LVLM benchmark evaluations.
