Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models

Introduction

We propose reflective instruction tuning, which integrates rationale learning into visual instruction tuning. Unlike previous methods that learn from responses only, our approach trains the model to predict rationales justifying why responses are correct or incorrect. This fosters a deeper engagement with the fine-grained reasoning underlying each response, thus enhancing the model's reasoning proficiency. To facilitate this approach, we propose REVERIE, the first large-scale instruction-tuning dataset with ReflEctiVE RatIonalE annotations. REVERIE comprises 115k machine-generated reasoning instructions, each annotated with a pair of correct and confusing responses, alongside comprehensive rationales explaining why each response is correct or erroneous.
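For intuition, the sketch below shows what a single REVERIE-style training sample could look like. The field names and values are illustrative assumptions, not the exact schema of the released annotations:

# A hypothetical REVERIE-style sample: one instruction, a correct and a
# confusing response, and a rationale for each. Names are illustrative only.
sample = {
    "image": "000000123456.jpg",  # source image (name is made up)
    "instruction": "What is the person in the image holding?",
    "positive_response": "The person is holding a red umbrella.",
    "negative_response": "The person is holding a blue suitcase.",
    # Positive rationale: why the correct response is grounded in the image.
    "positive_rationale": "A curved handle and an open red canopy are visible "
                          "in the person's hand, identifying an umbrella.",
    # Negative rationale: why the confusing response contradicts the image.
    "negative_rationale": "No rigid rectangular object is visible; the held "
                          "object has a canopy, so it is not a suitcase.",
}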

Reflective Instruction Tuning

The REVERIE Dataset

Download

  1. Download images:
  2. Download annotations from Hugging Face.
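If you prefer to script the annotation download, a minimal sketch using the huggingface_hub client follows. The repo_id below is a placeholder; substitute the actual dataset ID from the link above:

# Sketch: fetch the annotation files with huggingface_hub.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="zjr2000/REVERIE",  # placeholder; replace with the real dataset ID
    repo_type="dataset",
    local_dir="./reverie_annotations",
)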

Generation

We provide scripts for reflective annotation generation. Follow these steps to generate the data:

  1. Prepare image_list.json: This file should contain a list of image names and be placed under $TARGET_FOLDER.
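The expected format of image_list.json is not shown in the repo snapshot above; assuming it is a plain JSON array of image file names, it can be produced with a few lines of Python (a sketch, reading $IMAGE_FOLDER and $TARGET_FOLDER from the environment):

# Sketch: build image_list.json under $TARGET_FOLDER from the images in
# $IMAGE_FOLDER. Assumes the file is a plain JSON array of file names.
import json
import os

image_folder = os.environ["IMAGE_FOLDER"]
target_folder = os.environ["TARGET_FOLDER"]

image_list = sorted(
    name for name in os.listdir(image_folder)
    if name.lower().endswith((".jpg", ".jpeg", ".png"))
)

with open(os.path.join(target_folder, "image_list.json"), "w") as fp:
    json.dump(image_list, fp)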
  2. Generate instruction and response:
python dataset_generation_pipeline/gemini_v_qa_collection.py \
    --image_folder $IMAGE_FOLDER \
    --target_folder $TARGET_FOLDER \
    --api_key $API_KEY \
    --num_tasks $NUM_TASKS
  3. Generate positive and negative rationales:
python dataset_generation_pipeline/gemini_v_rationale_collection.py \
    --target_folder $TARGET_FOLDER \
    --api_key $API_KEY \
    --num_tasks $NUM_TASKS
  4. Consistency-based data filtering:
python dataset_generation_pipeline/data_filter.py \
    --target_folder $TARGET_FOLDER \
    --api_key $API_KEY \
    --num_tasks $NUM_TASKS

The resulting rationale_instruct_data_with_judge.json will contain the final generated data, including instructions, responses, rationales, and flags indicating their correctness.
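The exact schema is determined by the scripts above; as a rough, hypothetical illustration (field names are assumptions, not the script's actual output keys), one entry might look like:

{
  "image": "000000123456.jpg",
  "instruction": "Is the cat sitting on the sofa?",
  "response": "Yes, the cat is curled up on the sofa cushion.",
  "rationale": "The cat's body rests directly on the cushion surface.",
  "judge": true
}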

Models

We perform reflective instruction tuning on several open-sourced LVLMs: LLaVA, MoE-LLaVA, and LLaVA-Phi3. Please refer to their repositories for instructions on setting up the environments and running the models.

Checkpoints and Results:

We also provide checkpoints and prediction results to facilitate the reproduction of our results:

Checkpoints:

Model                Data                          Baseline            Checkpoints
REVERIE-1.0-7b-lora  LLaVA-Instruct-80K + REVERIE  LLaVA-1.0-7b-lora   Baidu Disk Link
REVERIE-1.5-7b-lora  LLaVA-665k + REVERIE          LLaVA-1.5-7b-lora   Baidu Disk Link
MOE-REVERIE-1.6Bx4   LLaVA-665k + REVERIE          MOE-LLaVA-1.6Bx4    Baidu Disk Link
REVERIE-Phi3-lora    LLaVA-665k + REVERIE          LLaVA-Phi3-lora     Baidu Disk Link

Prediction Files:

Benchmark   Results
ScienceQA   Baidu Disk Link
MMBench     Baidu Disk Link
POPE        Baidu Disk Link

Citation

If you find this repo helpful, please consider citing:

@article{zhang2024reflective,
  title={Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models},
  author={Zhang, Jinrui and Wang, Teng and Zhang, Haigang and Lu, Ping and Zheng, Feng},
  journal={arXiv preprint arXiv:2407.11422},
  year={2024}
}

Acknowledgements

Our experiments are conducted on several awesome open-sourced LVLMs: LLaVA, MoE-LLaVA, and LLaVA-Phi3. We thank the authors for their efforts.
