Self-Training Large Language and Vision Assistant for Medical Question-Answering [paper][HF Model]

Guohao Sun, Can Qin, Huazhu Fu, Linwei Wang, Zhiqiang Tao

The advancement of medical image understanding and reasoning critically depends on building high-quality visual instruction data, which is costly and labor-intensive to obtain, particularly in the medical domain. To mitigate this data-scarcity issue, we introduce the Self-Training Large Language and Vision Assistant for Medicine (STLLaVA-Med).

Figure: Medical data usage and performance comparison between LLaVA-Med and our method.

Figure: Self-training pipeline for transforming a general vision-language assistant into a medical expert.
- [2024.10.24] 🌟 We have released our checkpoints!
- [2024.09.20] 🌟 We will release our checkpoints soon!
- [2024.09.20] 🌟 Our paper has been accepted by EMNLP 2024 (main conference).
- [2024.06.10] 🌟 Our paper and code were released!
- Install Package

```bash
conda create -n stllava python=3.10 -y
conda activate stllava
pip install --upgrade pip  # enable PEP 660 support
cd STLLaVA-Med
pip install -e .
```

- Install additional packages for training cases

```bash
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```
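A quick sanity check after installation (a minimal sketch; the `llava` package name is assumed from the upstream LLaVA codebase this project builds on):

```bash
# Confirm the editable install and GPU visibility.
# The `llava` module name is an assumption based on the upstream LLaVA codebase.
python -c "import llava; print('llava import OK')"
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
```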
Visual instruction data

This project uses the visual instruction data (60k_inline_mention) provided by LLaVA-Med. However, because some image URLs are disabled, we filtered the original data into our own version for this project.
DPO data

This project auto-generates the preference dataset using the model itself, guided by GPT-4o. We sample 10k medical images from PMC-15M. You can download the dataset via STLLaVA-Med-DPO.
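For reference, one way to fetch the preference data locally with the Hugging Face CLI (a sketch; `<org>/STLLaVA-Med-DPO` is a placeholder for the actual dataset ID linked above):

```bash
# Download the auto-generated preference dataset from the Hugging Face Hub.
# Replace <org>/STLLaVA-Med-DPO with the actual dataset repo ID linked above.
huggingface-cli download <org>/STLLaVA-Med-DPO \
    --repo-type dataset \
    --local-dir ./data/stllava_med_dpo
```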
Training consists of two stages: (1) visual self-questioning instruction tuning, which teaches the model to ask questions and follow multimodal instructions; and (2) preference optimization.
Training script with DeepSpeed ZeRO-3 and LoRA: sqllava_med.sh.

- `--mm_projector_type cluster`: the prototype extractor & a two-layer MLP vision-language connector.
- `--mm_projector_type mlp2x_gelu`: a two-layer MLP vision-language connector.
- `--vision_tower openai/clip-vit-large-patch14-336`: CLIP ViT-L/14 336px.
- `--image_aspect_ratio pad`: pads non-square images to square instead of cropping them; this slightly reduces hallucination.
- `--version v1_sq`: training for visual self-questioning.
- `--vit_lora_enable`: optimize the vision encoder using ViT LoRA.
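For illustration, a minimal sketch of how these flags might appear in a ZeRO-3 launch; the entry point, base checkpoint, data paths, and remaining hyperparameters are assumptions (the authoritative command is in sqllava_med.sh):

```bash
# Illustrative only -- see sqllava_med.sh for the actual training command.
# Entry point, DeepSpeed config, and data paths below are assumed placeholders.
deepspeed llava/train/train_mem.py \
    --deepspeed ./scripts/zero3.json \
    --lora_enable True --vit_lora_enable \
    --model_name_or_path <base-vlm-checkpoint> \
    --version v1_sq \
    --data_path ./data/llava_med_instruct_60k_inline_mention_filtered.json \
    --image_folder ./data/images \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type cluster \
    --image_aspect_ratio pad \
    --bf16 True \
    --output_dir ./checkpoints/stllava-med-sq
```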
Training script with DeepSpeed ZeRO-3 and LoRA: dpo_finetune.sh.

- `--version v1`: the standard conversation template (no self-questioning), used for the preference optimization stage.
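Similarly, a hedged sketch of the preference-optimization launch; the entry point and data paths are assumptions, and dpo_finetune.sh remains the reference:

```bash
# Illustrative only -- see dpo_finetune.sh for the actual DPO command.
# Entry point and data paths below are assumed placeholders.
deepspeed llava/train/train_dpo.py \
    --deepspeed ./scripts/zero3.json \
    --lora_enable True \
    --model_name_or_path ./checkpoints/stllava-med-sq \
    --version v1 \
    --data_path ./data/stllava_med_dpo/preferences.json \
    --image_folder ./data/stllava_med_dpo/images \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --image_aspect_ratio pad \
    --output_dir ./checkpoints/stllava-med-dpo
```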
Please download the raw images of the datasets (VQA-RAD, SLAKE, PVQA) for the medical VQA tasks.

We evaluate models on a diverse set of 3 benchmarks. To ensure reproducibility, we evaluate with greedy decoding rather than beam search, keeping inference consistent with the real-time outputs of the chat demo.
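As a sketch of the greedy-decoding setup, here is a LLaVA-style VQA evaluation command; the module name, file layout, and conversation mode are assumptions borrowed from the upstream LLaVA evaluation scripts:

```bash
# Greedy decoding: temperature 0 and a single beam, matching the chat demo.
# Module name and paths are assumed from the upstream LLaVA eval scripts.
python -m llava.eval.model_vqa_loader \
    --model-path ./checkpoints/stllava-med-dpo \
    --question-file ./data/eval/vqa_rad/test_questions.jsonl \
    --image-folder ./data/eval/vqa_rad/images \
    --answers-file ./results/vqa_rad_answers.jsonl \
    --temperature 0 \
    --num_beams 1 \
    --conv-mode v1
```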
If you find this code useful for your research, please consider citing:

```bibtex
@inproceedings{Sun2024STLLaVAMedSL,
  title     = {STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical Question-Answering},
  author    = {Guohao Sun and Can Qin and Huazhu Fu and Linwei Wang and Zhiqiang Tao},
  booktitle = {EMNLP},
  year      = {2024},
}
```