
VisionUnite

This repository is the official implementation of the paper "VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge" (arXiv). The dataset we use for fine-tuning is the MMFundus dataset.

Overview figure: (a) Previous vision models could only diagnose specific diseases as positive or negative, lacking the ability to provide clinical explanations or interact with patients. Our proposed VisionUnite changes this: it can predict a wide range of diseases and supports real-time conversations with patients, incorporating their feedback. Additionally, VisionUnite offers clear clinical explanations in its output, making it more understandable and useful. (b) The label distribution of the proposed MMFundus dataset, which includes eight main categories excluding the "Others" class. (c) VisionUnite is built with a transformer-based vision encoder and a specialized vision adapter designed to classify six signs: Vascular, Macular, FBC (Fundus Boundary Color), OCD (Optical Cup Disc), FHE (Fundus Hemorrhages Examination), and Other. It includes a vision projector to align visual embeddings with text tokens. (d) The illustration of image-text contrastive learning (CLIP Loss). (e) The illustration of classification supervised learning (CLS Loss). (f) The illustration of text-generation supervised learning (LLM Loss).
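The training objectives in (d)-(f) combine three loss terms. As a minimal illustrative sketch (not the repository's actual implementation), the symmetric image-text contrastive (CLIP-style) term in (d) can be written in NumPy as a bidirectional cross-entropy over a batch of paired embeddings; the temperature value here is a common default, not necessarily the one used by VisionUnite:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    # L2-normalize both sets of embeddings
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = (img @ txt.T) / temperature  # scaled cosine similarities
    idx = np.arange(logits.shape[0])
    # Cross-entropy with the matching pair on the diagonal, in both
    # directions (image-to-text and text-to-image)
    loss_i2t = -np.log(softmax(logits, axis=1)[idx, idx]).mean()
    loss_t2i = -np.log(softmax(logits, axis=0)[idx, idx]).mean()
    return (loss_i2t + loss_t2i) / 2
```

During training, a term like this would be combined with the classification (CLS) and text-generation (LLM) losses shown in (e) and (f), e.g. as a weighted sum; the actual weighting used by VisionUnite is not specified here.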

Requirements

This codebase requires Python 3.8. Install the dependencies from requirements.txt using:

pip install -r requirements.txt

Usage

1. Training

You can train your own model by running:

bash ./exps/train.sh

2. Evaluation

2.1 Test the Model

Prepare the test data, then run the following command:

python demo.py

2.2 Pre-trained models

The pre-trained model VisionUnite V1 can be downloaded at the link.

If you use our pre-trained model, please cite VisionUnite.

To obtain further pre-trained models for the MMFundus dataset, contact zhanli@uw.edu. We only process real-name emails, and your email domain must match your affiliation. The email should contain the following information:

Name/Homepage/Google Scholar: (Tell us who you are.)
Primary Affiliation: (The name of your institution or university, etc.)
Job Title: (E.g., Professor, Associate Professor, Ph.D., etc.)
Affiliation Email: (The password will be sent to this email; we only reply to emails ending in "edu".)
How to use: (Only for academic research; not for commercial use or secondary development.)

Our code is adapted from LLaMA-Adapter and InternVL. We thank these authors for their valuable work.

Citation

@article{li2024visionunite,
  title={VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge},
  author={Li, Zihan and Song, Diping and Yang, Zefeng and Wang, Deming and Li, Fei and Zhang, Xiulan and Kinahan, Paul E and Qiao, Yu},
  journal={arXiv preprint arXiv:2408.02865},
  year={2024}
}
