A list of VLMs tailored for medical RG and VQA; and a list of medical vision-language datasets

lab-rasool/Awesome-Medical-VLMs-and-Datasets

Awesome Vision-Language Models (VLMs) for Medical Report Generation (RG) and Visual Question Answering (VQA)

Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review is a comprehensive review that covers:

  • the latest publicly available VLMs designed specifically for medical RG and VQA;
  • essential background on computer vision, natural language processing, and VLMs, so that the review is accessible to readers without a machine learning background;
  • descriptions of publicly available vision-language datasets that contain medical image-text pairs or question-answer pairs about medical images;
  • detailed descriptions of the metrics used to evaluate VLMs on RG and VQA tasks;
  • a discussion of current challenges in the field and of potential research directions that could shape the future of medical VLMs.

The list of medical VLMs

| Medical VLM | VQA | RG | Paper | Code | Year |
|---|---|---|---|---|---|
| MedViLL | + | + | Moon et al. | GitHub | 2021 |
| PubMedCLIP | + | - | Eslami et al. | GitHub | 2021 |
| RepsNet | + | + | Tanwani et al. | on request at Site ? | 2022 |
| BiomedCLIP | + | - | Zhang et al. | Hugging Face | 2023 |
| UniXGen | - | + | Lee et al. | GitHub | 2023 |
| RAMM | + | - | Yuan et al. | GitHub | 2023 |
| X-REM | - | + | Jeong et al. | GitHub | 2023 |
| Visual Med-Alpaca | + | - | - | GitHub | 2023 |
| CXR-RePaiR-Gen | - | + | Ranjit et al. | - | 2023 |
| LLaVa-Med | + | - | Li et al. | GitHub | 2023 |
| XrayGPT | + | + | Thawkar et al. | GitHub | 2023 |
| CAT-ViL DeiT | + | - | Bai et al. | GitHub | 2023 |
| MUMC | + | - | Li et al. | GitHub | 2023 |
| Med-Flamingo | + | - | Moor et al. | GitHub | 2023 |
| RaDialog | + | + | Pellegrini et al. | GitHub | 2023 |
| PathChat | + | - | Lu et al. | GitHub | 2024 |

The list of medical vision-language datasets

| Medical Dataset | Image-Text pairs | QA pairs | Paper | Link |
|---|---|---|---|---|
| ROCO | + | - | Pelka et al. | GitHub |
| MIMIC-CXR | + | - | Johnson et al. | PhysioNet |
| MIMIC-CXR-JPG | + | - | Johnson et al. | PhysioNet |
| MIMIC-NLE | + | - | Kayser et al. | GitHub |
| CXR-PRO | + (unpaired) | - | Ramesh et al. | PhysioNet |
| MS-CXR | + | - | Boecking et al. | PhysioNet |
| IU-Xray or Open-I | + | - | Demner-Fushman et al. | Openi |
| MedICaT | + | - | Subramanian et al. | GitHub |
| PMC-OA | + | - | Lin et al. | Hugging Face |
| SLAKE | - | + | Liu et al. | MedVQA |
| VQA-RAD | - | + | Lau et al. | Osf |
| PathVQA | - | + | He et al. | GitHub |
| VQA-Med 2019 | - | + | Abacha et al. | GitHub |
| VQA-Med 2020 | - | + | Abacha et al. | GitHub |
| VQA-Med 2021 | - | + | Ionescu et al. | GitHub |
| EndoVis 2017 | - | + | Allan et al. | GitHub |
| EndoVis 2018 | - | + | Allan et al. | image frames in Challenge and the rest on GitHub |
| PathQABench-Public | - | + | Lu et al. | GitHub |

Citation

@article{Hartsock2024,
  title={Vision-language models for medical report generation and visual question answering: a review},
  author={Hartsock, Iryna and Rasool, Ghulam},
  journal={Frontiers in Artificial Intelligence},
  volume={7},
  pages={1430984},
  year={2024},
  publisher={Frontiers Media SA},
  doi={10.3389/frai.2024.1430984},
  url={https://www.frontiersin.org/articles/10.3389/frai.2024.1430984/full}
}
