Awesome Vision-Language Models (VLMs) for Medical Report Generation (RG) and Visual Question Answering (VQA)
Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review is a comprehensive review that covers:
- the latest publicly available VLMs specifically designed for medical RG and VQA;
- the essential background on computer vision, natural language processing, and VLMs, so that the review is accessible to readers without a machine learning background;
- the description of publicly available vision-language datasets, encompassing medical image-text pairs or question-answer pairs related to medical images;
- the detailed description of metrics employed for evaluating VLMs on RG and VQA tasks;
- the discussion of current challenges in the field and potential research directions that could shape the future of medical VLMs.
Medical VLM | VQA | RG | Paper | Code | Year |
---|---|---|---|---|---|
MedViLL | + | + | Moon et al. | GitHub | 2021 |
PubMedCLIP | + | - | Eslami et al. | GitHub | 2021 |
RepsNet | + | + | Tanwani et al. | On request (Site) | 2022 |
BiomedCLIP | + | - | Zhang et al. | Hugging Face | 2023 |
UniXGen | - | + | Lee et al. | GitHub | 2023 |
RAMM | + | - | Yuan et al. | GitHub | 2023 |
X-REM | - | + | Jeong et al. | GitHub | 2023 |
Visual Med-Alpaca | + | - | - | GitHub | 2023 |
CXR-RePaiR-Gen | - | + | Ranjit et al. | - | 2023 |
LLaVA-Med | + | - | Li et al. | GitHub | 2023 |
XrayGPT | + | + | Thawkar et al. | GitHub | 2023 |
CAT-ViL DeiT | + | - | Bai et al. | GitHub | 2023 |
MUMC | + | - | Li et al. | GitHub | 2023 |
Med-Flamingo | + | - | Moor et al. | GitHub | 2023 |
RaDialog | + | + | Pellegrini et al. | GitHub | 2023 |
PathChat | + | - | Lu et al. | GitHub | 2024 |
Medical Dataset | Image-Text pairs | QA pairs | Paper | Link |
---|---|---|---|---|
ROCO | + | - | Pelka et al. | GitHub |
MIMIC-CXR | + | - | Johnson et al. | PhysioNet |
MIMIC-CXR-JPG | + | - | Johnson et al. | PhysioNet |
MIMIC-NLE | + | - | Kayser et al. | GitHub |
CXR-PRO | + (unpaired) | - | Ramesh et al. | PhysioNet |
MS-CXR | + | - | Boecking et al. | PhysioNet |
IU-Xray (Open-i) | + | - | Demner-Fushman et al. | Open-i |
MedICaT | + | - | Subramanian et al. | GitHub |
PMC-OA | + | - | Lin et al. | Hugging Face |
SLAKE | - | + | Liu et al. | MedVQA |
VQA-RAD | - | + | Lau et al. | OSF |
PathVQA | - | + | He et al. | GitHub |
VQA-Med 2019 | - | + | Ben Abacha et al. | GitHub |
VQA-Med 2020 | - | + | Ben Abacha et al. | GitHub |
VQA-Med 2021 | - | + | Ionescu et al. | GitHub |
EndoVis 2017 | - | + | Allan et al. | GitHub |
EndoVis 2018 | - | + | Allan et al. | Image frames on the Challenge site; remaining data on GitHub |
PathQABench-Public | - | + | Lu et al. | GitHub |
@article{Hartsock2024,
title={Vision-language models for medical report generation and visual question answering: a review},
author={Hartsock, Iryna and Rasool, Ghulam},
journal={Frontiers in Artificial Intelligence},
volume={7},
pages={1430984},
year={2024},
publisher={Frontiers Media SA},
doi={10.3389/frai.2024.1430984},
url={https://www.frontiersin.org/articles/10.3389/frai.2024.1430984/full}
}