A curated list of awesome Multimodal studies.
Title | Venue | Date | Code | Supplement |
---|---|---|---|---|
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning | arXiv | 2024-06-18 | - | |
LOVA3: Learning to Visual Question Answering, Asking and Assessment | arXiv | 2024-05-23 | - | |
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI | arXiv | 2024-04-24 | - | |
BLINK: Multimodal Large Language Models Can See but Not Perceive | arXiv | 2024-04-18 | - | |
Ferret: Refer and Ground Anything Anywhere at Any Granularity (Ferret-Bench) | ICLR 2024 | 2023-10-11 | - | |
Aligning Large Multimodal Models with Factually Augmented RLHF (LLaVA-RLHF; MMHal-Bench for hallucination) | arXiv | 2023-09-25 | - | |
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations (AffectVisDial) | ECCV 2024 | 2023-08-30 | - | |
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension | CVPR 2024 | 2023-07-30 | - | |
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities | EMNLP 2023 (Findings) | 2023-05-18 | - | |