Zheng Chen, Xun Zhang, Wenbo Li, Renjing Pei, Fenglong Song, Xiongkuo Min, Xiaohong Liu, Xin Yuan, Yong Guo, and Yulun Zhang, "Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment", 2024
[project] [arXiv] [supplementary material] [dataset] [pretrained models]
- 2024-11-26: This repo is released.
Abstract: The development of multimodal large language models (MLLMs) enables the evaluation of image quality through natural language descriptions. This advancement allows for more detailed assessments. However, these MLLM-based IQA methods rely primarily on general contextual descriptions, which can limit fine-grained quality assessment. To address this limitation, we introduce a new image quality assessment (IQA) task paradigm, grounding-IQA. This paradigm integrates multimodal referring and grounding with IQA to realize more fine-grained quality perception. Specifically, grounding-IQA comprises two subtasks: grounding-IQA-description (GIQA-DES) and visual question answering (GIQA-VQA). GIQA-DES involves detailed descriptions with precise locations (e.g., bounding boxes), while GIQA-VQA focuses on quality QA for local regions. To realize grounding-IQA, we construct a corresponding dataset, GIQA-160K, through our proposed automated annotation pipeline. Furthermore, we develop a well-designed benchmark, GIQA-Bench. The benchmark comprehensively evaluates model grounding-IQA performance from three perspectives: description quality, VQA accuracy, and grounding precision. Experiments demonstrate that our proposed task paradigm, dataset, and benchmark facilitate more fine-grained IQA applications.
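GIQA-Bench scores grounding precision alongside description quality and VQA accuracy. As a rough illustration (not the official evaluation code), the minimal sketch below compares a predicted bounding box against a reference box using IoU; the sample fields (`text`, `box`) and the 0.5 threshold are assumptions for illustration only, not the benchmark's actual protocol.

```python
# Hypothetical sketch: checking grounding precision with IoU.
# The exact GIQA-Bench protocol and data format are defined by the paper/repo;
# the sample structure and the 0.5 IoU threshold here are illustrative assumptions.

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Illustrative GIQA-DES-style sample: a quality description paired with
# the bounding box of the region it refers to (field names assumed, not official).
prediction = {"text": "The face in the foreground is motion-blurred.",
              "box": [120, 80, 340, 310]}
reference  = {"text": "Severe motion blur on the foreground face.",
              "box": [115, 75, 350, 315]}

score = iou(prediction["box"], reference["box"])
print(f"IoU = {score:.3f}, grounded correctly: {score >= 0.5}")
```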
- Release testing and training code.
- Release GIQA-160K and GIQA-Bench.
- Release pre-trained models.
- Provide a WebUI.
- Provide a HuggingFace demo.
- Datasets
- Models
- Training
- Testing
- Results
- Citation
- Acknowledgements
We achieve impressive performance on the GIQA-DES and GIQA-VQA tasks.
Qualitative Results (click to expand)
- Results in Fig. 7 of the main paper
If you find the code helpful in your research or work, please cite the following paper.
@article{chen2024grounding,
  title={Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment},
  author={Chen, Zheng and Zhang, Xun and Li, Wenbo and Pei, Renjing and Song, Fenglong and Min, Xiongkuo and Liu, Xiaohong and Yuan, Xin and Guo, Yong and Zhang, Yulun},
  journal={arXiv preprint arXiv:2411.17237},
  year={2024}
}
This project is based on Q-Instruct, DepictQA, mPLUG-Owl, and LLaVA.