In SEED-Bench-2/model/InternLM_Xcomposer_VL_interface.py, for the InternLM_Xcomposer_VL model, all choices are added to the model input and the choice letters ("A.", "B.", "C.", "D.") are used as labels to calculate the loss. For all the other models (instructblip, qwen_vl, llava_v2), the interface code shows that only the question is added to the model input, and the text of each choice is used as the label, with the loss calculated independently for each choice.
I wonder why you use different input formats for different models. Will this have a large impact on accuracy?
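To make the difference concrete, here is a rough sketch of the two scoring schemes as I understand them, using a plain Hugging Face-style causal LM for illustration. The actual interfaces also feed image features and model-specific prompt templates, and the function names here are mine, not from the repo:

```python
import torch

LETTERS = "ABCD"

# Note: retokenizing prompt vs. prompt+continuation can split tokens
# differently at the boundary; the real interfaces typically concatenate
# token IDs instead. Kept simple here for illustration.

def score_by_choice_text(model, tokenizer, question, choices):
    """instructblip / qwen_vl / llava_v2 style: the prompt holds only the
    question, and the text of each choice is scored independently as the
    label; the choice with the lowest loss is the prediction."""
    losses = []
    for choice in choices:
        prompt_ids = tokenizer(question, return_tensors="pt").input_ids
        full_ids = tokenizer(question + " " + choice, return_tensors="pt").input_ids
        labels = full_ids.clone()
        labels[:, : prompt_ids.shape[1]] = -100  # mask prompt tokens; loss covers choice tokens only
        with torch.no_grad():
            losses.append(model(input_ids=full_ids, labels=labels).loss.item())
    return losses.index(min(losses))

def score_by_option_letter(model, tokenizer, question, choices):
    """InternLM_Xcomposer_VL style: all choices appear in the prompt and
    only the option letter ("A.", "B.", ...) is scored as the label."""
    options = " ".join(f"{l}. {c}" for l, c in zip(LETTERS, choices))
    prompt = f"{question} {options} Answer:"
    losses = []
    for letter in LETTERS[: len(choices)]:
        prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
        full_ids = tokenizer(prompt + " " + letter + ".", return_tensors="pt").input_ids
        labels = full_ids.clone()
        labels[:, : prompt_ids.shape[1]] = -100
        with torch.no_grad():
            losses.append(model(input_ids=full_ids, labels=labels).loss.item())
    return losses.index(min(losses))
```

In the first scheme the loss depends on the length and wording of each choice text; in the second it only measures the model's preference over single option letters, which is why I wonder whether the two can give noticeably different accuracies.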
Thank you for your attention to our work. For qwen_vl, the official code evaluates with PPL over the A/B/C/D option letters. For llava 1.5, the official code evaluates with the generate method, so I modified the corresponding code to use the PPL evaluation method. But for InternLM_Xcomposer_VL, the official evaluation code is already a PPL evaluation method, so I just provide the InternLM_Xcomposer_VL evaluation code based on their code.
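For reference, this is roughly what a generate-style evaluation looks like (illustrative sketch only, assuming a generic HF causal LM; LLaVA's actual code uses its own conversation template and image preprocessing, and this helper is hypothetical):

```python
import torch

def evaluate_by_generate(model, tokenizer, question, choices, answer_letter):
    """Generate-style evaluation: show all options, free-decode a short
    continuation, then parse the predicted option letter from it."""
    options = "\n".join(f"{l}. {c}" for l, c in zip("ABCD", choices))
    prompt = f"{question}\n{options}\nAnswer with the option's letter."
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=8, do_sample=False)
    # decode only the newly generated tokens, then grab the first A-D letter
    new_tokens = out[0, inputs.input_ids.shape[1]:]
    text = tokenizer.decode(new_tokens, skip_special_tokens=True)
    predicted = next((ch for ch in text if ch in "ABCD"), None)
    return predicted == answer_letter
```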