In SEED-Bench-2/model/InternLM_Xcomposer_VL_interface.py, for the InternLM_Xcomposer_VL model, all choices are added to the model input and the choice letters ("A.", "B.", "C.", "D.") are used as labels to calculate the loss. For all the other models (instructblip, qwen_vl, llava_v2), the interface code shows that only the question is added to the model input, and the text of each choice is used as the label, with the loss calculated independently for each choice.
I wonder why you use different input formats for different models. Will this have a large impact on accuracy?
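To make the difference concrete, here is a rough sketch of the two scoring schemes as I understand them, using a plain Hugging Face-style causal LM for illustration. The actual interfaces also feed image features and model-specific prompt templates, and the function names here are mine, not from the repo:

```python
import torch

LETTERS = "ABCD"

# Note: retokenizing prompt vs. prompt+continuation can split tokens
# differently at the boundary; the real interfaces typically concatenate
# token IDs instead. Kept simple here for illustration.

def score_by_choice_text(model, tokenizer, question, choices):
    """instructblip / qwen_vl / llava_v2 style: the prompt holds only the
    question, and the text of each choice is scored independently as the
    label; the choice with the lowest loss is the prediction."""
    losses = []
    for choice in choices:
        prompt_ids = tokenizer(question, return_tensors="pt").input_ids
        full_ids = tokenizer(question + " " + choice, return_tensors="pt").input_ids
        labels = full_ids.clone()
        labels[:, : prompt_ids.shape[1]] = -100  # mask prompt tokens; loss covers choice tokens only
        with torch.no_grad():
            losses.append(model(input_ids=full_ids, labels=labels).loss.item())
    return losses.index(min(losses))

def score_by_option_letter(model, tokenizer, question, choices):
    """InternLM_Xcomposer_VL style: all choices appear in the prompt and
    only the option letter ("A.", "B.", ...) is scored as the label."""
    options = " ".join(f"{l}. {c}" for l, c in zip(LETTERS, choices))
    prompt = f"{question} {options} Answer:"
    losses = []
    for letter in LETTERS[: len(choices)]:
        prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
        full_ids = tokenizer(prompt + " " + letter + ".", return_tensors="pt").input_ids
        labels = full_ids.clone()
        labels[:, : prompt_ids.shape[1]] = -100
        with torch.no_grad():
            losses.append(model(input_ids=full_ids, labels=labels).loss.item())
    return losses.index(min(losses))
```

In the first scheme the loss depends on the length and wording of each choice text; in the second it only measures the model's preference over single option letters, which is why I wonder whether the two can give noticeably different accuracies.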
Thank you for your attention to our work. For qwen_vl, the official code evaluates with PPL over the A/B/C/D option letters. For llava 1.5, the official code evaluates with the generate method, so I modified the corresponding code to use the PPL evaluation method. But for InternLM_Xcomposer_VL, the official evaluation code is already a PPL evaluation method, so I just provide the InternLM_Xcomposer_VL evaluation code based on their code.
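For reference, this is roughly what a generate-style evaluation looks like (illustrative sketch only, assuming a generic HF causal LM; LLaVA's actual code uses its own conversation template and image preprocessing, and this helper is hypothetical):

```python
import torch

def evaluate_by_generate(model, tokenizer, question, choices, answer_letter):
    """Generate-style evaluation: show all options, free-decode a short
    continuation, then parse the predicted option letter from it."""
    options = "\n".join(f"{l}. {c}" for l, c in zip("ABCD", choices))
    prompt = f"{question}\n{options}\nAnswer with the option's letter."
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=8, do_sample=False)
    # decode only the newly generated tokens, then grab the first A-D letter
    new_tokens = out[0, inputs.input_ids.shape[1]:]
    text = tokenizer.decode(new_tokens, skip_special_tokens=True)
    predicted = next((ch for ch in text if ch in "ABCD"), None)
    return predicted == answer_letter
```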