MLLM support #142

ChingKwanCheung · 2024-08-02T01:24:27Z

Does this project support training and inference for multi-modal retrieval models such as Phi-3-vision? I'd like to reproduce the experiments in the paper https://arxiv.org/abs/2406.11251 using this project.

MXueguang · 2024-08-02T07:05:01Z

Thanks for your interest @ChingKwanCheung. I will merge the code and doc this weekend.

MXueguang · 2024-08-05T05:04:19Z

Hi @ChingKwanCheung, I have added the code and an initial doc in https://github.com/texttron/tevatron/tree/main/examples/dse

ChingKwanCheung · 2024-08-05T07:29:45Z

> Hi @ChingKwanCheung, I have added the code and a initial doc in https://github.com/texttron/tevatron/tree/main/examples/dse

Thank you! This paper is really good work. I have tested the multi-modal retrieval model (https://huggingface.co/Tevatron/dse-phi3-docmatix-v1) you released earlier and found its English retrieval capability excellent. If I want to improve its Chinese retrieval capability, would you recommend continuing training from this model on Chinese data?

MXueguang · 2024-08-12T07:55:12Z

Thanks @ChingKwanCheung. I guess the Chinese capability largely depends on the LLM's own capability in Chinese and on how well the visual encoder aligns with the language model. I am not sure whether Phi-3 handles Chinese well. I feel https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5 might be a good choice of backbone for Chinese tasks.
