Align2LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation
Hongzhe Huang¹, Zhewen Yu¹, Jiang Liu¹, Li Cai¹, Dian Jiao¹, Wenqiao Zhang¹†, Siliang Tang¹,
Juncheng Li¹, Hao Jiang², Haoyuan Li², Yueting Zhuang¹
¹Zhejiang University, ²Alibaba
†Corresponding Authors
Align2LLaVA is a novel instruction curation algorithm that draws on two perspectives, human and LLM preference alignment, to compress a vast corpus of machine-generated multimodal instructions into a compact, high-quality form.
Code will be available soon.
If you find this work useful, please consider giving this repository a star and citing our paper as follows:
@misc{huang2024align2llavacascadedhumanlarge,
  title={Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation},
  author={Hongzhe Huang and Zhewen Yu and Jiang Liu and Li Cai and Dian Jiao and Wenqiao Zhang and Siliang Tang and Juncheng Li and Hao Jiang and Haoyuan Li and Yueting Zhuang},
  year={2024},
  eprint={2409.18541},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2409.18541},
}