🌾 OAT: Online AlignmenT for LLMs
thompson-sampling alignment distributed-training dueling-bandits dpo distributed-rl llm rlhf llm-aligment online-alignment llm-exploration
Updated Dec 21, 2024 - Python