Suggested DPO hyper-parameters #1003
Unanswered
Tejaswgupta asked this question in Q&A
Replies: 0 comments
I've fine-tuned Qwen2.5 using LLaMA-Factory, but the fine-tuned model's results aren't great and it is prone to wrong answers. What are the suggested hyperparameters for running DPO/SimPO on Qwen? I found a bunch of resources for Mistral and Llama-3.1 but none for Qwen, not even in the technical report.
I'm currently going by trial and error, but suggestions from anyone who has done this before would be highly appreciated.
Thank you in advance.