Question about the RM contrastive learning training method in the paper #45
Comments
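I have the same question. Is it just contrasting the chosen and rejected responses? That is, are all chosen responses and their augmentations positives for one another, while the positives and all rejected responses and their augmentations are negatives for one another?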
@Ablustrund Could you please answer this?
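@Ablustrund Thanks for your answer. I'd like to know some of the concrete details of this modeling. Also, I don't quite understand how contrastive learning can be applied directly to the diff.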
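Hi, how exactly are the two contrastive learning approaches in the paper implemented, and what are the positive and negative examples in each? I've read the paper but still don't quite understand it, especially the formula in Preference Difference, which looks like a simple transpose.
Finally, when will the related code be open-sourced? Thanks.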