reward-bench for Reward Model #230

Open · lss11005 opened this issue Jul 6, 2024 · 1 comment

lss11005 commented Jul 6, 2024

After training the RM (steps 1-3) with SteerLM, I get a reward model (a .nemo file). Is that the final reward model?

The Nemotron-4-340B technical report shows the performance of the reward model on reward-bench.
Could you share the exact reward-bench evaluation procedure, such as the model conversion step (nemo -> hf) and the parameter configuration used during testing (chat_template, ...)?

Can I swap in a different base model to train the reward model, such as Mistral-7B? Which parameters would need to be modified?
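
For context, reward-bench's headline metric is pairwise accuracy: the reward model scores a chosen and a rejected response for each prompt, and accuracy is the fraction of pairs where the chosen response wins. A minimal sketch of that loop, assuming the .nemo checkpoint has already been exported to a Hugging Face sequence-classification checkpoint with a chat template; the model path and example data below are illustrative, not from this thread:

```python
# Sketch of reward-bench's core metric: score chosen vs. rejected
# responses and count how often chosen wins. Assumes an HF-format
# reward model; the path below is a hypothetical placeholder.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "path/to/converted-reward-model"  # hypothetical nemo->HF export
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model.eval()

def score(prompt: str, response: str) -> float:
    """Scalar reward for one prompt/response pair."""
    # Apply the tokenizer's chat template so formatting matches training.
    text = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt},
         {"role": "assistant", "content": response}],
        tokenize=False,
    )
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits[0, 0].item()

# Pairwise accuracy: fraction of pairs where chosen outscores rejected.
pairs = [  # illustrative; reward-bench ships its own prompt subsets
    ("What is 2 + 2?", "4.", "5."),
]
correct = sum(score(p, c) > score(p, r) for p, c, r in pairs)
print(f"pairwise accuracy: {correct / len(pairs):.3f}")
```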


berserkr commented Aug 7, 2024

I suspect that once NeMo models are supported in Transformers, it will be easier to build a pipeline to run RewardBench. In the meantime I have a simple hack: https://github.com/berserkr/NeMo-Aligner/blob/main/examples/nlp/gpt/nemo_bench.py :) You run it the same way you would run inference.
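
Once a scoring helper like the one sketched above exists, evaluating on the public RewardBench pairs is a short loop. A hedged sketch: the dataset name (allenai/reward-bench), split (filtered), and column names (prompt, chosen, rejected, subset) follow the public Hugging Face dataset but should be verified against the reward-bench repo.

```python
# Sketch: per-subset pairwise accuracy on the public RewardBench pairs.
# Dataset/column names are assumptions; check allenai/reward-bench.
from collections import defaultdict
from datasets import load_dataset

def rewardbench_accuracy(score_fn):
    """Per-subset accuracy, given score_fn(prompt, response) -> float,
    e.g. the score() helper sketched earlier in this thread."""
    ds = load_dataset("allenai/reward-bench", split="filtered")
    tallies = defaultdict(lambda: [0, 0])  # subset -> [correct, total]
    for row in ds:
        chosen_wins = (score_fn(row["prompt"], row["chosen"])
                       > score_fn(row["prompt"], row["rejected"]))
        tallies[row["subset"]][0] += int(chosen_wins)
        tallies[row["subset"]][1] += 1
    return {subset: correct / total
            for subset, (correct, total) in tallies.items()}
```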
