After training the reward model (step 1 to step 3) with SteerLM, I get a reward model checkpoint (.nemo). Is this the final reward model?
The Nemotron-4-340B technical report shows the performance of the reward model on reward-bench.
Can you share the specific evaluation method used with reward-bench, such as the model conversion step (nemo -> hf) and the parameter configuration used during testing (chat_template, ...)?
Can I replace the base model used to train the reward model, for example with Mistral-7B? If so, which parameters should be modified?