I tried to reproduce your Gemma-2B reward model training and found that the reward model fine-tuned from internlm2 has an output head of dimension 1. However, when I downloaded your GRM-Gemma-2B-sftreg reward model, its final layer appears to output two linear values. While debugging BT model training, I confirmed that the reward model trained with your code also ends in a linear output of dimension 1, and that the training script scores 'chosen' and 'rejected' separately to obtain the reward values used in the loss. Could you explain how GRM-Gemma-2B-sftreg was trained? Based on my evaluation, it seems the two linear outputs correspond to a 'chosen' score and a 'rejected' score.
Hi, the model Ray2333/GRM-Gemma-2B-sftreg outputs only one value and does not follow the original AutoModelForSequenceClassification class. It seems you may not have loaded it correctly. Please refer to the example here for the correct loading procedure.
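For context, the BT training pattern described in the question can be illustrated with a toy sketch: a single-output reward head scores the chosen and rejected responses separately, and the pairwise Bradley-Terry loss pushes the chosen score above the rejected one. This is a generic PyTorch illustration with made-up shapes, not the actual GRM code or model architecture.

```python
import torch
import torch.nn.functional as F

# Toy single-output reward head: mirrors a reward model whose final
# linear layer maps a hidden state to one scalar per sequence.
# Hidden size and batch size are arbitrary, for illustration only.
torch.manual_seed(0)
hidden_dim = 8
reward_head = torch.nn.Linear(hidden_dim, 1)  # output dimension is 1

# Stand-ins for the last-token hidden states of chosen/rejected responses.
chosen_hidden = torch.randn(4, hidden_dim)
rejected_hidden = torch.randn(4, hidden_dim)

# Score each side separately through the SAME head (one scalar each),
# as the training script does, rather than outputting two values at once.
chosen_rewards = reward_head(chosen_hidden).squeeze(-1)      # shape (4,)
rejected_rewards = reward_head(rejected_hidden).squeeze(-1)  # shape (4,)

# Bradley-Terry pairwise loss: maximize P(chosen preferred over rejected).
bt_loss = -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

The key point is that there is only one scalar reward per sequence; the "two values" come from running the model twice, once per response, not from a two-dimensional output head.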