
Regarding the Gemma2 Reward Model Structure #26

Open

Loong435 opened this issue Aug 5, 2024 · 2 comments

Loong435 commented Aug 5, 2024

I tried to reproduce your Gemma-2B reward model training and found that the reward model fine-tuned from internlm2 has an output head of size 1. However, when I downloaded your GRM-Gemma-2B-sftreg reward model, I found that it outputs two linear values at the end. While debugging the BT (Bradley-Terry) model training, I also found that the final linear layer of the reward model trained by your code outputs a single value, and that the training script scores 'chosen' and 'rejected' separately to obtain individual reward values for the loss calculation (sketched below). Could you explain how the GRM-Gemma-2B-sftreg reward model was trained? From my evaluation, the two linear outputs appear to correspond to a 'chosen' score and a 'rejected' score.
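For concreteness, the pairwise Bradley-Terry loss the question refers to can be sketched as follows. The tensors and values here are illustrative, not taken from the training script:

```python
import torch
import torch.nn.functional as F

# Illustrative Bradley-Terry (BT) pairwise loss: each response is scored
# separately by a reward model with a single scalar output, and the loss
# compares the two scalars. Values below are made up for demonstration.
reward_chosen = torch.tensor([1.2, 0.3])     # r(x, y_chosen), shape (batch,)
reward_rejected = torch.tensor([0.1, -0.5])  # r(x, y_rejected), shape (batch,)

# loss = -log sigmoid(r_chosen - r_rejected), averaged over the batch
loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
print(loss)  # a single scalar training loss
```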

WeiXiongUST (Collaborator) commented

@YangRui2015 could you look into this?

YangRui2015 (Collaborator) commented

Hi, the model Ray2333/GRM-Gemma-2B-sftreg outputs only one value and does not follow the original AutoModelForSequenceClassification class. It seems you may not have loaded it correctly. Please refer to the example here for the correct loading procedure.
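For anyone who lands here with the same confusion, a rough sketch of single-scalar scoring follows. This is not the repository's actual loading code (the linked example is authoritative); it assumes a single-output value head applied to the base model's last hidden state, and the head here is randomly initialized purely for illustration:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "Ray2333/GRM-Gemma-2B-sftreg"
tokenizer = AutoTokenizer.from_pretrained(model_name)
base = AutoModel.from_pretrained(model_name)  # base transformer only

# Hypothetical value head: hidden_size -> 1, so the model emits exactly
# one reward per sequence. In the real model, the trained head weights
# are restored by the repo's loading code; here it is random.
value_head = torch.nn.Linear(base.config.hidden_size, 1)

def get_reward(text: str) -> float:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = base(**inputs).last_hidden_state  # (1, seq_len, hidden_size)
        # Score taken at the final token position: a single scalar reward.
        return value_head(hidden[0, -1]).item()

print(get_reward("Q: What is 2+2? A: 4"))
```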
