How to Instruction Tune with SFTTrainer? #426
Comments
Hi @jenkspt! For example, in the dataset from the falcon script ('timdettmers/openassistant-guanaco'), the responses are prefixed with
@jenkspt Dolly does completion only: https://github.com/databrickslabs/dolly/blob/master/training/trainer.py#L48-L77
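The "completion only" approach linked above boils down to masking the prompt tokens in the labels so the loss ignores them. Here is a minimal sketch of that idea (not Dolly's actual collator; `mask_prompt_labels` and the token ids are made up for illustration). It relies on the convention that label `-100` is ignored by the cross-entropy loss in PyTorch/HF training:

```python
# Sketch of completion-only label masking: tokens belonging to the prompt
# get label -100, so cross-entropy loss (and therefore gradient) only
# flows through the response tokens.
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids into labels, masking the first prompt_len tokens."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels

# hypothetical token ids: 4 prompt tokens followed by 3 response tokens
labels = mask_prompt_labels([5, 17, 9, 2, 31, 8, 1], prompt_len=4)
# labels == [-100, -100, -100, -100, 31, 8, 1]
```

A collator like Dolly's does the same thing per batch, locating the response boundary by searching for a response-marker token sequence instead of taking `prompt_len` as an argument.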
I see, that makes sense. Thanks a lot for the pointers!
Alpaca indicates they include an input field in roughly 40% of their training data here.
Hi everyone, thanks for your pointers. I made #445, which hopefully will be merged soon.
Hey @jenkspt, just saying hi :) It was great learning from your gpt jax implementation (jenkspt/gpt-jax#2). Glad our paths crossed again.
@vwxyzjn congrats on HuggingFace!
With the SFTTrainer, it's unclear to me how to instruction tune. I might be missing relevant details, but the examples I've seen look like they are fine-tuning on both the prompt and the response rather than just the response. Specifically, looking at:
https://github.com/lvwerra/trl/blob/main/examples/stack_llama/scripts/supervised_finetuning.py
Meanwhile, the Alpaca code explicitly creates a supervised dataset that trains only on the responses:
https://github.com/tatsu-lab/stanford_alpaca/blob/main/train.py
Are there any examples of instruction tuning with SFTTrainer, or am I just missing something?
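The distinction the question draws can be sketched as follows. This is not the Alpaca or TRL code, just an illustration under stated assumptions: `toy_tokenize` (whitespace split) stands in for a real subword tokenizer, and `build_example` is a hypothetical preprocessing helper. The Alpaca approach tokenizes the prompt separately to find how many leading tokens to mask out of the labels:

```python
# Alpaca-style preprocessing sketch: concatenate prompt + response,
# tokenize, then mask the prompt span in the labels with -100 so only
# the response contributes to the loss.
IGNORE_INDEX = -100

def toy_tokenize(text):
    return text.split()  # placeholder for a real subword tokenizer

def build_example(prompt, response):
    full_text = prompt + " " + response
    input_ids = toy_tokenize(full_text)
    prompt_len = len(toy_tokenize(prompt))  # how many tokens to mask
    labels = [IGNORE_INDEX] * prompt_len + input_ids[prompt_len:]
    return {"input_ids": input_ids, "labels": labels}

ex = build_example("### Instruction: say hi ### Response:", "hi there")
# ex["labels"] starts with six -100 entries, then the response tokens
```

Fine-tuning on the full concatenated text (prompt loss included), as the stack_llama script appears to do, is the other option; both are used in practice, and which is better is an empirical question.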