Quick questions about truncation of sample inputs. Are the truncate opts per turn, or per episode (sample)? If this is per episode, I also see one other curious thing I don't understand: I can't see these options as parameters for the train_model script in the code or documentation, but following the PyTorch code I find them there.
Truncation defines what is sent to the agent -- so, it's per episode. BB3B and R2C2 models were pre-trained with different context sizes, hence the different truncation values.
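These options actually live on the agent (TorchAgent) rather than on the training script itself, which is why train_model accepts them even though its own documentation doesn't list them. A minimal sketch of inspecting them, assuming a standard ParlAI install (the values below are placeholders, not recommendations):

```python
# Minimal sketch: the truncate options are agent options
# (defined on TorchAgent), so they flow into train_model via
# the model's command-line args. Assumes a standard ParlAI
# install; the values are placeholders.
from parlai.core.params import ParlaiParser

parser = ParlaiParser(add_parlai_args=True, add_model_args=True)
opt = parser.parse_args([
    '--model', 'transformer/generator',
    '--truncate', '128',        # overall per-episode token budget
    '--text-truncate', '128',   # truncation of the (flattened) context
    '--label-truncate', '128',  # truncation of the label/response
])
print(opt['truncate'], opt['text_truncate'], opt['label_truncate'])
```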
Sorry, the terminology I used was a bit confusing. By BB3B, I am actually referring to BlenderBot 1.0, the 3B-parameter version. That model has a context length of 128 tokens (see section 6.1 of the corresponding paper).

You have a very large dataset. Generally, when we train on datasets where episodes contain > 1024 tokens of context, we simply truncate the older context. It is an open problem how to deal with extremely long context, as in your use case.
If you are looking at role-playing or staying in character, we offer a few other datasets in ParlAI that are dialogue-adjacent: