Adjust HuggingFaceModel token embedding resizing to only occur when necessary #2027
Conversation
I think this was the issue a customer was running into. They had a checkpoint created by composer but couldn't load state_dict['state']['model'] into the HF model due to vocab size mismatch. IMO we shouldn't resize at all and should let it fail. We are silently changing the number of params in a model and that creates issues with loading the checkpoint outside of composer.
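(For readers outside this thread: a minimal sketch of the kind of failure being described. The checkpoint path, model name, and vocab sizes are purely illustrative, and in practice the keys under state_dict['state']['model'] carry the composer wrapper's prefix.)

```python
import torch
from transformers import AutoModelForCausalLM

# Illustrative repro of the mismatch described above; path and model name are placeholders.
checkpoint = torch.load("composer-checkpoint.pt", map_location="cpu")
hf_model = AutoModelForCausalLM.from_pretrained("gpt2")  # default vocab_size = 50257

# If composer silently resized the embeddings during training (e.g. to 50260),
# this raises a RuntimeError with size mismatches on the embedding / lm_head weights.
hf_model.load_state_dict(checkpoint["state"]["model"])
```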
Yup, that is correct @dskhudia. For the case where the embedding vocab size is less than the tokenizer vocab size, the training run will crash at some unknown point, with a nasty CUDA error. The biggest issue is that this could happen deep into training. I'd like to try to avoid users getting to that point. Does that sound reasonable to you? Or would you still like to just raise a warning and let the user deal with it?
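To make that failure mode concrete, here is a small sketch (the sizes are made up) of why the crash only shows up once an out-of-range token id actually appears in a batch:

```python
import torch
from torch import nn

model_vocab_size = 50257      # embedding rows the model actually has
tokenizer_vocab_size = 50260  # tokenizer can emit ids up to 50259

embedding = nn.Embedding(model_vocab_size, 768)

# Batches containing only ids < 50257 work fine, so training can run for a long
# time before one of the extra tokens shows up.
_ = embedding(torch.randint(0, model_vocab_size, (4, 128)))

# The first batch with an id >= 50257 fails: a clear IndexError on CPU, but an
# opaque device-side assert / CUDA error when the tensors live on the GPU.
_ = embedding(torch.tensor([[model_vocab_size + 1]]))
```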
Or are you suggesting we raise an error here instead?
I was suggesting to raise an error.
For the first time ever, I think I might disagree with Daya :) But I am not usually thinking about the model's life post-composer.

I agree that confusion can arise from having composer change the model parameters without telling you. It would make it difficult to correctly instantiate an HF model that could accept the trained weights outside composer. You'd have to construct the model with the vocab size that composer imposed, which could be easy to lose track of. Happily, that info should be packed right beside the actual weights in the composer checkpoint, though.

I disagree in that I think an error may be too restrictive. The error would put the same burden on the user (just earlier in the process) to always remember the correct vocab size in the very possible event that the default model config does not give you a vocab size that matches the tokenizer you want to use. Plus, it may prevent you from using pre-trained HF weights if you set a non-default vocab size in the model config. That could create hassles when working within composer.

One use case that comes to mind is UL2R, where you may have to add special tokens to the vocab. In this case, it is necessary to get the pre-trained weights from HF and then use resize_token_embeddings to add tokens for the new vocab.

To the extent that we should not let this happen without the user's permission, I would be in favor of adding an argument that controls whether composer is allowed to resize the embeddings.

As a side note, I think we should also add similar support for using pre-trained weights fed into the Trainer's
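For reference, a minimal sketch of that UL2R-style flow outside composer (the checkpoint name and special tokens are just placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Add mode/sentinel tokens to the tokenizer...
tokenizer.add_special_tokens({"additional_special_tokens": ["[NLU]", "[NLG]", "[S2S]"]})

# ...then grow the model's token embeddings (and tied output head) to match.
model.resize_token_embeddings(len(tokenizer))
```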
Thanks for the input! I'll go forward with adding an argument to control the behavior, which will default to erroring out rather than changing the model shape. And I made a JIRA ticket to look into the request to allow composer to gracefully handle differently shaped embeddings as well.
Ready for re-review @alextrott16 @dskhudia
@alextrott16 I agree that it makes one user's (the one who is training) life easier at the expense of another (the one who has to prepare for serving). An example supporting your case though:
What does this PR do?
Previously, HuggingFaceModel automatically resized the model vocab size to match whatever the tokenizer vocab size is. This can cause an issue when the model vocab size is rounded to a multiple of 8 or 64 and intentionally does not match the tokenizer vocab size. This PR changes our behavior to only resize the model vocab size when it is necessary, meaning when the model vocab size is less than the tokenizer vocab size. When the tokenizer vocab size is less than the model vocab size, we just raise a warning now.
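A rough sketch of the check described above (the function and attribute names are illustrative, not the exact implementation):

```python
import warnings

def _maybe_resize_token_embeddings(model, tokenizer):
    """Illustrative only: resize when required, warn when the model vocab is larger."""
    model_vocab = model.config.vocab_size
    tokenizer_vocab = len(tokenizer)
    if model_vocab < tokenizer_vocab:
        # The tokenizer can emit ids the embedding cannot index, so resizing is required.
        model.resize_token_embeddings(tokenizer_vocab)
    elif model_vocab > tokenizer_vocab:
        # Likely intentional padding to a multiple of 8/64; leave the model alone.
        warnings.warn(
            f'Model vocab size ({model_vocab}) is larger than tokenizer vocab size '
            f'({tokenizer_vocab}); not resizing the token embeddings.'
        )
```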
What issue(s) does this change relate to?
Closes CO-1861
Before submitting
Did you run pre-commit on your change? (see the pre-commit section of prerequisites)