HF FSDP wrap BLOOM and OPT as well #83
Looking good so far! I think we should also consider replacing But also totally understand if it makes sense to separate that into a follow-up PR. |
Following up: this code works on my LegalGPT fork to FSDP-wrap OPT and BLOOM. I tested on 4x A100 40G, and OPT 6.7B and BLOOM 3B both work. However, I got errors when using it in this repo, which I need to investigate. I will update it to use |
Love this. Simple and easy to follow/extend. It would be good to confirm that it works on T5 (an encoder-decoder) and BERT (encoder-only). Along those lines, I might recommend explicitly defining the model types that this supports (with unsupported models triggering a warning or error), since there are a lot of HF model types out there and the 10 most popular ones will probably cover 99% of use cases. |
@alextrott16 I agree that we should give information about confidence. There are 2 levels that I see: "we haven't checked this model yet, but it has all the properties that we expect in the helper functions" (likely will work?) vs "we tried to wrap this model, and were unable to find needed components" (almost certainly won't work) |
I've implemented the latter as a |
@alextrott16 RE: model incompatibility concerns:
So I think we are good on that front! |
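For illustration, the first confidence level discussed above ("we haven't checked this model yet") could be surfaced as a warning rather than an error. This is only a minimal sketch, not code from this PR: `VERIFIED_MODEL_TYPES` and `warn_if_unverified` are hypothetical names, and the set contains only the two model types reported as tested in this thread.

```python
import warnings

# Assumption: illustrative allow list holding only the model types that this
# thread reports as tested (OPT and BLOOM). Not taken from the PR itself.
VERIFIED_MODEL_TYPES = frozenset({"opt", "bloom"})


def warn_if_unverified(model_type, verified=VERIFIED_MODEL_TYPES):
    """Return True for a verified model type; otherwise warn and return False.

    `model_type` is expected to be a short string such as the value of
    `model.config.model_type` on a Hugging Face model.
    """
    if model_type in verified:
        return True
    warnings.warn(
        f"Model type '{model_type}' has not been verified with FSDP wrapping; "
        "it may still work if it exposes the expected components.",
        UserWarning,
        stacklevel=2,
    )
    return False
```

A caller would run this check before wrapping and continue either way, leaving the hard failure to the component lookup itself.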
Thanks for untangling FSDP and expanding our supported models!
This makes the FSDP wrap function a bit more generic so it will work on OPT and BLOOM models.
However, it removes an assert, so the user could now call it on any model. We may want some sanity check that they aren't calling it on e.g. T5, which this doesn't support yet (would need to look at model.encoder for some properties, I think). For now, I raise a ValueError if any of the expected parts of the model aren't found.
Also, I wasn't sure how to correctly attribute the helper functions; see the file header.
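As a rough sketch of the "raise a ValueError if an expected part isn't found" behaviour described above (not the PR's actual helper functions): the names `find_block_list` and `_BLOCK_LIST_PATHS` are hypothetical, and the attribute paths are assumptions about where BLOOM-style and OPT-style Hugging Face models keep their transformer blocks.

```python
from types import SimpleNamespace

# Assumption: candidate dotted paths to the list of transformer blocks.
# "transformer.h" is assumed for BLOOM-style models and
# "model.decoder.layers" for OPT-style models.
_BLOCK_LIST_PATHS = ("transformer.h", "model.decoder.layers")


def _get_nested(obj, dotted_path):
    """Follow a dotted attribute path; return None if any hop is missing."""
    current = obj
    for name in dotted_path.split("."):
        current = getattr(current, name, None)
        if current is None:
            return None
    return current


def find_block_list(model, candidate_paths=_BLOCK_LIST_PATHS):
    """Return the first block list found on `model`, else raise ValueError."""
    for path in candidate_paths:
        found = _get_nested(model, path)
        if found is not None:
            return found
    raise ValueError(
        "Unable to FSDP-wrap this model: none of the expected block-list "
        f"attributes were found (tried: {', '.join(candidate_paths)})."
    )


# Stand-in objects so the sketch runs without downloading real models.
bloom_like = SimpleNamespace(transformer=SimpleNamespace(h=["block0", "block1"]))
t5_like = SimpleNamespace(encoder=SimpleNamespace(block=["block0"]))

blocks = find_block_list(bloom_like)  # found via "transformer.h"
try:
    find_block_list(t5_like)  # neither candidate path exists on this object
except ValueError as err:
    message = str(err)
```

Probing a short list of known paths keeps the wrap function generic across decoder-only models while still failing loudly on layouts it does not recognize, such as an encoder-decoder.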