Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HF FSDP wrap BLOOM and OPT as well #83

Merged
merged 12 commits into from
Jan 27, 2023
Merged

Conversation

samhavens
Copy link
Contributor

@samhavens samhavens commented Jan 24, 2023

This makes the FSDP wrap function a bit more generic so it will work on OPT and BLOOM models.

However, it removes an assert, so the user could now call it on any model — we may want some sanity check that they aren't calling it on e.g. T5, which this doesn't support yet (would need to look at model.encoder for some properties, I think). For now, I raise a ValueError if any of the expected parts of the model aren't found.

Also, I wasn't sure on how to correctly attribute the helper functions, see the file header.

@abhi-mosaic
Copy link
Contributor

abhi-mosaic commented Jan 24, 2023

Looking good so far! I think we should also consider replacing ComposerHFCausalLM(ComposerModel): with Composer's HuggingFaceModel class and see if it works/breaks. It gives us a lot of good functionality for free.

But also totally understand if it makes to separate into a followup PR.

@samhavens
Copy link
Contributor Author

samhavens commented Jan 24, 2023

Following up: this code works on my LegalGPT fork to FSDP-wrap OPT and BLOOM. I tested on 4x A100 40G, and OPT 6.7B and BLOOM 3B works. However, I got errors when using using it in this repo which I need to investigate.

I will update it to use HuggingFaceModel

@alextrott16
Copy link
Contributor

Love this. Simple and easy to follow/extend.

It would be good to confirm that it works on T5 (an Encoder-Decoder) and BERT (an Encoder-only). Along those lines, I might recommend explicitly defining the model types that this supports (with unsupported models triggering a warning or error) since there are a lot of HF model types out there and the 10 most popular ones will probably 99% of use cases.

llm/src/hf_causal_lm.py Outdated Show resolved Hide resolved
llm/src/hf_causal_lm.py Outdated Show resolved Hide resolved
@samhavens
Copy link
Contributor Author

@alextrott16 I agree that we should give information about confidence. There are 2 levels that I see: "we haven't checked this model yet, but it has all the properties that we expect in the helper functions" (likely will work?) vs "we tried to wrap this model, and were unable to find needed components" (almost certainly won't work)

@samhavens
Copy link
Contributor Author

@alextrott16 I agree that we should give information about confidence. There are 2 levels that I see: "we haven't checked this model yet, but it has all the properties that we expect in the helper functions" (likely will work?) vs "we tried to wrap this model, and were unable to find needed components" (almost certainly won't work)

I've implemented the later as a ValueError. Now need to test BERT-style & T5-style

@samhavens samhavens marked this pull request as ready for review January 27, 2023 02:34
@samhavens
Copy link
Contributor Author

@alextrott16 RE: model incompatibility concerns:

  • this seems to wrap Codegen (a decoder-only model I hadn't looked at).
  • the codebase errors if you try to pass T5, so didn't check that one (we call AutoModelForCausalLM, which substantially narrows down the potential models)
  • it raises a ValueError if you try to use Bert

So I think we are good on that front!

Copy link
Contributor

@abhi-mosaic abhi-mosaic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for untangling FSDP and expanding our supported models!

@abhi-mosaic abhi-mosaic merged commit f745d3a into mosaicml:main Jan 27, 2023
@samhavens samhavens deleted the hf-fsdp branch April 5, 2023 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants