[docs] add fsdp_tips.rst #455
Conversation
#### Misc
should this include the new wrap/auto_wrap feature?
Yes. Leave the desired markdown in a comment; otherwise I can take a pass tomorrow.
#### The `enable_wrap` context
There are two cases where the `enable_wrap` context can be useful:
* When you'd like to apply the same parameters to all child modules that you wrap with FSDP. Calling the `wrap` function within that context saves you from passing the same set of FSDP parameters explicitly.
* When wrapping large models that do NOT fit in CPU memory. That is, instead of first creating the full model and then traversing it to wrap different parts with FSDP, you wrap modules incrementally as you build up the model, allowing large modules to be sharded in place (see the sketch after the example below).
Example:

```python
with enable_wrap(**fsdp_params):
    # Wraps the layer in FSDP by default if within the context
    self.l1 = wrap(torch.nn.Linear(5, 5))
    # Wraps child modules by default based on min_num_params
    self.l2 = auto_wrap(TransformerBlock(), min_num_params=1e8)
```
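To make the second bullet concrete, here is a hedged sketch of how the snippet above might sit inside a full module definition. The class name `MyModel` and the use of `torch.nn.TransformerEncoderLayer` as a stand-in for `TransformerBlock` are illustrative assumptions, the import path follows the one quoted later in this thread, and newer fairscale versions may require the `wrapper_cls=FSDP` syntax discussed further down:

```python
# Illustrative sketch only; the class name and the TransformerEncoderLayer
# stand-in are assumptions, not the actual contents of fsdp_tips.rst.
import torch
from fairscale.nn.auto_wrap import enable_wrap, wrap, auto_wrap


class MyModel(torch.nn.Module):
    def __init__(self, fsdp_params):
        super().__init__()
        # Every wrap()/auto_wrap() call inside this context reuses fsdp_params,
        # so the FSDP arguments don't have to be repeated per layer.
        with enable_wrap(**fsdp_params):
            # The layer is sharded in place as soon as it is created, so the
            # full unsharded model never has to fit in CPU memory.
            self.l1 = wrap(torch.nn.Linear(5, 5))
            # auto_wrap() recursively wraps children larger than min_num_params;
            # smaller children are left unwrapped.
            self.l2 = auto_wrap(
                torch.nn.TransformerEncoderLayer(d_model=512, nhead=8),
                min_num_params=1e8,
            )


# Constructing MyModel requires torch.distributed to be initialized first,
# since FSDP needs a process group, e.g. in each worker process:
#   torch.distributed.init_process_group("nccl", rank=rank, world_size=world_size)
#   model = MyModel(dict(mixed_precision=True, flatten_parameters=True))
```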
One more thing to be added to the doc:
@myleott @sshleifer Does the above look good? If so, I can also add it to the docstring of FSDP. We generate the doc from the docstring.
These are not at all perfect; please feel free to push changes to this branch or comment. I also copied the contents of the docstring of
fsdp_tips.html.zip: unzip it and put the path in the Chrome URL bar to open it.
Just a small suggestion: would it be possible to also add a small tutorial?
Yes, this is a great idea 😄 Just to cross-reference and not lose track, Min also suggested the tutorial should include custom weight init inside a
This looks great, thanks @sshleifer! I made some comments below, but I think we can ship this and iterate. What do you think @min-xu-ai?
```python
from fairscale.nn.auto_wrap import enable_wrap, auto_wrap
from fairscale.
fsdp_params = dict(mixed_precision=True, flatten_parameters=True)
with enable_wrap(**fsdp_params):
```
The new syntax is `enable_wrap(wrapper_cls=FSDP, **fsdp_params)`.
Yeah, totally.
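For reference, a minimal sketch of the newer call signature under the same assumptions about import paths (and assuming `torch.distributed` has already been initialized so FSDP can build a process group):

```python
# Minimal sketch of the newer enable_wrap signature; import paths are assumed
# and may differ between fairscale releases.
import torch
from fairscale.nn.auto_wrap import enable_wrap, wrap
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

fsdp_params = dict(mixed_precision=True, flatten_parameters=True)
with enable_wrap(wrapper_cls=FSDP, **fsdp_params):
    # wrap() picks up wrapper_cls and fsdp_params from the surrounding context.
    layer = wrap(torch.nn.Linear(5, 5))
```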
This is a dumping ground to collect things we want to document about FSDP.
Please comment with anything random that comes to mind.
Once there have been no changes to FSDP for 48 hrs (or some other proxy for stability), I will finalize this and set it to Ready For Review.