
[docs] add fsdp_tips.rst #455

Merged: 10 commits into master from fsdp-docs on Mar 8, 2021

Conversation

sshleifer (Contributor)
This is a dumping ground to collect things we want to document about FSDP.

Please comment with anything random that comes to mind.

Once there have been no changes to FSDP for 48 hrs (or some other proxy for stability), I will finalize this and set it to Ready For Review.

@facebook-github-bot added the "CLA Signed" label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Mar 1, 2021
@sshleifer linked an issue Mar 1, 2021 that may be closed by this pull request



#### Misc
Contributor

Should this include the new wrap/auto_wrap feature?

sshleifer (Contributor, Author) commented Mar 3, 2021

Yes. Leave the desired markdown in a comment, or else I can take a pass tomorrow.

Contributor

#### The `enable_wrap` context

There are two cases where the `enable_wrap` context can be useful:

* When you'd like to apply the same parameters to all child modules that you wrap with FSDP. Calling the `wrap` function within that context saves you from passing the same set of FSDP parameters explicitly every time.
* When wrapping large models that do NOT fit in CPU memory. That is, you don't first create the full model and then traverse it to wrap different parts with FSDP. Instead, you wrap modules incrementally as you build up the model, allowing large modules to be sharded in place (see the fuller sketch after the example below).

Example:
        with enable_wrap(**fsdp_params):
            # Wraps layer in FSDP by default if within context
            self.l1 = wrap(torch.nn.Linear(5, 5))
            # Wraps children modules by default based on min_num_params
            self.l2 = auto_wrap(TransformerBlock(), min_num_params=1e8)
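
Expanding on the second case, a more complete sketch might look like the following. This is illustrative only: the module, layer sizes, and `min_num_params` threshold are made up, the import of `wrap` and the `enable_wrap` call signature follow the snippet above (newer versions may differ), and a torch.distributed process group is assumed to be initialized before FSDP is used.

    import torch
    from fairscale.nn.auto_wrap import enable_wrap, auto_wrap, wrap

    class MyModel(torch.nn.Module):
        def __init__(self, fsdp_params):
            super().__init__()
            # Building the model inside the context lets each submodule be
            # wrapped (and sharded) as soon as it is created, instead of
            # first materializing the whole model in CPU memory.
            with enable_wrap(**fsdp_params):
                self.l1 = wrap(torch.nn.Linear(5, 5))
                # auto_wrap recursively wraps children whose parameter
                # count exceeds min_num_params.
                self.l2 = auto_wrap(
                    torch.nn.Sequential(
                        torch.nn.Linear(5, 4096),
                        torch.nn.Linear(4096, 5),
                    ),
                    min_num_params=1e4,
                )

        def forward(self, x):
            return self.l2(self.l1(x))

    fsdp_params = dict(mixed_precision=True, flatten_parameters=True)
    model = MyModel(fsdp_params)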

min-xu-ai (Contributor)

One more thing to be added to the doc:

If model weight initialization happens deterministically after sharding, the final weights will consist of N identical shards. This would have a negative effect on the model's initial state.

@myleott @sshleifer Does the above look good? If so, I can also add it to the docstring of FSDP. We generate the docs from the docstring.
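
To illustrate the pitfall and one possible fix, here is a minimal sketch (not the final tutorial text): it assumes a torch.distributed process group is already initialized, and that changes made inside summon_full_params are written back to the shards when the context exits.

    import torch
    from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

    model = FSDP(torch.nn.Linear(1024, 1024))

    # Pitfall: each rank now holds only a shard of the flattened parameters.
    # Re-initializing with the same seed on every rank writes identical
    # values into every shard, so the full weight ends up as N identical
    # pieces after unsharding.
    torch.manual_seed(0)
    for p in model.parameters():
        torch.nn.init.normal_(p, std=0.02)

    # One fix: gather the full parameters, initialize them once, and let
    # FSDP re-shard the correctly initialized weights on context exit.
    with model.summon_full_params():
        for p in model.parameters():
            torch.nn.init.normal_(p, std=0.02)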

@sshleifer marked this pull request as ready for review March 5, 2021 15:49

sshleifer (Contributor, Author) commented Mar 5, 2021

These are not at all perfect; please feel free to push changes to this branch or comment.
I made a separate fsdp_tips.rst and cross-linked it with fsdp.rst to avoid one crowding out the other on the same page. There may be a nicer way to do this in rst (e.g. linking to section 2 at the top of one document).

I also copied the contents of the docstring of _init_param_attributes into a section called "State management with extra parameter attributes", which is not shown on the other page because the method is private, but is useful to know about.

@sshleifer changed the title from "[WIP] FSDP docs" to "FSDP docs" Mar 5, 2021

sshleifer (Contributor, Author) commented Mar 5, 2021

fsdp_tips.html.zip. Unzip it and open the file by pasting its path into the Chrome URL bar.

Vittorio-Caggiano (Contributor) left a comment

Just a small suggestion: would it be possible to also add a small tutorial?

myleott (Contributor) commented Mar 7, 2021

Would it be possible to also add a small tutorial?

Yes, this is a great idea 😄

Just to cross-reference and not lose track, Min also suggested the tutorial should include custom weight init inside a summon_full_params context: #454 (comment)

myleott (Contributor) left a comment

This looks great, thanks @sshleifer! I made some comments below, but I think we can ship this and iterate. What do you think @min-xu-ai?

fairscale/nn/data_parallel/fully_sharded_data_parallel.py (review comment, outdated, resolved)
fairscale/nn/data_parallel/fully_sharded_data_parallel.py (review comment, outdated, resolved)
    from fairscale.nn.auto_wrap import enable_wrap, auto_wrap
    from fairscale.
    fsdp_params = dict(mixed_precision=True, flatten_parameters=True)
    with enable_wrap(**fsdp_params):
Contributor

The new syntax is `enable_wrap(wrapper_cls=FSDP, **fsdp_params)`.
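
Applied to the quoted snippet, the updated call might look like this (a sketch; the import path for FSDP is assumed from the file under review, and the remaining parameters are unchanged):

    import torch
    from fairscale.nn.auto_wrap import enable_wrap, wrap
    from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

    fsdp_params = dict(mixed_precision=True, flatten_parameters=True)

    # Pass the wrapper class explicitly; the keyword arguments are forwarded
    # to each wrap() call made inside the context.
    with enable_wrap(wrapper_cls=FSDP, **fsdp_params):
        layer = wrap(torch.nn.Linear(5, 5))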

docs/source/api/nn/fsdp_tips.rst (review comment, outdated, resolved)
docs/source/api/nn/fsdp_tips.rst (review comment, resolved)
docs/source/api/nn/fsdp_tips.rst (review comment, outdated, resolved)
docs/source/api/nn/fsdp_tips.rst (review comment, resolved)
docs/source/api/nn/fsdp_tips.rst (review comment, outdated, resolved)
min-xu-ai (Contributor)

I think we can ship this and iterate. What do you think @min-xu-ai?

Yeah, totally.

@sshleifer changed the title from "FSDP docs" to "[docs] add fsdp_tips.rst" Mar 8, 2021
@sshleifer merged commit ad611a3 into master Mar 8, 2021
@sshleifer deleted the fsdp-docs branch March 8, 2021 20:23

Successfully merging this pull request may close these issues.

FSDP Docs
5 participants