[feat] FSDP: add auto_wrap_bn #531
Conversation
- add a utility function to handle wrapping of BN layers
        return not isinstance(module, tuple(default_auto_wrap_policy.FORCE_LEAF_MODULES))  # type: ignore
    else:
        return is_bn and not isinstance(module, tuple(default_auto_wrap_policy.EXCLUDE_WRAP_MODULES))  # type: ignore
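For readers without the full diff, the fragment above is the tail of the BN-only wrap predicate. A self-contained sketch of the same logic follows; the two module tuples here are assumed stand-ins, the real ones live on fairscale's default_auto_wrap_policy:

    import torch.nn as nn

    # Assumed stand-ins for default_auto_wrap_policy.FORCE_LEAF_MODULES and
    # EXCLUDE_WRAP_MODULES; the real tuples come from fairscale's default policy.
    FORCE_LEAF_MODULES = (nn.MultiheadAttention,)
    EXCLUDE_WRAP_MODULES = (nn.ModuleList, nn.ModuleDict)

    def wrap_bn_only_policy(module: nn.Module, recurse: bool, unwrapped_params: int) -> bool:
        is_bn = isinstance(module, nn.modules.batchnorm._BatchNorm)
        if recurse:
            # While walking the module tree, keep descending unless this
            # module is one that must stay a leaf.
            return not isinstance(module, FORCE_LEAF_MODULES)
        # At the wrap decision itself: wrap only BN layers that are not excluded.
        return is_bn and not isinstance(module, EXCLUDE_WRAP_MODULES)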
nice, I find the next couple of lines (config, single group, then guided auto wrap) very elegant
Quick question: could dist.new_group(ranks=[my_rank]) impact performance in any way?
No, it should not really; AFAIK the overhead is minimal.
I don't think there should be any perf impact since FSDP has special casing for world_size == 1. But perhaps @myleott can think of something else?
I understood the question as being about the perf cost of having many groups in PyTorch distributed vs. few, not as specific to FSDP. I might be wrong, but that was the reasoning behind my reply.
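To make the question concrete: the pattern under discussion gives each BN wrapper its own single-rank process group, so its statistics are never sharded or synced across ranks. A sketch, assuming the default process group is already initialized; note that new_group is collective, which is where the only real, one-time cost lives:

    import torch.distributed as dist

    # new_group() must be called by every rank for every group created, so
    # building one single-rank group per rank costs world_size collective
    # calls at init time; there is no per-step overhead afterwards, and FSDP
    # special-cases world_size == 1 so the group never really communicates.
    my_pg = None
    for rank in range(dist.get_world_size()):
        pg = dist.new_group(ranks=[rank])
        if rank == dist.get_rank():
            my_pg = pg  # keep the group containing only this rank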
@@ -54,7 +45,16 @@ def forward(self, x):
    # TODO (Min): check DDP equivalency.
Having been burnt a little by that, I would recommend not waiting too long for that part.
Definitely, see my plan below.
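For the equivalency check itself, the usual shape is: identical init and data, a few steps under DDP and under FSDP, then compare parameters. A rough sketch, not the test this PR will add; it assumes an initialized process group (e.g. via torchrun), one GPU per process, and fairscale's summon_full_params context manager:

    import torch
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP
    from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

    def make_model(seed: int) -> nn.Module:
        torch.manual_seed(seed)  # identical init for both runs
        return nn.Sequential(nn.Linear(16, 16), nn.BatchNorm1d(16)).cuda()

    def train_steps(model: nn.Module, steps: int = 3) -> None:
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        for i in range(steps):
            torch.manual_seed(i)  # identical synthetic batches on every rank
            x = torch.randn(8, 16, device="cuda")
            model(x).sum().backward()
            opt.step()
            opt.zero_grad()

    ddp_model = DDP(make_model(0))
    # flatten_parameters=False keeps a one-to-one parameter correspondence.
    fsdp_model = FSDP(make_model(0), flatten_parameters=False)
    train_steps(ddp_model)
    train_steps(fsdp_model)

    # FSDP shards parameters, so gather the full ones before comparing.
    with fsdp_model.summon_full_params():
        for p_ddp, p_fsdp in zip(ddp_model.module.parameters(), fsdp_model.parameters()):
            assert torch.allclose(p_ddp, p_fsdp, atol=1e-5)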
Looks very good to me. It's good that the others have a look though, since I'm probably missing some context, but it seems very clean and reasonable.
+1 LGTM 😄
Thank you @blefaudeux @myleott @tchaton for the quick and high-quality reviews. To forecast a bit: