[feat] save memory by using bucket buffer only in backward #633
Conversation
This fixes bug #627.

- Added documentation to clarify the buffer's cost and the speed/memory tradeoff.
- Added setup/teardown calls so that the buffer is only allocated during the backward pass, saving more memory during forward and stepping so that it can be used for things like activations.
- Added a unit test that asserts the memory usage is in range (a sketch of the idea follows below).

Comparing with DDP:
1. The buffer size scales with the number of FSDP instances, not with model size.
2. The buffer is only allocated during backward.
3. The buffer is used for small tensors only, to reduce overhead.
4. The overlap of compute and reduction is very different.
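The unit test mentioned above asserts that allocated memory stays in an expected range. A minimal sketch of what such a check could look like, assuming a CUDA device; the helper name and bounds are illustrative, not the actual test from this PR:

```python
import torch

def assert_memory_in_range(lower_mb: float, upper_mb: float) -> None:
    # Hypothetical helper, not the PR's actual test: checks that the bytes
    # currently allocated by tensors on the active CUDA device fall inside
    # an expected window.
    used_mb = torch.cuda.memory_allocated() / (1024 * 1024)
    assert lower_mb <= used_mb <= upper_mb, (
        f"allocated {used_mb:.1f} MB, expected [{lower_mb}, {upper_mb}] MB"
    )
```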
@@ -5,6 +5,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## NEXT - TBD
### Added
- FSDP: better memory usage for reduce bucket ([#633](https://github.com/facebookresearch/fairscale/pull/633))
@blefaudeux, one tricky thing here is that when I was modifying the changelog file, the PR number wasn't available yet. I had to come back and edit this file again. :-)
FSDP buckets small gradients so that reduction can be more efficient
for small tensors. ``bucket_cap_mb`` controls the bucket size in
MegaBytes (MB). Buckets are sub-divided based on world_size, so the
max shard size is roughly ``bucket_cap_mb / world_size``. Values <= 0
disable bucketing.
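To make the shard math concrete, a quick back-of-the-envelope example (the numbers are illustrative, not defaults):

```python
bucket_cap_mb = 25   # illustrative bucket size
world_size = 8       # illustrative number of ranks
# Buckets are sub-divided across ranks, so each rank's shard is roughly:
max_shard_mb = bucket_cap_mb / world_size  # 25 / 8 ~= 3.1 MB per rank
```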
I tried to document this completely here. Please let me know if this is too much and/or inaccurate.
@@ -37,6 +38,24 @@ def flush(self) -> None:
        self.callbacks.clear()
        self.output_shard = torch.zeros_like(self.data[0])

    def setup(self) -> None:
These two functions enable some memory savings outside of the backward pass.
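A minimal sketch of the pattern these functions enable; the class and field names are simplified stand-ins, not the exact fairscale implementation:

```python
import torch

class ReduceBucket:
    """Simplified reduce bucket: the flat buffer exists only between
    setup() and teardown(), i.e. during the backward pass."""

    def __init__(self, size: int, dtype: torch.dtype, device: torch.device):
        self.size, self.dtype, self.device = size, dtype, device
        self.buffer = torch.zeros(0)  # empty placeholder outside backward

    def setup(self) -> None:
        # Allocate the buffer at the start of the backward pass.
        if self.buffer.numel() == 0:
            self.buffer = torch.zeros(self.size, dtype=self.dtype, device=self.device)

    def teardown(self) -> None:
        # Release the buffer after gradient reduction so the memory is
        # available for activations in the next forward pass.
        self.buffer = torch.zeros(0)
```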
Looks good, and nice test!
# TODO (Min): the `group` used here in the key is the object hash, not the content
# hash. That means if FSDP instances are initialized with different process groups,
# even when the group members are in fact the same, we end up creating different
# buckets here.
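A toy illustration of the pitfall this TODO describes: when the dict key holds the group object itself, two groups with identical members still hash differently, so they get separate buckets. `FakeGroup` is a stand-in for a real process group:

```python
class FakeGroup:
    # Stand-in for a process group; the default object hash is identity-based.
    def __init__(self, ranks):
        self.ranks = ranks

g1 = FakeGroup([0, 1, 2, 3])
g2 = FakeGroup([0, 1, 2, 3])  # same members, different object

buckets = {}
buckets[(g1, "float32")] = "bucket A"
buckets[(g2, "float32")] = "bucket B"
assert len(buckets) == 2  # duplicate buckets despite identical membership
```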
oh, I see, this is a good point!
Just read this. Nice catch and very educational docs!