Fix fsdp weight tying #1856

bcui19 · 2023-01-05T18:25:10Z

What does this PR do?

When FSDP is initialized with device='meta', weight tying is undone. This is a known issue in PyTorch with deferred initialization. To address it, all weight-tied modules also have to be in the same FSDP module, so this PR tries its best to force tied parameters into the same FSDP module.
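To make the failure mode concrete, here is a minimal, framework-free sketch. The `Param` and `Module` classes and both `*_materialize` functions are hypothetical stand-ins, not PyTorch, FSDP, or Composer APIs. They only model what happens when each module independently allocates real storage for its meta parameters, and how memoizing on parameter identity (one plausible way a "safe apply" could work, assumed here rather than taken from this PR) keeps the tie intact.

```python
class Param:
    """Toy stand-in for a parameter; `device` stays 'meta' until materialized."""

    def __init__(self, shape, device="meta"):
        self.shape = shape
        self.device = device


class Module:
    """Toy stand-in for a module: a name -> Param mapping that may share Params."""

    def __init__(self, **params):
        self.params = dict(params)


def naive_materialize(modules, device):
    # Each module allocates a fresh Param for every entry, so a Param shared by
    # two modules becomes two independent Params and the tie is lost.
    for mod in modules:
        for name, p in list(mod.params.items()):
            mod.params[name] = Param(p.shape, device)


def tie_preserving_materialize(modules, device):
    # Memoize on the identity of the original Param so every module that
    # referenced the same meta Param receives the same materialized Param.
    # Storing the original in the memo keeps it alive, so its id() stays unique.
    memo = {}
    for mod in modules:
        for name, p in list(mod.params.items()):
            if id(p) not in memo:
                memo[id(p)] = (p, Param(p.shape, device))
            mod.params[name] = memo[id(p)][1]


# Usage: an embedding and an output head that share one weight.
shared = Param((16, 8))
embed, lm_head = Module(weight=shared), Module(weight=shared)
tie_preserving_materialize([embed, lm_head], "cpu")
```

Running `naive_materialize` on the same two modules instead leaves `embed.params["weight"]` and `lm_head.params["weight"]` as two distinct objects, which is the toy analogue of the lost tie.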

What issue(s) does this change relate to?

CO-1511

Before submitting

Have you read the contributor guidelines?
Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
Did you update any related docs and document your change?
Did you update any related tests and add any new tests related to your change? (see testing)
Did you run the tests locally to make sure they pass?
Did you run pre-commit on your change? (see the pre-commit section of prerequisites)

dakinggg

LGTM as far as I can tell; will let abhi or others approve. Also, have you tested with the examples repo (without Vitaliy's recent fix)? I'd like to know two things: 1) Does it properly respect the tied weights there? 2) Does it change the memory/throughput?

tests/common/models.py

composer/trainer/dist_strategy.py

bcui19 · 2023-01-09T20:02:45Z

Quoting dakinggg: "1) Does it properly respect the tied weights there? and 2) Does it change the memory/throughput?"

1) Yes, as long as the tied modules end up in the same FSDP module, which can depend on 'min_params'.
2) I checked the base memory usage of each model (125M params): without the fix it uses 2.69 GB, with the fix it uses 2.23 GB (about 17% less). Throughput is only mildly affected (< 1% change): https://wandb.ai/mosaic-ml/meta-tensors-python?workspace=user-bcui
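Why 'min_params' matters can be seen with a toy size-based wrapping rule. The rule, the function names, the module names, and the parameter counts below are illustrative assumptions, not the actual FSDP auto-wrap policy or Composer's implementation. The point is only that a threshold can place tied modules in different FSDP units, in which case the tie cannot be preserved.

```python
def assign_fsdp_units(param_counts, min_params):
    # Hypothetical rule: a submodule with at least `min_params` parameters gets
    # its own FSDP unit; smaller submodules stay in the root unit.
    return {
        name: name if count >= min_params else "root"
        for name, count in param_counts.items()
    }


def split_tied_groups(assignment, tied_groups):
    # A tied group is broken if its members were assigned to different units.
    return [group for group in tied_groups if len({assignment[m] for m in group}) > 1]


# Illustrative counts: embed and lm_head share one weight.
counts = {"embed": 1000, "blocks": 5000, "lm_head": 1000}
tied = [("embed", "lm_head")]

low = split_tied_groups(assign_fsdp_units(counts, min_params=500), tied)
high = split_tied_groups(assign_fsdp_units(counts, min_params=2000), tied)
```

With the low threshold, `embed` and `lm_head` each become their own unit and the tied pair is reported as split. With the high threshold both stay in the root unit, so the tie can be kept.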

vchiley

Left a comment, but it generally looks good.

Also, what's the point of the file composer/scratch?

composer/trainer/dist_strategy.py

bcui19 added 6 commits December 15, 2022 18:10

Redoing weight tying with FSDP

6301da8

Adding in custom safe apply for modules

4cf67ed

Adding in a warning on FSDP modules with weight tying

755d9e9

merge

15d8cee

adding tests for fsdp weight tying and initialization

e8bb34f

Removing extra code

6211de6

bcui19 requested review from vchiley, dakinggg and abhi-mosaic January 5, 2023 18:25

dakinggg reviewed Jan 6, 2023


bcui19 added 2 commits January 6, 2023 20:54

Resolving comments, cleaning up a bit of code

abbc7f3

Adding in qualifications for fsdp meta tensor tests

11d1a81

vchiley approved these changes Jan 13, 2023


composer/trainer/dist_strategy.py (outdated, resolved)

bcui19 added 2 commits January 13, 2023 19:19

Removing extraneous file

c31ca4f

Cleaning up python attributes

c24c1b5

bcui19 merged commit 88356e3 into mosaicml:dev Jan 13, 2023

bcui19 deleted the fix_fsdp_weight_tying branch March 10, 2023 18:02
