Fix activation cpu offloading #2724

Merged on Nov 22, 2023 (15 commits)

@cli99 (Contributor) commented Nov 16, 2023

This PR fixes activation CPU offloading. The original implementation (1) breaks when offload_to_cpu is enabled and activation_checkpointing is disabled, and (2) does not offload to CPU when activation_checkpointing and offload_to_cpu are both enabled. See pytorch/pytorch#85459.
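The two failure modes can be read as a truth table over the two flags. The sketch below is illustrative only and is not the code from this PR: `build_wrap_fn`, `checkpoint_stub`, and `offload_stub` are hypothetical names, and the stubs merely record which wrapper was applied so the composition is visible without importing torch. It assumes the intended behavior is that each flag independently adds its own wrapper.

```python
from typing import Callable, List

# Hypothetical stand-ins for real module wrappers: each one appends its
# name to a list so the order of application can be inspected.
Wrapper = Callable[[List[str]], List[str]]


def checkpoint_stub(layers: List[str]) -> List[str]:
    """Stand-in for an activation-checkpointing wrapper."""
    return layers + ["checkpoint"]


def offload_stub(layers: List[str]) -> List[str]:
    """Stand-in for an activation CPU-offload wrapper."""
    return layers + ["offload"]


def build_wrap_fn(activation_checkpointing: bool, offload_to_cpu: bool) -> Wrapper:
    """Return a function that applies the wrappers selected by the two flags.

    Assumed behavior: the flags are independent, so offloading works with
    or without checkpointing, and enabling both applies both wrappers.
    """

    def wrap(layers: List[str]) -> List[str]:
        wrapped = list(layers)  # copy so the caller's list is not mutated
        if activation_checkpointing:
            wrapped = checkpoint_stub(wrapped)
        if offload_to_cpu:
            wrapped = offload_stub(wrapped)
        return wrapped

    return wrap


if __name__ == "__main__":
    for ckpt in (False, True):
        for offload in (False, True):
            print(ckpt, offload, build_wrap_fn(ckpt, offload)(["block"]))
```

Case (1) from the description corresponds to `build_wrap_fn(False, True)`, and case (2) to `build_wrap_fn(True, True)`. Which PyTorch wrappers the fix actually uses is not stated in this thread.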

@cli99 cli99 requested review from vchiley and mvpatel2000 November 16, 2023 22:52
composer/trainer/dist_strategy.py: 2 review threads (outdated, resolved)
@cli99 cli99 requested a review from mvpatel2000 November 17, 2023 21:03
@cli99 cli99 requested a review from mvpatel2000 November 20, 2023 17:31
@mvpatel2000 (Contributor) left a comment

Can we add a unit test for this? Maybe just take a run in test_fsdp and add activation ckpting

@cli99 cli99 requested a review from mvpatel2000 November 22, 2023 00:20
@cli99 (Contributor, Author) commented Nov 22, 2023

@mvpatel2000 added unit test and inline import

@mvpatel2000 (Contributor) left a comment

Can we move test to test_fsdp?

tests/trainer/test_fsdp_act_ckpt_offload.py: review thread (outdated, resolved)
@cli99 cli99 enabled auto-merge (squash) November 22, 2023 22:04
@cli99 cli99 merged commit 4dcbc2b into mosaicml:dev Nov 22, 2023
16 checks passed
2 participants