
Refactor EMA to improve memory efficiency #1941

Merged
merged 10 commits into from
Feb 9, 2023

Conversation

coryMosaicML
Contributor

@coryMosaicML coryMosaicML commented Feb 3, 2023

What does this PR do?

This PR refactors Composer's EMA algorithm. Changes:

  • Avoids an extra copy of the model parameters, reducing the extra device memory required from 2x to 1x the size of the model parameters
  • Avoids duplicating non-trainable parameters, which reduces the memory required for models with many non-trainable parameters
  • Avoids a deepcopy of state.model after it has been DDP-wrapped

A downside: it is now a bit more awkward to access the training weights or EMA weights directly from the algorithm object.
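To illustrate the first two bullets, here is a minimal, framework-free sketch of the idea (not Composer's actual implementation): keep a single shadow copy of only the trainable parameters and update it in place, so the algorithm holds 1x extra memory (one shadow copy) instead of 2x, and non-trainable parameters are never duplicated. The names `ema_update`, `shadow`, and `params` are hypothetical.

```python
def ema_update(shadow, params, smoothing=0.99):
    """In-place EMA update: shadow <- smoothing * shadow + (1 - smoothing) * param.

    `shadow` holds copies of only the trainable parameters; anything absent
    from it (e.g. non-trainable parameters, buffers) is skipped, so it is
    never duplicated in memory.
    """
    for name, value in params.items():
        if name not in shadow:
            continue  # non-trainable: no shadow copy exists
        shadow[name] = smoothing * shadow[name] + (1.0 - smoothing) * value

# Toy usage: "bias" stands in for a non-trainable parameter, so it has
# no entry in the shadow dict and costs no extra memory.
params = {"weight": 1.0, "bias": 5.0}
shadow = {"weight": 0.0}  # shadow copy of trainable params only
ema_update(shadow, params, smoothing=0.9)
```

In a real PyTorch setting the same update would typically be done with in-place tensor ops (e.g. `Tensor.lerp_`) under `torch.no_grad()`, which is what keeps the extra memory at one shadow copy.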

What issue(s) does this change relate to?

CO-1525


@mvpatel2000 mvpatel2000 left a comment


@coryMosaicML lgtm. Before approval though, would like to see plots / some kind of actual run verifying it works and gives same results as before. Can you please run something like that?

@coryMosaicML
Contributor Author

coryMosaicML commented Feb 8, 2023

> @coryMosaicML lgtm. Before approval though, would like to see plots / some kind of actual run verifying it works and gives same results as before. Can you please run something like that?

Here is a wandb report showing the memory reduction for Stable Diffusion, and a comparison with/without the EMA changes on our ResNet-50 mild recipe.

@mvpatel2000 mvpatel2000 self-requested a review February 8, 2023 18:42
Contributor

@mvpatel2000 mvpatel2000 left a comment


LGTM. Once lint is fixed, feel free to merge

@mvpatel2000 mvpatel2000 merged commit 5be8642 into mosaicml:dev Feb 9, 2023

3 participants