Refactor EMA to improve memory efficiency #1941
Conversation
@coryMosaicML lgtm. Before approval though, would like to see plots / some kind of actual run verifying it works and gives same results as before. Can you please run something like that?
Here is a wandb report showing the memory reduction for stable diffusion, and a comparison with/without the EMA changes on our ResNet50 mild recipe.
LGTM. Once lint is fixed, feel free to merge
Force-pushed from 2b507ac to 08f1347
What does this PR do?
This PR refactors Composer's EMA algorithm. Changes:
- EMA now operates on `state.model` after it has been DDP-wrapped.

A downside: it is now a bit more annoying to access the training weights or EMA weights directly from the algorithm object.
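For context, the core of any EMA algorithm is the averaging update itself. The memory saving described in this PR comes from keeping only averaged parameter values rather than a second full model copy. The sketch below is purely illustrative (it is not Composer's actual implementation, and the `ema_update` function and `smoothing` parameter names are hypothetical); it shows the update rule applied to a flat dict of parameter values.

```python
# Hypothetical sketch of an EMA update over a flat dict of parameter values.
# Instead of deep-copying the whole model, only the averaged values are stored
# and updated in place each step.

def ema_update(ema_params, model_params, smoothing=0.99):
    """In-place EMA update: ema <- smoothing * ema + (1 - smoothing) * current."""
    for name, value in model_params.items():
        ema_params[name] = smoothing * ema_params[name] + (1.0 - smoothing) * value
    return ema_params

# Usage: initialize the EMA state, then fold in the current weights each step.
ema = {"w": 0.0}
for step_weight in [1.0, 1.0, 1.0]:
    ema_update(ema, {"w": step_weight}, smoothing=0.5)
print(ema["w"])  # -> 0.875, moving toward the current weight 1.0
```

With real tensors the same idea applies, with the in-place update done per parameter tensor so that peak memory is one extra set of weights rather than an extra wrapped model object.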
What issue(s) does this change relate to?
CO-1525