
Refactor EMA to improve memory efficiency #1941

Merged
merged 10 commits into from
Feb 9, 2023

Conversation

coryMosaicML
Contributor

@coryMosaicML coryMosaicML commented Feb 3, 2023

What does this PR do?

This PR refactors Composer's EMA algorithm. Changes:

  • Avoids an extra copy of the model parameters, reducing the extra device memory required from 2x to 1x the size of the model parameters
  • Avoids duplicating non-trainable parameters, which reduces the memory required for models with many non-trainable parameters
  • Avoids a deepcopy of state.model after it has been DDP-wrapped

A downside: it is now a bit more awkward to access the training weights or EMA weights directly from the algorithm object.
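To illustrate the first two bullets, here is a minimal, framework-free sketch of the idea (not Composer's actual implementation): keep a single shadow copy of only the trainable parameters and update it in place, so the algorithm holds 1x extra memory (one shadow copy) instead of 2x, and non-trainable parameters are never duplicated. The names `ema_update`, `shadow`, and `params` are hypothetical.

```python
def ema_update(shadow, params, smoothing=0.99):
    """In-place EMA update: shadow <- smoothing * shadow + (1 - smoothing) * param.

    `shadow` holds copies of only the trainable parameters; anything absent
    from it (e.g. non-trainable parameters, buffers) is skipped, so it is
    never duplicated in memory.
    """
    for name, value in params.items():
        if name not in shadow:
            continue  # non-trainable: no shadow copy exists
        shadow[name] = smoothing * shadow[name] + (1.0 - smoothing) * value

# Toy usage: "bias" stands in for a non-trainable parameter, so it has
# no entry in the shadow dict and costs no extra memory.
params = {"weight": 1.0, "bias": 5.0}
shadow = {"weight": 0.0}  # shadow copy of trainable params only
ema_update(shadow, params, smoothing=0.9)
```

In a real PyTorch setting the same update would typically be done with in-place tensor ops (e.g. `Tensor.lerp_`) under `torch.no_grad()`, which is what keeps the extra memory at one shadow copy.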

What issue(s) does this change relate to?

CO-1525


@mvpatel2000 mvpatel2000 left a comment


@coryMosaicML lgtm. Before approval though, would like to see plots / some kind of actual run verifying it works and gives same results as before. Can you please run something like that?

@coryMosaicML
Contributor Author

coryMosaicML commented Feb 8, 2023

> @coryMosaicML lgtm. Before approval though, would like to see plots / some kind of actual run verifying it works and gives same results as before. Can you please run something like that?

Here is a wandb report showing the memory reduction for Stable Diffusion, and a comparison with/without the EMA changes on our ResNet-50 mild recipe.

@mvpatel2000 mvpatel2000 self-requested a review February 8, 2023 18:42
Contributor

@mvpatel2000 mvpatel2000 left a comment


LGTM. Once lint is fixed, feel free to merge

@mvpatel2000 mvpatel2000 merged commit 5be8642 into mosaicml:dev Feb 9, 2023

3 participants