Low precision groupnorm #1976
Thoughts on combining this with LowPrecisionLayerNorm and calling that LowPrecisionNorms?
I would prefer to keep them separate because we don't have convergence guarantees for this, and there might be a case where one applies but the other does not. While we have never observed issues with this, problems are possible given that AMP does not do this by default.
We can allow turning each type on or off based on an option.
Is this scriptable? Can you export a model with LowPrecisionGroupNorm using torch.jit.script?
I have no context on this PR, so I just went through and found ways to polish it.
composer/algorithms/low_precision_groupnorm/low_precision_groupnorm.py
…tel2000/composer into mvpatel2000/low-precision-groupnorm
Will switch to SimpleConvModel once #1991 merges.
Two comments about the arguments and export. Once those are changed, I will approve.
composer/algorithms/low_precision_groupnorm/low_precision_groupnorm.py
I need a different logo... any suggestions for an icon?
basically LGTM, except maybe one test needs to say "GroupNorm" instead of "LayerNorm"
What does this PR do?
Adds low precision GroupNorm. This is modeled after low precision LayerNorm and is useful for Stable Diffusion. We've seen about a 5% throughput gain, along with memory savings, with no significant impact on loss curves.
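
For readers unfamiliar with the idea, here is a minimal sketch of what a low precision GroupNorm could look like. It is not the code in this PR: the class name `LPGroupNormSketch` and the helper `_cast_if_autocast_enabled` are hypothetical, and the sketch assumes the approach is to downcast the input and affine parameters to the active CUDA autocast dtype and then call `group_norm` with autocast disabled, so that AMP does not upcast the op back to float32.

```python
import torch
import torch.nn.functional as F


def _cast_if_autocast_enabled(tensor):
    # Hypothetical helper: downcast only when CUDA autocast is active;
    # otherwise return the tensor (or None) unchanged.
    if tensor is not None and torch.is_autocast_enabled():
        return tensor.to(dtype=torch.get_autocast_gpu_dtype())
    return tensor


class LPGroupNormSketch(torch.nn.GroupNorm):
    """Illustrative GroupNorm that stays in the autocast dtype (assumed design)."""

    def forward(self, x):
        x_lp = _cast_if_autocast_enabled(x)
        weight_lp = _cast_if_autocast_enabled(self.weight)
        bias_lp = _cast_if_autocast_enabled(self.bias)
        # Autocast would normally run group_norm in float32, so disable it
        # for this call and use the already-downcast tensors directly.
        with torch.autocast(device_type=x.device.type, enabled=False):
            return F.group_norm(x_lp, self.num_groups, weight_lp, bias_lp, self.eps)


# Usage: outside of autocast this behaves like a regular GroupNorm.
norm = LPGroupNormSketch(num_groups=4, num_channels=8)
x = torch.randn(2, 8, 16, 16)
y = norm(x)
```

Outside an autocast context no cast happens, so the module matches `torch.nn.GroupNorm`. Under CUDA autocast the normalization runs in the lower precision dtype, which is where any throughput and memory benefit would come from.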
What issue(s) does this change relate to?
CO-1794