Allow `LPLayerNorm` and `LPGroupNorm` to support `self.bias` or `self.weight` = None #2044
Conversation
@bandish-shah there is a moderate likelihood that our MosaicGPT […]. Basically this might be worth a 0.13.2, if you have other stuff you want to get in there.
@abhi-mosaic Can you please add a unit test ensuring this works? Similarly, can you please add the same fix and unit test to `LPGroupNorm`?
We do have a lot of checkpointing fixes coming in this week…
@abhi-mosaic let's hold off, there are other fixes in progress. We can target the 0.13.2 hot patch for late next week.
LGTM! Thanks for finishing this
Thanks @nik-mosaic!!!
Allow `LPLayerNorm` and `LPGroupNorm` to support `self.bias` or `self.weight` = None (#2044). Extends support to affine=False and bias-free models. Co-authored-by: nik-mosaic <101217697+nik-mosaic@users.noreply.github.com> Co-authored-by: Vitaliy Chiley <6439018+vchiley@users.noreply.github.com>
We are experimenting with removing all biases from our MosaicGPT models. When we do so with `torch.nn.LayerNorm` it works, but with `LPLayerNorm` it fails. This PR:

1. Modifies the weight copying from `torch.nn.LayerNorm` to `LPLayerNorm` that occurs during module surgery to check for `None` types.
2. Adds a check in the `forward()` method for `self.bias is None` or `self.weight is None`, and doesn't downcast the `None` parameters (see the sketch after this list).
3. Updates a test to run on a model where some LayerNorms have both weights and biases, some have `self.bias = None`, and some have `self.bias = None` and `self.weight = None`.
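A minimal sketch of the idea behind fixes (1) and (2), assuming a simplified `LPLayerNorm`. Helper names like `_cast_if_autocast_enabled` and `_to_LPLayerNorm` are illustrative and may not match composer's actual implementation:

```python
import torch
import torch.nn.functional as F


def _cast_if_autocast_enabled(tensor: torch.Tensor) -> torch.Tensor:
    # Downcast to the autocast dtype only when autocast is active on CUDA.
    if torch.is_autocast_enabled() and tensor.device.type == 'cuda':
        return tensor.to(dtype=torch.get_autocast_gpu_dtype())
    return tensor


class LPLayerNorm(torch.nn.LayerNorm):

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        downcast_x = _cast_if_autocast_enabled(x)
        # Fix (2): self.weight / self.bias may be None (elementwise_affine=False
        # or bias-free models), and None cannot be downcast, so skip it.
        downcast_weight = (_cast_if_autocast_enabled(self.weight)
                           if self.weight is not None else self.weight)
        downcast_bias = (_cast_if_autocast_enabled(self.bias)
                         if self.bias is not None else self.bias)
        with torch.autocast(device_type=x.device.type, enabled=False):
            return F.layer_norm(downcast_x, self.normalized_shape,
                                downcast_weight, downcast_bias, self.eps)


def _to_LPLayerNorm(layer: torch.nn.LayerNorm) -> LPLayerNorm:
    # Fix (1): during module surgery, only copy parameters that exist.
    lp = LPLayerNorm(layer.normalized_shape, eps=layer.eps,
                     elementwise_affine=layer.elementwise_affine)
    with torch.no_grad():
        if layer.weight is None:
            lp.register_parameter('weight', None)
        else:
            lp.weight.copy_(layer.weight)
        if layer.bias is None:
            lp.register_parameter('bias', None)
        else:
            lp.bias.copy_(layer.bias)
    return lp
```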
This PR also does (1), (2), and (3) for `LPGroupNorm`, whose `weight` and `bias` need the same `None` checks.
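For fix (3), a hypothetical test along these lines (reusing the illustrative `_to_LPLayerNorm` helper above; composer's real unit test differs in detail) covers all three parameter configurations:

```python
def test_lp_layernorm_handles_none_params():
    full = torch.nn.LayerNorm(8)                  # has weight and bias
    no_bias = torch.nn.LayerNorm(8)
    no_bias.register_parameter('bias', None)      # self.bias = None
    no_affine = torch.nn.LayerNorm(8, elementwise_affine=False)  # both None

    x = torch.randn(4, 8)
    for ln in (full, no_bias, no_affine):
        lp = _to_LPLayerNorm(ln)
        # Outside autocast, LPLayerNorm should match LayerNorm exactly.
        torch.testing.assert_close(lp(x), ln(x))
```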