Restore consistency of hook_normalized between LayerNorm and RMSNorm #770
Description
This PR is intended to fix issue #747. In layer_norm.py, hook_normalized is applied after the gain and bias weights, whereas in rms_norm.py it is applied before them. This PR resolves the inconsistency by moving hook_normalized before the gain and bias weights in layer_norm.py. According to @neelnanda-io, this will be a breaking change, which, according to @bryce13950, could be worked into release 3.0. Since the issue itself already contained plenty of guidance on how to make the change, and the change is small, I went ahead and did it. I would greatly appreciate guidance on whether I should add tests for this; I am new to mech interp and to this library and don't know how to go about that.
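To make the ordering change concrete, here is a rough pure-Python sketch. It is not the code in layer_norm.py: the function `layer_norm`, the `hook_before_affine` flag, and the `record` callback are hypothetical names used only for illustration, and the hook is modelled as a plain function that receives the values hook_normalized would see.

```python
import math


def layer_norm(x, w, b, hook, eps=1e-5, hook_before_affine=True):
    """Toy LayerNorm over a single vector of floats (illustrative only)."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    scale = math.sqrt(sum(c * c for c in centered) / len(x) + eps)
    normalized = [c / scale for c in centered]
    if hook_before_affine:
        # Ordering after this PR: the hook sees the normalized values,
        # and gain (w) and bias (b) are applied afterwards.
        normalized = hook(normalized)
        return [n * wi + bi for n, wi, bi in zip(normalized, w, b)]
    # Previous ordering: gain and bias are applied first, then the hook.
    out = [n * wi + bi for n, wi, bi in zip(normalized, w, b)]
    return hook(out)


seen = []  # values observed by the stand-in hook, one entry per call


def record(values):
    """Stand-in for a hook: store what it sees and pass it through."""
    seen.append(list(values))
    return values


x = [1.0, 2.0, 3.0, 4.0]
w = [2.0, 2.0, 2.0, 2.0]  # gain
b = [0.5, 0.5, 0.5, 0.5]  # bias

new_out = layer_norm(x, w, b, record, hook_before_affine=True)
old_out = layer_norm(x, w, b, record, hook_before_affine=False)
```

With a pass-through hook the layer output is the same under both orderings; only the values visible at the hook differ. In the new ordering the hook sees zero-mean values that do not depend on `w` and `b`, which is the kind of property a test could assert.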
Thanks for maintaining this library!
Fixes #747
Type of change
Checklist: