hook_sae_acts_post for Gated models should be post-masking #322

Merged: 2 commits into jbloomAus:main, Oct 7, 2024

Contributor

@callummcdougall callummcdougall commented Oct 7, 2024

This changes `hook_sae_acts_post` so that it applies to the gated activations after they have been multiplied by the masking values.

This problem comes about because "output of encoder" and "output of the nonlinear activation function on feature magnitudes" aren't the same thing. The long-term solution is to have 2 different hook points, but as a quick fix, I think this hook point should be here (and I would appreciate a quick fix if possible, since it'll make the new ARENA material work when it gets to the visualization section!).
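
To illustrate the distinction, here is a minimal, self-contained sketch of a gated encoder in plain Python. It is not the library's code: `gated_encode`, the `hooks` dictionary, and the toy pre-activation values are all hypothetical, and it assumes the usual gated formulation in which a binary mask (gate pre-activation > 0) multiplies the ReLU'd feature magnitudes. The point it shows is only where the single `hook_sae_acts_post` call sits: after masking.

```python
from typing import Callable, Dict, List

Vector = List[float]


def relu(v: Vector) -> Vector:
    """Elementwise ReLU on a plain Python list."""
    return [max(0.0, x) for x in v]


def gated_encode(
    gate_pre: Vector,
    mag_pre: Vector,
    hooks: Dict[str, Callable[[Vector], None]],
) -> Vector:
    """Toy gated encoder acting on precomputed per-feature pre-activations.

    Hypothetical helper for illustration only; the names do not come from
    any real library API.
    """
    # Binary mask: a feature is "on" only if its gate pre-activation is positive.
    mask = [1.0 if g > 0 else 0.0 for g in gate_pre]

    # Output of the nonlinearity on the feature magnitudes (pre-masking).
    magnitudes = relu(mag_pre)

    # Final feature activations: magnitudes with closed gates zeroed out.
    feature_acts = [a * m for a, m in zip(mask, magnitudes)]

    # Quick fix discussed above: the hook sees the post-masking values.
    hooks["hook_sae_acts_post"](feature_acts)
    return feature_acts


captured: Dict[str, Vector] = {}


def record(acts: Vector) -> None:
    """Store a copy of whatever the hook point receives."""
    captured["acts_post"] = list(acts)


gate_pre = [0.5, -1.0, 2.0]  # the second feature's gate is closed
mag_pre = [1.5, 3.0, -0.2]   # the second feature has a large magnitude anyway

acts = gated_encode(gate_pre, mag_pre, {"hook_sae_acts_post": record})
print(captured["acts_post"])
```

In this toy input, the second feature has a magnitude of 3.0 but a closed gate, so a hook placed before masking would report it as active while the masked output reports zero. That mismatch is what matters for anything downstream that reads the hook to decide which features fired.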

@callummcdougall callummcdougall changed the title from "first commit" to "hook_sae_acts_post for Gated models should be post-masking" Oct 7, 2024
Collaborator

chanind commented Oct 7, 2024

Thanks for fixing this! We should really have test coverage on this stuff. I'll make issues to add tests to these functions.

@chanind chanind merged commit 5e70edc into jbloomAus:main Oct 7, 2024
5 checks passed
@callummcdougall
Contributor Author

np, thanks for merging!
