`hook_sae_acts_post` for Gated models should be post-masking #322

callummcdougall · 2024-10-07T20:04:55Z

This changes `hook_sae_acts_post` for Gated models so that it is applied to the gated activations, i.e. after the feature magnitudes have been multiplied by the masking values.

This problem comes about because "output of encoder" and "output of nonlinear activation function on feature magnitudes" aren't the same thing. The long-term solution is to have 2 different hook points, but as a quick fix, I think this hook point should be here (and I would appreciate a quick fix if possible, since it'll make the new ARENA material work when it gets to the visualization section!).
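To make the distinction concrete, here is a minimal pure-Python sketch of a gated encoder, not the actual library code. The function name `gated_encode`, the parameter names (`W_enc`, `b_gate`, `r_mag`, `b_mag`) and the plain-callable `hook` are all assumptions made for illustration. It shows the hook receiving the post-masking activations, while the unmasked feature magnitudes are returned separately for comparison.

```python
import math

def gated_encode(x, W_enc, b_gate, r_mag, b_mag, hook=None):
    """Illustrative gated SAE encoder (hypothetical names, plain lists).

    x:      input vector of length d_in
    W_enc:  d_in x d_sae encoder weights shared by both paths
    b_gate: gating-path bias, length d_sae
    r_mag:  log-scale applied to the magnitude path, length d_sae
    b_mag:  magnitude-path bias, length d_sae
    hook:   optional callable standing in for hook_sae_acts_post
    """
    d_sae = len(b_gate)
    # Shared encoder projection.
    pre = [sum(x[i] * W_enc[i][j] for i in range(len(x))) for j in range(d_sae)]
    # Gating path: binary mask deciding which features are active.
    mask = [1.0 if pre[j] + b_gate[j] > 0 else 0.0 for j in range(d_sae)]
    # Magnitude path: ReLU of the rescaled, shifted projection.
    magnitudes = [max(0.0, pre[j] * math.exp(r_mag[j]) + b_mag[j]) for j in range(d_sae)]
    # Post-masking activations: this is what the hook should see.
    acts = [m * g for m, g in zip(mask, magnitudes)]
    if hook is not None:
        hook(acts)
    return acts, magnitudes

seen = []  # records whatever the hook is called with
acts, magnitudes = gated_encode(
    x=[1.0, -1.0],
    W_enc=[[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]],
    b_gate=[0.0, 0.0, 0.0],
    r_mag=[0.0, 0.0, 0.0],
    b_mag=[0.5, 2.0, 0.5],
    hook=seen.append,
)
print(acts)        # [1.5, 0.0, 0.0]
print(magnitudes)  # [1.5, 1.0, 0.5]
```

In this toy input, the second and third features have nonzero magnitudes but are gated off, so a hook placed before the masking step would report them as active even though they contribute nothing to the reconstruction.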

chanind · 2024-10-07T21:54:34Z

Thanks for fixing this! We should really have test coverage on this stuff. I'll make issues to add tests to these functions.

callummcdougall · 2024-10-08T07:56:23Z

np, thanks for merging!

first commit

e33c8e8

callummcdougall changed the title ~~first commit~~ hook_sae_acts_post for Gated models should be post-masking Oct 7, 2024

formatting

4deca23

chanind merged commit 5e70edc into jbloomAus:main Oct 7, 2024
5 checks passed

chanind mentioned this pull request Oct 7, 2024

[Proposal] Add test coverage to GatedSAE and JumpReLU implementations #323

Closed

1 task
