feature-steering

Here is 1 public repository matching this topic...

PaulPauls / llama3_interpretability_sae

A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and fully reproducible.

pytorch feature-extraction open-research sparse-autoencoder llama3 llm-interpretability feature-steering

Updated Nov 29, 2024

