revised demo testing to check all demos #542

Draft · wants to merge 48 commits into base: dev

Commits (48)
bb8301f
revised demo testing to check all demos
bryce13950 Apr 15, 2024
4fba558
separated demos
bryce13950 Apr 23, 2024
310cb05
Merge branch 'main' into demos-test-coverage
bryce13950 Apr 24, 2024
e73f290
changed demo test order
bryce13950 Apr 24, 2024
99ba5db
rearranged test order
bryce13950 Apr 24, 2024
37dca9a
updated attribution patching to run different code in github
bryce13950 Apr 24, 2024
9397815
rearranged tests
bryce13950 Apr 24, 2024
430684e
updated header
bryce13950 Apr 24, 2024
316cb45
updated grokking demo
bryce13950 Apr 26, 2024
81a27f7
updated bert for testing
bryce13950 Apr 26, 2024
7d64be0
updated bert demo
bryce13950 Apr 26, 2024
027e2bc
ran cells
bryce13950 Apr 26, 2024
f06e2ec
removed github check
bryce13950 Apr 26, 2024
1f0197b
removed cells to skip
bryce13950 Apr 26, 2024
b1d5416
ignored output of loading cells
bryce13950 Apr 26, 2024
7f05523
Merge branch 'dev' into demos-test-coverage
bryce13950 May 2, 2024
576c089
changed notebook tests to run on separate jobs
bryce13950 May 2, 2024
5b63c46
added all notebooks to CI
bryce13950 May 2, 2024
efcdfa8
renamed file temporarily
bryce13950 May 2, 2024
35f74c2
fixed file case
bryce13950 May 2, 2024
eaba4b7
reorganized setup
bryce13950 May 2, 2024
9f6e4df
updated head detector demo
bryce13950 May 2, 2024
a4e4c90
reran othello
bryce13950 May 2, 2024
cda93e7
updated installation section
bryce13950 May 2, 2024
5ae2377
updated no position install to install deps in github
bryce13950 May 3, 2024
5a38266
Merge branch 'main' into demos-test-coverage
bryce13950 May 3, 2024
6f5e77c
updated output of beginning areas
bryce13950 May 3, 2024
ef93d5d
updated starting block for llama
bryce13950 May 3, 2024
bef4d99
regenerated no position experiment
bryce13950 May 3, 2024
610e6d8
Merge branch 'dev' into demos-test-coverage
bryce13950 May 7, 2024
8e4f120
removed llama gpu
bryce13950 May 7, 2024
fc7f90f
skipped llama
bryce13950 May 7, 2024
c648e50
ran interactive neuroscope
bryce13950 May 8, 2024
d841820
Merge branch 'dev' into demos-test-coverage
bryce13950 May 11, 2024
8af686b
updated neuroscope diff areas
bryce13950 May 11, 2024
0ef80bb
removed cell output
bryce13950 May 11, 2024
f194a68
updated install steps
bryce13950 May 29, 2024
dce3c59
fixed import
bryce13950 May 29, 2024
6080b34
made sure to only run llama 1 block if the model is available
bryce13950 May 29, 2024
63635c8
added llama to tests again
bryce13950 May 30, 2024
df11d02
fixed deprecation message
bryce13950 May 30, 2024
335fa11
locked transformers version
bryce13950 May 30, 2024
c6506d2
fixed llama 2 demo
bryce13950 May 30, 2024
594f3ac
Merge branch 'dev' into demos-test-coverage
bryce13950 May 30, 2024
b0bf782
removed llama from ci
bryce13950 May 30, 2024
a028fab
removed llama 2 from ci
bryce13950 May 30, 2024
74db85c
turned off some demos in ci
bryce13950 May 30, 2024
fd302d8
removed activation patching
bryce13950 May 30, 2024
2 changes: 1 addition & 1 deletion .github/workflows/checks.yml
@@ -125,7 +125,7 @@ jobs:
# - "Attribution_Patching_Demo"
- "BERT"
- "Exploratory_Analysis_Demo"
# - "Grokking_Demo"
- "Grokking_Demo"
# - "Head_Detector_Demo"
- "Interactive_Neuroscope"
# - "LLaMA"
251 changes: 251 additions & 0 deletions demos/Config_Overhaul.ipynb
@@ -0,0 +1,251 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Overview\n",
"\n",
"The current way configuration is designed in TransformerLens has a lot of limitations. It does not\n",
"allow for outside people to pass through configurations that are not officially supported, and it\n",
"is very bug prone with something as simple as typo potentially giving you a massive headache. There\n",
"are also a number of hidden rules that are not clearly documented, which can go hidden until\n",
"different pieces of TransformerLens are activated. Allowing to pass in an optional object of configuration\n",
"with no further changes does solve a couple of these problems, but it does not solve the bigger\n",
"issues. It also introduces new problems with users potentially passing in architectures that are not\n",
"supported without having a clear way to inform the user what isn't supported.\n",
"\n",
"My proposal for how all of these problems can be resolved is to fundamentally revamp the\n",
"configuration to allow for something that I like to call configuration composition. From a technical\n",
"perspective, this involves creating a centralized class that describes all supported configurations\n",
"by TransformerLens. This class would then be used to construct specific configurations for all models\n",
"that are currently supported, and it would then allow anyone to easily see in a single place all\n",
"configuration features supported by TransformerLens while also being able to read the code to\n",
"understand how they can create their own configurations for the purpose of either submitting new\n",
"models into TransformerLens, or configuring an unofficially supported model by TransformerLens,\n",
"when TransformerLens already happens to support all of the architectural pieces separately.\n",
"\n",
"This could simple be an overhaul of the existing HookedTransformerConfig. Everything I am\n",
"describing here could be made compatible with that class to give it a more usable interface that is\n",
"then directly interacted with by the end user. At the moment, that class is not really built to be\n",
"interacted with, and is instead used as a wrapper around spreading configured anonymous objects.\n",
"Overhauling this class to do what I am about to describe is a viable path, but keeping it as it is,\n",
"and making a new class as something meant to be used by the end user would be a way to maintain\n",
"compatibility, avoid refactors, and keep model configuration only focused on putting together\n",
"configuration for models, as opposed to configuring full settings needed by HookedTransformer, which\n",
"includes checking the available environment.\n",
"\n",
"A very unscientific basic example of how this would look in code by the end user can be seen\n",
"immediately below. I will delve into details of each piece in this document."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"config = ModelConfig(\n",
" d_model=4096,\n",
" d_head=8192 // 64,\n",
" n_heads=64,\n",
" act_fn=\"silu\"\n",
" # Other universally required properties across all models go here in the constructor\n",
")\n",
"# Enabling specific features not universal among all models\n",
"config.enabled_gated_mlp()\n",
"# Customizing optional attributes\n",
"config.set_positional_embedding_type(\"alibi\")\n",
"\n",
"# and so on, until the full configuration is set\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The constructor\n",
"\n",
"The first piece of this I want to talk about is what will be injected into the constructor. It\n",
"should basically take everything absolutely required by all models. This keeps the code easy for\n",
"someone to understand, without adding too much clutter. All fields should be required, and if there\n",
"is ever an idea that a field should be in the constructor as an option, then that is probably an\n",
"indication that there is a good case to add a function to configure that variable in a different\n",
"point in the class. An example of what this would look like can be seen below..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# make it easy for someone to see what activation functions are supported, this would be moved from\n",
"# HookedTransformerConfig\n",
"ActivationFunction = \"silu\" | \"gelu\"\n",
"\n",
"class ModelConfig:\n",
" def __init__(\n",
" self,\n",
" d_model: int,\n",
" eps: int,\n",
" act_fn: ActivationFunction,\n",
" remaining_required_attributes,\n",
" ):\n",
" self.d_model = d_model\n",
" self.eps = eps\n",
" self.act_fn = act_fn\n",
" # Set defaults for any remaining supported attributes that are not required here \n",
" self.gated_mlp = False\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Boolean Variables\n",
"\n",
"Within TransformerLens config, anything that is a boolean variable is essentially a feature flag.\n",
"This means that all features at the time of construction would have default values, most likely set\n",
"to false. They then get toggled on with an `enable_feature` function call on the config object.\n",
"Having these functions will make very clear for someone less familiar with TransformerLens what\n",
"features are available. It also allows us to decorate these calls, which is very important. There\n",
"are some instances where if a boolean is true, a different one cannot be true, but this requirement\n",
"is not clear anywhere without analyzing code. Decorating these functions allows us to make sure\n",
"these sort of bugs are not possible. I will use `gated_mlp` as an example here, but it is not\n",
"meant to be a real implementation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def enabled_gated_mlp(self: ModelConfig) -> ModelConfig:\n",
" self.gated_mlp = True\n",
" # Configure any side effects caused by enabling of a feature\n",
" self.another_feature = False\n",
" # Returning self allows someone to chain together config calls\n",
" return self\n",
"\n",
"ModelConfig.enabled_gated_mlp = enabled_gated_mlp"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Additional Options\n",
"\n",
"Any other options would similarly have their own functions to configure. This allows for similar\n",
"decoration as with feature flags, and it also in a way documents the architectural capabilities of\n",
"TransformerLens in a single place. If there are groups of options that are also always required\n",
"together, this then gives us a way to require all of those options as opposed to having them all be\n",
"configured at the root level. This also allows us to make changes to other attributes that may be\n",
"affected as a side affect of having some values set, which again makes it both harder for people to\n",
"introduce bugs, and also creates code that documents itself. Another off the cuff example of\n",
"something like this can be seen below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def set_rotary_dim(self: ModelConfig, rotary_dim: int) -> ModelConfig:\n",
" self.rotary_dim = rotary_dim\n",
" # Additional settings that seem to be present whenever rotary_dim is set\n",
" self.positional_embedding_type = \"rotary\"\n",
" self.rotary_adjacent_pairs = False\n",
" return self\n",
"\n",
"ModelConfig.set_rotary_dim = set_rotary_dim"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Config Final Thoughts\n",
"\n",
"The best way to describe this idea is configuration composition. The reason being is that the user is\n",
"essentially composing a model configuration by setting the base, and then combining various options\n",
"from predefined functions. Doing it like this has a lot of advantages. One of those advantages being\n",
"that there would need to be a lot less memorization on how architectures should be combined. e.g.\n",
"maybe it's not that hard to remember that `rotary_adjacent_pairs` should be False when `rotary_dim`\n",
"is set, but these sorts of combinations accumulate. Having it interfaced out gives everyone a\n",
"place to look to see how parts of configuration work in isolation without the need to memorize a\n",
"large amount of rules.\n",
"\n",
"This would also allow us to more easily mock out fake configurations and enable specific features in\n",
"order to test that functionality in isolation. This also should make it easier for someone to at a\n",
"glance understand all model compatibilities with TransformerLens, since there would be a single file\n",
"where they would all be listed out and documented. It will also allow for people to see\n",
"compatibility limitations at a glance.\n",
"\n",
"As for compatibility, this change would be 100% compatible with the existing structure. The objects\n",
"I am suggesting are abstractions of the existing configuration dictionaries for the purpose of\n",
"communication and ease of use. This means that they can be passed around just like the current\n",
"anonymous dictionaries.\n",
"\n",
"## Further Changes\n",
"\n",
"With this, there are a number of changes that I would like to make to the actual\n",
"`loading_from_pretrained` file in order to revise it to be ready for the possibility of rapidly\n",
"supporting new models. The biggest change in this respect would be to break out what is now a\n",
"configuration dictionary for every model into having its own module where one of these configuration\n",
"objects would be constructed. That object would then be exposed, so that it can be imported into\n",
"`loading_from_pretrained`. We would then create a dictionary where the official name of the\n",
"model would have the configuration object as its value, thus completely eliminating that big giant\n",
"if else statement, and replacing it with a simple return from the dictionary. The configurations\n",
"themselves would then live in a directory structure like so...\n",
"\n",
"config/ <- where the ModelConfig file lives\n",
"config/meta-llama/ <- directory for all models from the group\n",
"config/meta-llama/Llama-2-13b.py <- name matching hugging face to make it really easy to find the\n",
" configuration\n",
"\n",
"## Impact on Testing\n",
"\n",
"This change, would allow us to directly interact with these configuration objects to allow us to\n",
"more easily assert that configurations are set properly, and to also allow us to more easily access\n",
"these configurations in tests for the purposes of writing better unit tests. \n",
"\n",
"## Summary\n",
"\n",
"This change should solve a lot of problems. It may be a big change at first from what currently\n",
"exists, but in time I think most people will find it more elegant, and easier to understand. "
]
},
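{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the `loading_from_pretrained` changes above concrete, here is a minimal sketch of how a\n",
"per-model config module and the lookup dictionary could fit together. It assumes the `ModelConfig`\n",
"class and the `enabled_gated_mlp` / `set_rotary_dim` helpers sketched earlier; the names\n",
"`SUPPORTED_MODELS` and `get_pretrained_model_config`, and the hyperparameter values, are\n",
"hypothetical and only illustrate the shape of the change, not a final API."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical sketch, not the actual TransformerLens implementation.\n",
"\n",
"# config/meta-llama/Llama-2-13b.py would construct and expose a single composed config.\n",
"# The specific values here are purely illustrative.\n",
"llama_2_13b = (\n",
"    ModelConfig(d_model=5120, eps=1e-5, act_fn=\"silu\")\n",
"    .enabled_gated_mlp()\n",
"    .set_rotary_dim(5120 // 40)\n",
")\n",
"\n",
"# loading_from_pretrained would then map each official model name to its imported config\n",
"# object, replacing the giant if/else statement with a simple dictionary lookup.\n",
"SUPPORTED_MODELS = {\n",
"    \"meta-llama/Llama-2-13b-hf\": llama_2_13b,\n",
"    # ... one entry per officially supported model\n",
"}\n",
"\n",
"def get_pretrained_model_config(official_name: str) -> ModelConfig:\n",
"    if official_name not in SUPPORTED_MODELS:\n",
"        raise ValueError(f\"{official_name} is not a supported model\")\n",
"    return SUPPORTED_MODELS[official_name]\n",
"\n",
"# A unit test could compose a fake config the same way to exercise one feature in isolation, e.g.\n",
"assert get_pretrained_model_config(\"meta-llama/Llama-2-13b-hf\").gated_mlp is True"
]
},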
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
28 changes: 19 additions & 9 deletions demos/LLaMA.ipynb
@@ -75,7 +75,8 @@
"%pip install sentencepiece # Llama tokenizer requires sentencepiece\n",
"\n",
"if IN_COLAB or IN_GITHUB:\n",
" %pip install git+https://github.com/neelnanda-io/TransformerLens.git``\n",
" %pip install torch\n",
" %pip install transformer_lens\n",
" %pip install circuitsvis\n",
" \n",
"# Plotly needs a different renderer for VSCode/Notebooks vs Colab argh\n",
@@ -163,19 +164,28 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"MODEL_PATH=''\n",
"MODEL_PATH = \"\"\n",
"\n",
"tokenizer = LlamaTokenizer.from_pretrained(MODEL_PATH)\n",
"hf_model = LlamaForCausalLM.from_pretrained(MODEL_PATH, low_cpu_mem_usage=True)\n",
"if MODEL_PATH:\n",
" tokenizer = LlamaTokenizer.from_pretrained(MODEL_PATH)\n",
" hf_model = LlamaForCausalLM.from_pretrained(MODEL_PATH, low_cpu_mem_usage=True)\n",
"\n",
"model = HookedTransformer.from_pretrained(\"llama-7b\", hf_model=hf_model, device=\"cpu\", fold_ln=False, center_writing_weights=False, center_unembed=False, tokenizer=tokenizer)\n",
" model = HookedTransformer.from_pretrained(\n",
" \"llama-7b\",\n",
" hf_model=hf_model,\n",
" device=\"cpu\",\n",
" fold_ln=False,\n",
" center_writing_weights=False,\n",
" center_unembed=False,\n",
" tokenizer=tokenizer,\n",
" )\n",
"\n",
"model = model.to(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
"model.generate(\"The capital of Germany is\", max_new_tokens=20, temperature=0)"
" model = model.to(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
" model.generate(\"The capital of Germany is\", max_new_tokens=20, temperature=0)"
]
},
{
@@ -441,7 +451,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
"version": "3.11.9"
},
"orig_nbformat": 4,
"vscode": {
22 changes: 11 additions & 11 deletions demos/LLaMA2_GPU_Quantized.ipynb
@@ -171,7 +171,7 @@
" ipython.magic(\"load_ext autoreload\")\n",
" ipython.magic(\"autoreload 2\")\n",
" \n",
"%pip install transformers>=4.31.0 # Llama requires transformers>=4.31.0 and transformers in turn requires Python 3.8\n",
"%pip install transformers==4.31.0 # Llama requires transformers>=4.31.0 and transformers in turn requires Python 3.8\n",
"%pip install sentencepiece # Llama tokenizer requires sentencepiece\n",
" \n",
"if IN_GITHUB or IN_COLAB:\n",
@@ -297,15 +297,16 @@
},
"outputs": [],
"source": [
"# MODEL_PATH=''\n",
"MODEL_PATH=''\n",
"\n",
"# tokenizer = LlamaTokenizer.from_pretrained(MODEL_PATH)\n",
"# hf_model = LlamaForCausalLM.from_pretrained(MODEL_PATH, low_cpu_mem_usage=True)\n",
"if MODEL_PATH:\n",
" tokenizer = LlamaTokenizer.from_pretrained(MODEL_PATH)\n",
" hf_model = LlamaForCausalLM.from_pretrained(MODEL_PATH, low_cpu_mem_usage=True)\n",
"\n",
"# model = HookedTransformer.from_pretrained(\"llama-7b\", hf_model=hf_model, device=\"cpu\", fold_ln=False, center_writing_weights=False, center_unembed=False, tokenizer=tokenizer)\n",
" model = HookedTransformer.from_pretrained(\"llama-7b\", hf_model=hf_model, device=\"cpu\", fold_ln=False, center_writing_weights=False, center_unembed=False, tokenizer=tokenizer)\n",
"\n",
"# model = model.to(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
"# model.generate(\"The capital of Germany is\", max_new_tokens=20, temperature=0)"
" model = model.to(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
" model.generate(\"The capital of Germany is\", max_new_tokens=20, temperature=0)"
]
},
{
@@ -390,7 +391,7 @@
}
],
"source": [
"%pip install bitsandbytes\n",
"%pip install bitsandbytes==0.42.0\n",
"%pip install accelerate"
]
},
@@ -715,8 +716,7 @@
],
"source": [
"\n",
"from transformers import AutoModelForCausalLM\n",
"from transformers import AutoTokenizer\n",
"from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig\n",
"\n",
"LLAMA_2_7B_CHAT_PATH = \"meta-llama/Llama-2-7b-chat-hf\"\n",
"inference_dtype = torch.float32\n",
@@ -726,7 +726,7 @@
"hf_model = AutoModelForCausalLM.from_pretrained(LLAMA_2_7B_CHAT_PATH,\n",
" torch_dtype=inference_dtype,\n",
" device_map = \"cuda:0\",\n",
" load_in_4bit=True)\n",
" quantization_config=BitsAndBytesConfig(load_in_4bit=True))\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(LLAMA_2_7B_CHAT_PATH)\n",
"\n",