
Add OLMo November 2024 #34551

Merged · 26 commits merged into huggingface:main on Nov 18, 2024
Conversation

@2015aroras (Contributor) commented on Oct 31, 2024

What does this PR do?

An updated OLMo model will be released in November. The new model has a few small architecture changes compared to the existing model in transformers (see the sketch after this list):

  • RMSNorm is used instead of standard layer norm.
  • Norm is applied to attention queries and keys.
  • Norm is applied after attention/feedforward rather than before.
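
For illustration, here is a minimal, hypothetical sketch of how these three changes combine in a decoder layer. This is not the actual transformers implementation; module and attribute names are assumptions:

```python
import torch
from torch import nn


class RMSNorm(nn.Module):
    """Root-mean-square norm: no mean subtraction and no bias, unlike standard LayerNorm."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        variance = x.float().pow(2).mean(-1, keepdim=True)
        x = x.float() * torch.rsqrt(variance + self.eps)
        return self.weight * x.to(self.weight.dtype)


class PostNormDecoderLayerSketch(nn.Module):
    """Hypothetical layer showing the new ordering: norms applied *after* attention/feedforward."""

    def __init__(self, hidden_size: int, self_attn: nn.Module, mlp: nn.Module):
        super().__init__()
        # self_attn is assumed to apply RMSNorm to its queries and keys internally (QK-norm).
        self.self_attn = self_attn
        self.mlp = mlp
        self.post_attention_layernorm = RMSNorm(hidden_size)
        self.post_feedforward_layernorm = RMSNorm(hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual + norm applied to the attention *output* (post-norm), not to its input.
        hidden_states = hidden_states + self.post_attention_layernorm(self.self_attn(hidden_states))
        # Same pattern for the feedforward block.
        hidden_states = hidden_states + self.post_feedforward_layernorm(self.mlp(hidden_states))
        return hidden_states
```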

The original PR #34497 updated the OLMo implementation in transformers to support the November release. This PR instead adds a new model using the modular approach.

@ArthurZucker

Fixes #34496

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@2015aroras (Contributor, Author)

I tested this before we locked in the Olmo1124 naming conventions. I will update once I've re-tested with the final naming.

@2015aroras marked this pull request as draft on October 31, 2024 at 23:48
@2015aroras marked this pull request as ready for review on November 4, 2024 at 23:42
@2015aroras (Contributor, Author) commented on Nov 4, 2024

Tests are passing, including the slow ones (except for Olmo1124ModelTest::test_generate_compile_1_end_to_end, but this appears to be broken for base OLMo too, so I'm treating it as a pre-existing problem). I've used a test HF Hub repo (shanearora/OLMo-7B-1124-hf) since the official final model is not ready yet.
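
A minimal way to try that test checkpoint while reviewing, assuming the standard auto classes pick up the new model type (the prompt is arbitrary):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Test repo referenced above; the official checkpoints will later live under the allenai org.
repo = "shanearora/OLMo-7B-1124-hf"

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)

inputs = tokenizer("Language modeling is", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```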

@2015aroras (Contributor, Author)

PR checks were passing before I merged main again; the current PR check failures relate to other models.

@2015aroras (Contributor, Author)

@ArthurZucker Gentle ping

@ArthurZucker (Collaborator) left a review

Looks marvellous, thanks for your hard work, let's get this merged ASAP! 🤗 I left very small comments; overall it's great. Apologies again for the delay.


## Overview

The OLMo November 2024 model was proposed in [<INSERT PAPER NAME HERE>](<INSERT PAPER LINK HERE>) by <INSERT AUTHORS HERE>.
Collaborator: These would need to be filled.

Collaborator: The new init should look more like this. Should make it simpler 🤗

Contributor (Author): f29dc50 Done, though it took me a while to debug why it wasn't working. The simplified init requires __all__ to be explicitly set: 3960e35.
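
For context, the simplified init style referenced here is roughly the lazy, auto-generated import structure used for newer transformers models; the sketch below assumes the olmo_1124 module/file names used in this PR and is not copied from the final file:

```python
# src/transformers/models/olmo_1124/__init__.py (sketch of the simplified style)
from typing import TYPE_CHECKING

from ...utils import _LazyModule
from ...utils.import_utils import define_import_structure

if TYPE_CHECKING:
    from .configuration_olmo_1124 import *
    from .modeling_olmo_1124 import *
else:
    import sys

    _file = globals()["__file__"]
    sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__)
```

Because the submodules are re-exported with `*`, each of them needs an explicit `__all__` (e.g. `__all__ = ["Olmo1124Config"]` in the configuration file), which is what the second linked commit adds.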

Collaborator: Very well done, the modular file is super simple and makes it easy to identify the differences!

Contributor (Author): Other than some difficulties getting started/finished and somewhat sparse docs, modular was a nice experience!
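
For readers unfamiliar with it, the modular approach means writing a small modular_*.py file that subclasses another model's classes and only spells out the differences; the full modeling file is then auto-generated from it. A rough, hypothetical sketch for this model, with class and import choices assumed from the Olmo1124 naming used in this PR (not the actual file):

```python
# modular_olmo_1124.py -- illustrative sketch only; lives inside the transformers source tree
from ..llama.modeling_llama import LlamaRMSNorm
from ..olmo.configuration_olmo import OlmoConfig
from ..olmo.modeling_olmo import OlmoAttention, OlmoDecoderLayer, OlmoForCausalLM


class Olmo1124Config(OlmoConfig):
    """Same fields as OlmoConfig, plus rms_norm_eps and without clip_qkv (assumed)."""


class Olmo1124RMSNorm(LlamaRMSNorm):
    pass


class Olmo1124Attention(OlmoAttention):
    # Override __init__/forward to add RMSNorm on queries and keys (QK-norm).
    ...


class Olmo1124DecoderLayer(OlmoDecoderLayer):
    # Override forward to apply the norms after the attention/feedforward blocks.
    ...


class Olmo1124ForCausalLM(OlmoForCausalLM):
    pass


# Running the modular converter generates the full modeling_olmo_1124.py from
# these definitions plus everything inherited from the parent model.
```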

convert_and_export_with_cache,
)

olmo_1124_model = "shanearora/OLMo-7B-1124-hf"
Collaborator: Is this the final checkpoint? 🤗

Contributor (Author): No, I just grabbed an intermediate checkpoint to use for the implementation. It's from pretty close to the end of training.

We will upload the official final and intermediate checkpoints to an official HF Hub repo under the allenai org. Now that this PR is approved, I think we can start the uploading.

Comment on lines 438 to 446
generation_config=GenerationConfig(
    use_cache=True,
    cache_implementation=cache_implementation,
    max_length=max_generation_length,
    cache_config={
        "batch_size": batch_size,
        "max_cache_len": max_generation_length,
    },
),
Collaborator: Let's create it outside the call!

@2015aroras (Contributor, Author) commented on Nov 14, 2024

5b7cad9 This was auto-generated by transformers-cli add-new-model-like, but fixed anyway.
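
A minimal sketch of the requested change, assuming the surrounding test provides cache_implementation, batch_size, and max_generation_length (example values are used here so the snippet stands alone):

```python
from transformers import GenerationConfig

# Example values; in the test these come from the surrounding fixture.
cache_implementation = "static"
batch_size = 1
max_generation_length = 64

# Build the GenerationConfig once, outside the call that previously received it inline.
generation_config = GenerationConfig(
    use_cache=True,
    cache_implementation=cache_implementation,
    max_length=max_generation_length,
    cache_config={
        "batch_size": batch_size,
        "max_cache_len": max_generation_length,
    },
)

# ...and pass the pre-built object in afterwards, e.g.:
# model.generate(**inputs, generation_config=generation_config)
```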

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator)

Something has gone wrong with the rebasing, it seems 😓

@ArthurZucker (Collaborator)

We can merge once this is fixed!

@2015aroras (Contributor, Author)

I'm just going to add something for the model card (better than the blank state it is in; we can change it later).

@ArthurZucker (Collaborator) left a review

Merging! Thanks for the clean work 🤗

Comment on lines +26 to +28
- RMSNorm is used instead of standard layer norm.
- Norm is applied to attention queries and keys.
- Norm is applied after attention/feedforward layers rather than before.
Collaborator: 💘 Super clear, love this!

@ArthurZucker merged commit 3ee24e2 into huggingface:main on Nov 18, 2024
22 of 26 checks passed
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
* Add model skeleton with transformers-cli add-new-model-like

* Convert config to modular, add rms_norm_eps, delete clip_qkv

* Convert model to modular, add RMSNorm

* Add flash attention with qk norm and no qkv clipping

* Add decoder layer with RMSNorm after attention/feedforward layers

* Add base and causal model

* Add converter improvements from OLMo repo

* Update weight loading in OLMo to HF converter

* Set correct default for rms_norm_eps

* Set correct pipeline_model_mapping in test

* Run make fixup

* Fix model type

* Re-run modular conversion

* Manually set config docs to fix build errors

* Convert olmo-1124 to olmo_1124 to fix flash attention docs errors

* Start updating tests

* Update tests

* Copy upstream test_eager_matches_sdpa_inference_1_bfloat16 changes to olmo_1124

* Rename input_layernorm and post_attention_layernorm to reflect their ops better

* Use correct tokenizer

* Remove test unsupported by GPT2 tokenizer

* Create GenerationConfig outside of from_pretrained call

* Use simpler init file structure

* Add explicit __all__ to support simplified init

* Make safetensor serialization the default

* Update OLMo November 2024 docs
BernardZach pushed a commit to innovationcore/transformers that referenced this pull request Dec 6, 2024