Add OLMo November 2024 #34551
Conversation
I tested this before we locked in the …
Tests are passing, including slow ones (except for …)
PR checks were passing before I merged main again, and PR check failures relate to other models.
@ArthurZucker Gentle ping
Looks marvellous, thanks for your hard work, let's get this merged ASAP! 🤗 Left very small comments, it's great.
Apologies again for the delay.
```
## Overview

The OLMo November 2024 model was proposed in [<INSERT PAPER NAME HERE>](<INSERT PAPER LINK HERE>) by <INSERT AUTHORS HERE>.
```
These would need to be filled in.
The new init should look more like this:
```
# you may not use this file except in compliance with the License.
```
Should make it simpler 🤗
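For context, the simplified init being suggested could look roughly like the sketch below. This is an assumption based on the new-style inits elsewhere in transformers (explicit `__all__` in the submodules plus a lazy-module shim); the module names `configuration_olmo_1124` / `modeling_olmo_1124` follow this PR's naming, but the exact file content may differ.

```python
# Hypothetical sketch of a simplified src/transformers/models/olmo_1124/__init__.py,
# assuming the submodules define explicit __all__ lists.
from typing import TYPE_CHECKING

from ...utils import _LazyModule
from ...utils.import_utils import define_import_structure

if TYPE_CHECKING:
    from .configuration_olmo_1124 import *
    from .modeling_olmo_1124 import *
else:
    import sys

    _file = globals()["__file__"]
    sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__)
```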
Very very well done, the modular is super simple, makes it easy to identify differences!
Other than some difficulties getting started/finished and somewhat sparse docs, modular was a nice experience!
```python
    convert_and_export_with_cache,
)

olmo_1124_model = "shanearora/OLMo-7B-1124-hf"
```
is this the final checkpoint? 🤗
No, I just grabbed an intermediate checkpoint to use for the implementation. It's from pretty close to the end of training. We will upload the official final and intermediate checkpoints to an official HF Hub repo under the allenai org. Now that this PR is approved, I think we can start the uploading.
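For reference, loading this intermediate checkpoint works through the standard auto classes; a minimal sketch (the repo id is the one used in the test above, and the prompt is arbitrary):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Intermediate checkpoint from the test diff; the official final and
# intermediate checkpoints are expected to move under the allenai org later.
repo_id = "shanearora/OLMo-7B-1124-hf"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Language modeling is ", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```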
```python
generation_config=GenerationConfig(
    use_cache=True,
    cache_implementation=cache_implementation,
    max_length=max_generation_length,
    cache_config={
        "batch_size": batch_size,
        "max_cache_len": max_generation_length,
    },
),
```
let's create it outside the call!
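A minimal sketch of the suggested refactor, mirroring the test diff above; `batch_size`, `max_generation_length`, and `cache_implementation` are placeholder values here rather than the test's actual variables:

```python
from transformers import AutoModelForCausalLM, GenerationConfig

# Placeholder values standing in for the test's variables.
batch_size = 1
max_generation_length = 64
cache_implementation = "static"

# Build the GenerationConfig first...
generation_config = GenerationConfig(
    use_cache=True,
    cache_implementation=cache_implementation,
    max_length=max_generation_length,
    cache_config={
        "batch_size": batch_size,
        "max_cache_len": max_generation_length,
    },
)

# ...then hand it to from_pretrained instead of constructing it inline in the call.
model = AutoModelForCausalLM.from_pretrained(
    "shanearora/OLMo-7B-1124-hf",
    generation_config=generation_config,
)
```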
5b7cad9: This was auto-generated from `transformers-cli add-new-model-like`, but fixed anyway.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Something has gone wrong with the rebasing, it seems 😓
We can merge once this is fixed!
Force-pushed from 15a0fa6 to 796642d.
Force-pushed from 796642d to 82a75c2.
I'm just going to add something for the model card (better than the blank state it's in; we can change it later).
Merging! Thanks for the clean work 🤗
- RMSNorm is used instead of standard layer norm.
- Norm is applied to attention queries and keys.
- Norm is applied after attention/feedforward layers rather than before.
💘 super clear, love this!
* Add model skeleton with transformers-cli add-new-model-like
* Convert config to modular, add rms_norm_eps, delete clip_qkv
* Convert model to modular, add RMSNorm
* Add flash attention with qk norm and no qkv clipping
* Add decoder layer with RMSNorm after attention/feedforward layers
* Add base and causal model
* Add converter improvements from OLMo repo
* Update weight loading in OLMo to HF converter
* Set correct default for rms_norm_eps
* Set correct pipeline_model_mapping in test
* Run make fixup
* Fix model type
* Re-run modular conversion
* Manually set config docs to fix build errors
* Convert olmo-1124 to olmo_1124 to fix flash attention docs errors
* Start updating tests
* Update tests
* Copy upstream test_eager_matches_sdpa_inference_1_bfloat16 changes to olmo_1124
* Rename input_layernorm and post_attention_layernorm to reflect their ops better
* Use correct tokenizer
* Remove test unsupported by GPT2 tokenizer
* Create GenerationConfig outside of from_pretrained call
* Use simpler init file structure
* Add explicit __all__ to support simplified init
* Make safetensor serialization the default
* Update OLMo November 2024 docs
What does this PR do?
An updated OLMo model will be released in November. The new model has a few small architecture changes compared to the existing model in transformers:

- RMSNorm is used instead of standard layer norm.
- Norm is applied to attention queries and keys.
- Norm is applied after attention/feedforward layers rather than before.
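To make those three changes concrete, here is a rough, self-contained PyTorch sketch; it is illustrative only, and the class and attribute names are placeholders rather than the code added in this PR:

```python
import math

import torch
from torch import nn


class RMSNorm(nn.Module):
    """RMSNorm instead of standard layer norm: scale by the root mean square, no mean subtraction."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight


class SelfAttentionWithQKNorm(nn.Module):
    """Single-head self-attention with norm applied to the attention queries and keys."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.q_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.k_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.v_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.o_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.q_norm = RMSNorm(hidden_size)
        self.k_norm = RMSNorm(hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.q_norm(self.q_proj(x))  # QK-norm: normalize queries...
        k = self.k_norm(self.k_proj(x))  # ...and keys before computing attention scores
        v = self.v_proj(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)), dim=-1)
        return self.o_proj(attn @ v)


class PostNormDecoderLayer(nn.Module):
    """Norm is applied after the attention/feedforward blocks rather than before."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.attn = SelfAttentionWithQKNorm(hidden_size)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),
            nn.SiLU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )
        self.post_attention_norm = RMSNorm(hidden_size)
        self.post_feedforward_norm = RMSNorm(hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        hidden_states = hidden_states + self.post_attention_norm(self.attn(hidden_states))
        hidden_states = hidden_states + self.post_feedforward_norm(self.mlp(hidden_states))
        return hidden_states
```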
The original PR #34497 updated the OLMo implementation in transformers to support the November release. This PR instead adds a new model using the modular approach.
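With the modular approach, the new model is written as a small set of diffs against the existing OLMo classes, from which the full modeling file is auto-generated. A rough sketch of the config part of such a modular file (the class and file names follow this PR's naming, but the body is illustrative rather than the exact code):

```python
# Illustrative fragment of a modular_olmo_1124.py-style file: subclass the
# existing OLMo config, add rms_norm_eps, and drop the unused clip_qkv.
from transformers.models.olmo.configuration_olmo import OlmoConfig


class Olmo1124Config(OlmoConfig):
    def __init__(self, rms_norm_eps: float = 1e-5, **kwargs):
        super().__init__(**kwargs)
        self.rms_norm_eps = rms_norm_eps
        # clip_qkv from the original OLMo config is no longer used.
        del self.clip_qkv
```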
@ArthurZucker
Fixes #34496
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.