Add RoPE Interpolation #3564
Conversation
❌ pre-commit failed.
Thanks a lot for adding the scaled-rope implementation! I left some minor comments.
@@ -779,3 +782,32 @@ debug:
  verbose: true
  num_train_epochs: 0.2
  dtype: fp32

patching-test:
Could we change the name to something like "rope_scaling_test"?
model/model_training/trainer_sft.py
Outdated
model = get_model(training_conf, tokenizer)

from model_training.models.patching import RopePatch
Is there a benefit to the late import? Otherwise I would recommend moving it up to the other imports at the beginning of the file (to make it easier to see all main dependencies).
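A minimal sketch of what the suggested change could look like. Only the import path and the get_model call are taken from the diff; the superhot flag check and the RopePatch.from_config / patch method names are assumptions for illustration, not the PR's actual code.

# Hypothetical sketch of model/model_training/trainer_sft.py with the import moved to the top.
from model_training.models.patching import RopePatch  # import path taken from the diff

def build_model(training_conf, tokenizer):
    # get_model is the existing helper already used in trainer_sft.py
    model = get_model(training_conf, tokenizer)
    # Apply RoPE interpolation only when enabled in the training config.
    # The from_config/patch interface below is an assumed API for illustration.
    if getattr(training_conf, "superhot", False):
        RopePatch.from_config(training_conf).patch(model)
    return model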
Added support for RoPE interpolation via the SuperHOT method and its variants, as proposed on reddit and in scaled-rope.

Supported methods

Supported Models

This can easily be extended and experimented with by configuring two parameters, superhot and superhot_config; see the config sketch below.
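For illustration only, a hedged sketch of how these two parameters might appear in one of the YAML training configs. The entry name follows the reviewer's suggested "rope_scaling_test", and the keys inside superhot_config (interpolation type and scaling factor) are assumed option names, not copied from this PR.

rope_scaling_test:        # hypothetical config entry name
  superhot: true          # enable the RoPE interpolation patch
  superhot_config:
    type: linear          # assumed: interpolation variant, e.g. linear / ntk / dynamic
    scale: 2              # assumed: context-length extension factor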