Add RoPE Interpolation #3564

Merged
merged 12 commits into LAION-AI:main on Jul 12, 2023

Conversation

shahules786 (Collaborator) commented:

Added support for RoPE interpolation via the SuperHOT method and its variants, as proposed in:
  • reddit
  • scaled-rope

Supported methods

  • Linear scaling
  • NTK-aware scaling
  • Dynamic NTK
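A rough sketch of how these three variants typically modify a rotary embedding follows; the function names and the dynamic-NTK formulation below are illustrative assumptions, not the code this PR adds in model/model_training/models/rope.py:

```python
# Illustrative sketch only; the PR's actual implementation lives in
# model/model_training/models/rope.py and may differ in names and details.
import torch

def rope_frequencies(dim: int, base: float = 10000.0) -> torch.Tensor:
    # Standard RoPE inverse frequencies: theta_i = base^(-2i/dim)
    return 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))

def linear_scaled_positions(seq_len: int, scale: float) -> torch.Tensor:
    # Linear scaling (SuperHOT-style interpolation): positions are compressed
    # by `scale`, so a longer sequence maps back into the trained position range.
    return torch.arange(seq_len).float() / scale

def ntk_scaled_frequencies(dim: int, scale: float, base: float = 10000.0) -> torch.Tensor:
    # NTK-aware scaling: positions stay unchanged, but the base is enlarged so
    # the low-frequency components stretch to cover the longer context.
    return rope_frequencies(dim, base * scale ** (dim / (dim - 2)))

def dynamic_ntk_frequencies(dim: int, seq_len: int, max_trained_len: int,
                            base: float = 10000.0) -> torch.Tensor:
    # Dynamic NTK (one common formulation): derive the scale from the current
    # sequence length at run time instead of fixing it up front.
    scale = max(1.0, seq_len / max_trained_len)
    return ntk_scaled_frequencies(dim, scale, base)

# Example: rotation angles for an 8192-token sequence on a model trained with 2048 positions.
freqs = dynamic_ntk_frequencies(dim=128, seq_len=8192, max_trained_len=2048)
angles = torch.outer(torch.arange(8192).float(), freqs)  # take cos/sin of these as usual
```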

Supported Models

  • LLAMA
  • Falcon

This can easily be extended and experimented with by configuring two parameters: superhot and superhot_config.
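For illustration only, the two parameters could drive a dispatch roughly like the following; the keys inside superhot_config and the helper name are hypothetical, not the actual schema this PR adds to config.yaml:

```python
# Hypothetical sketch: the "type"/"scale" keys and select_rope_base() are
# illustrative, not the actual config schema or helpers introduced by this PR.
superhot = True
superhot_config = {"type": "ntk", "scale": 4.0}

def select_rope_base(enabled: bool, cfg: dict, dim: int = 128, base: float = 10000.0) -> float:
    """Return the (possibly rescaled) RoPE base for the configured scaling variant."""
    if not enabled:
        return base
    kind = cfg.get("type", "linear")
    scale = float(cfg.get("scale", 1.0))
    if kind == "linear":
        return base  # linear scaling divides positions instead of changing the base
    if kind in ("ntk", "dynamic-ntk"):
        # For the dynamic variant the scale would be recomputed from the current
        # sequence length at run time (see the sketch above); fixed here for brevity.
        return base * scale ** (dim / (dim - 2))
    raise ValueError(f"unknown RoPE scaling type: {kind}")

print(select_rope_base(superhot, superhot_config))  # -> enlarged base for NTK-aware scaling
```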

shahules786 marked this pull request as ready for review on July 11, 2023 18:37
github-actions (bot) commented:

pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

andreaskoepf (Collaborator) left a comment:


Thanks a lot for adding the scaled-rope implementation! I left some minor comments.

@@ -779,3 +782,32 @@ debug:
verbose: true
num_train_epochs: 0.2
dtype: fp32

patching-test:
Could we change the name to something like "rope_scaling_test"?

model/model_training/models/rope.py (review thread resolved)
model/model_training/trainer_sft.py (review thread resolved, outdated)
model = get_model(training_conf, tokenizer)

from model_training.models.patching import RopePatch

Is there a benefit to the late import? Otherwise I would recommend moving this to the other imports at the beginning of the file (to make it easier to see all main dependencies).

model/model_training/configs/config.yaml (review thread resolved, outdated)
shahules786 requested a review from andreaskoepf on July 12, 2023 11:15
andreaskoepf merged commit 018657b into LAION-AI:main on Jul 12, 2023