Update HRA #2160
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Ah, sorry for the failing CI, could you please merge with/rebase on main?
No worries, let's just merge instead. So first ensure that you sync your fork with the upstream main branch (can be done via the GitHub UI), then switch to your branch, merge the main branch into your branch, then push the changes.
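For reference, here is a minimal sketch of that workflow on the command line (assuming the fork is the `origin` remote, its `main` has already been synced with upstream via the GitHub UI, and `update-hra` is a hypothetical name for the feature branch):

```sh
# update the local main from the already-synced fork
git checkout main
git pull origin main

# merge main into the feature branch (hypothetical branch name)
git checkout update-hra
git merge main

# push the merge commit to the fork so the PR picks it up
git push origin update-hra
```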
OK, I have merged the main branch |
You already pushed the merge successfully. |
OK. Thanks! |
Thanks for adjusting the calculation. I could confirm that this speeds up computation while retaining the same results. Also thanks for updating the docs. I just saw a small issue regarding LaTeX, please check.
HRA constructs a chain of `r` trainable Householder reflections (HRs). Because a Householder reflection matrix is orthogonal and the product of orthogonal matrices is also orthogonal, HRA satisfies the theoretical guarantee of Orthogonal Finetuning (OFT). Meanwhile, HRA can also be viewed as a low-rank fine-tuning adapter by rewriting the formula.
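As a sketch of that construction (standard Householder notation; the paper's exact formulation may differ slightly): each reflection is parameterized by a trainable vector $u_i$, and the adapted weight is the frozen weight multiplied by the chain of reflections:

$$
H_i = I - 2\,\frac{u_i u_i^\top}{\lVert u_i \rVert^2}, \qquad W' = W \prod_{i=1}^{r} H_i
$$

Since each factor differs from the identity by a rank-one term, the whole chain differs from $I$ by a matrix of rank at most $r$, so the update $W' - W$ also has rank at most $r$; this is the low-rank adapter view mentioned above.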
The higher `r` is, the more trainable parameters there are, resulting in larger model capacity and better performance. Moreover, due to the chain structure, the orthogonality of the HR planes affects the capacity and regularity of HRA. To achieve a trade-off between model capacity and regularity, an orthogonality regularizer on the HR planes is added to the loss function. The weight $\lambda$ controls the strength of the regularizer.
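To illustrate what such a regularizer can look like (a sketch only; the exact form used in the implementation may differ): stack the normalized Householder vectors as the columns of $\hat{U} = \left[ u_1 / \lVert u_1 \rVert, \dots, u_r / \lVert u_r \rVert \right]$ and penalize deviation from mutual orthogonality,

$$
\mathcal{L} = \mathcal{L}_{\text{task}} + \lambda \left\lVert \hat{U}^\top \hat{U} - I \right\rVert_F^2
$$

so that $\lambda = 0$ leaves the HR planes unconstrained (maximum capacity), while larger $\lambda$ pushes them toward orthogonality (more regularity).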
It looks like the LaTeX part is not correctly rendered: https://moon-ci-docs.huggingface.co/docs/peft/pr_2160/en/conceptual_guides/adapter#householder-reflection-adaptation-hra
I found this in the docs, maybe that helps.
OK, I have changed the LaTeX part's format.
Thanks for providing this improvement for HRA and for extending the docs.
`test_model_with_batchnorm_reproducibility` in `tests/test_vision_models.py` is reduced to around 20 sec on an RTX 4090.