Update HRA #2160
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Ah, sorry for the failing CI, could you please merge with/rebase on main?
No worries, let's just merge instead. So first ensure that you sync your fork with the upstream main branch (can be done via the GitHub UI), then switch to your branch, merge the main branch into your branch, then push the changes.
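For reference, here is a minimal sketch of that workflow on the command line (assuming the fork is the `origin` remote, its `main` has already been synced with upstream via the GitHub UI, and `update-hra` is a hypothetical name for the feature branch):

```sh
# update the local main from the already-synced fork
git checkout main
git pull origin main

# merge main into the feature branch (hypothetical branch name)
git checkout update-hra
git merge main

# push the merge commit to the fork so the PR picks it up
git push origin update-hra
```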
OK, I have merged the main branch |
You already pushed the merge successfully. |
OK. Thanks! |
Thanks for adjusting the calculation. I could confirm that this speeds up computation while retaining the same results. Also thanks for updating the docs. I just saw a small issue regarding LaTeX, please check.
HRA constructs a chain of `r` trainable Householder reflections (HRs). Because a Householder reflection matrix is orthogonal and the product of orthogonal matrices is also orthogonal, HRA satisfies the theoretical guarantee of Orthogonal Finetuning (OFT). Meanwhile, HRA can also be viewed as a low-rank fine-tuning adapter by rewriting the formula.
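As a sketch of that construction (standard Householder notation; the paper's exact formulation may differ slightly): each reflection is parameterized by a trainable vector $u_i$, and the adapted weight is the frozen weight multiplied by the chain of reflections:

$$
H_i = I - 2\,\frac{u_i u_i^\top}{\lVert u_i \rVert^2}, \qquad W' = W \prod_{i=1}^{r} H_i
$$

Since each factor differs from the identity by a rank-one term, the whole chain differs from $I$ by a matrix of rank at most $r$, so the update $W' - W$ also has rank at most $r$; this is the low-rank adapter view mentioned above.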
The higher `r` is, the more trainable parameters there are, resulting in larger model capacity and better performance. Moreover, due to the chain structure, the orthogonality of the HR planes affects the capacity and regularity of HRA. To achieve a trade-off between model capacity and regularity, an orthogonality regularizer on the HR planes is added to the loss function. The weight $\lambda$ controls the strength of the regularizer.
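To illustrate what such a regularizer can look like (a sketch only; the exact form used in the implementation may differ): stack the normalized Householder vectors as the columns of $\hat{U} = \left[ u_1 / \lVert u_1 \rVert, \dots, u_r / \lVert u_r \rVert \right]$ and penalize deviation from mutual orthogonality,

$$
\mathcal{L} = \mathcal{L}_{\text{task}} + \lambda \left\lVert \hat{U}^\top \hat{U} - I \right\rVert_F^2
$$

so that $\lambda = 0$ leaves the HR planes unconstrained (maximum capacity), while larger $\lambda$ pushes them toward orthogonality (more regularity).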
It looks like the LaTeX part is not correctly rendered: https://moon-ci-docs.huggingface.co/docs/peft/pr_2160/en/conceptual_guides/adapter#householder-reflection-adaptation-hra
I found this in the docs, maybe that helps.
OK, I have changed the LaTeX part's format.
Thanks for providing this improvement for HRA and for extending the docs.
`test_model_with_batchnorm_reproducibility` in `tests/test_vision_models.py` is reduced to around 20 sec on an RTX 4090.