Add new merging methods #1364
Conversation
Thanks! I left a few open questions - what do you think?
```diff
 for adapter, weight in zip(adapters, weights):
     if adapter in target.lora_A or adapter in target.lora_embedding_A:
         valid_adapters.append(adapter)
-        valid_weights.append(weight)
+        valid_weights.append(weight * target.scaling[adapter])
```
is this change somehow breaking?
Hello Younes, this should be the correct usage; the earlier implementation was incorrect because it was missing the scaling factor.
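For readers following this thread, a minimal sketch (not the PEFT code itself) of why the scaling factor has to be folded in: a LoRA adapter's effective update is `scaling * B @ A`, so a user-supplied merge weight only behaves as expected if it is multiplied by the per-adapter scaling. Shapes and values below are made up for illustration.

```python
import torch

# Hypothetical shapes and values, for illustration only.
d_out, d_in, r = 16, 32, 4
A = torch.randn(r, d_in)       # lora_A weight
B = torch.randn(d_out, r)      # lora_B weight
scaling = 0.5                  # lora_alpha / r for this adapter
user_weight = 0.7              # weight requested for the merge

# The delta this adapter actually contributes to the base layer:
delta = scaling * (B @ A)

# When combining adapters, the effective coefficient therefore has to be
# user_weight * scaling, not user_weight alone.
effective_weight = user_weight * scaling
assert torch.allclose(user_weight * delta, effective_weight * (B @ A))
```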
Left some nits.
I really like how you have broken down the core merging methods into small logical components that are shared across them.
I think it definitely makes sense to also add a detailed guide about these methods on the PEFT doc site. But this can come later. Cc: @MKhalusova @stevhliu
Really nice!
Do you think it'd also be nice to create a separate API reference page for these merging utilities so it's easy for users to find? I can work on this in a separate PR in addition to the guide suggested by @sayakpaul :)
Hi @pacman100 @younesbelkada @stevhliu @sayakpaul, I am Prateek Yadav, the author of TIES-Merging. Let me know if there is some way I can contribute to adding these merging methods to the Transformers or the PEFT library. Also, I agree with others here that documentation changes or README updates would help users become aware of these merging methods and how to use them. Let me know if there is some way for me to help! Thanks,
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Hi @pacman100, I went over the code for TIES and it seems good to me. I was just mentioning that the embedding layer thing might be important, at least for merging full models.
Hi @sayakpaul, I guess this is the pull request trying to document these merging methods.
Ah, you're correct. Thanks for the mention.
Hi @sayakpaul @pacman100, is there a timeline on when this is expected to be merged?
I think we can merge now. @younesbelkada WDYT?
Thanks a lot @pacman100 for this excellent PR. Super well done and clean implementation + refactor!
In my review, I didn't check the correctness of the implementations in detail, since the original authors already did that and are much more qualified for this. Instead, I focused on the other parts, please take a look. Most things should be fairly easy adjustments.
Regarding the new functions in merge_utils.py, I think it will be good to add some unit tests for those in a future PR.
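As a rough illustration of what such a unit test could look like (the import path and the `ties` signature here are assumptions taken from this PR's diff, so the real test may need adjusting):

```python
import torch

# Assumed location of the new helpers; adjust if merge_utils.py lives elsewhere.
from peft.utils.merge_utils import ties


def test_ties_output_shape():
    # Two random "task tensors" merged with equal weights should yield a
    # tensor with the same shape as the inputs.
    task_tensors = [torch.randn(8, 8), torch.randn(8, 8)]
    weights = torch.tensor([1.0, 1.0])
    merged = ties(task_tensors, weights, density=0.5)
    assert merged.shape == (8, 8)
```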
src/peft/tuners/lora/model.py
Outdated
```diff
     new_rank = adapters_ranks[0]
 elif combination_type == "cat":
     # adapters ranks may be different, new rank is sum of all ranks
     # be careful, because output adapter rank may be really big if mixing a lot of adapters
     new_rank = sum(adapters_ranks)
-elif combination_type == "svd":
+elif "svd" in combination_type:
```
I would prefer:
elif "svd" in combination_type: | |
elif (combination_type == "svd") or (combination_type == "ties_svd"): |
This is to prevent potential bugs in the future where we add a new combination type with a strange name that has "svd" as a substring.
but it is also meant for dare_linear_svd and dare_ties_svd too
Oh I see. So maybe let's check all these options explicitly, or use `if combination_type.endswith("svd")`, which should be less error-prone.
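For clarity, a small sketch of the two alternatives being discussed, using the combination type names introduced in this PR:

```python
combination_type = "dare_ties_svd"

# Option 1: enumerate the SVD-based types explicitly.
is_svd = combination_type in {"svd", "ties_svd", "dare_linear_svd", "dare_ties_svd"}

# Option 2: rely on the naming convention that every SVD-based type ends in "svd".
is_svd = combination_type.endswith("svd")
```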
```python
    return sign == majority_sign


def disjoint_merge(task_tensors: torch.Tensor, majority_sign_mask: torch.Tensor) -> torch.Tensor:
```
task_tensors is not a list of tensors here but a tensor, right? Below, task_tensors is always a list of tensors. I wonder if it would make sense to use a different variable name for this function to avoid confusion.
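For readers following along, here is a rough sketch of the idea behind a disjoint merge over a single stacked tensor of shape (num_tasks, ...), which is the shape the naming comment refers to; this is an illustration of the concept, not the PR's exact implementation (the PR's function takes the majority-sign mask as a separate argument):

```python
import torch


def disjoint_merge_sketch(stacked_task_tensors: torch.Tensor) -> torch.Tensor:
    """Per parameter, keep only the values whose sign matches the majority
    sign across tasks, then average them.

    `stacked_task_tensors` has shape (num_tasks, *param_shape), i.e. it is a
    single stacked tensor rather than a list of tensors.
    """
    # Majority sign per parameter, from the sign of the summed values.
    majority_sign = torch.sign(stacked_task_tensors.sum(dim=0))
    # Entries whose sign agrees with the majority (zeros trivially "agree").
    agree = torch.sign(stacked_task_tensors) == majority_sign
    kept = stacked_task_tensors * agree
    # Average over the number of agreeing entries per parameter.
    count = agree.sum(dim=0).clamp(min=1)
    return kept.sum(dim=0) / count
```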
Hi @pacman100 and @BenjaminBossan, I went in detail over the merge_utils file and left some comments. I feel like some parts need to be corrected and might have bugs, if I am not missing something. I think I accidentally left many comments as opposed to reviewing, but I guess that should be fine.
Co-Authored-By: Prateek Yadav <15224633+prateeky2806@users.noreply.github.com> Co-Authored-By: Yu Le <55241218+yule-buaa@users.noreply.github.com> Co-Authored-By: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Thanks for addressing the issues from my side. This LGTM now. One small comment left, but it's not a must.
Hi @pacman100, thanks for working on this and merging this pull request. As we discussed earlier, I was wondering if this would be announced as part of the next PEFT version or if I should just announce it on my Twitter right away. I feel like it would be much better to announce it with the next PEFT version; however, I am not sure when that would be. Moreover, the PR on adding the docs is still ongoing. Thanks,
Hey Prateek! @pacman100 and I are working on a blog post to discuss this, which we plan to release very soon. We will of course give everyone the due credits there. I think it makes sense to also do a release for this feature so that users don't have to install from source. Does that work for you?
Hi @sayakpaul, yes, that sounds good to me, looking forward to it. I will tweet about this once you do the release. Moreover, if you need someone to proofread the blog post, I would be happy to help you guys with it. Prateek
Hi @pacman100 and @sayakpaul, Le Yu
Appreciate all the support. We will keep you posted.
```python
def ties(
    task_tensors: List[torch.Tensor],
    weights: torch.Tensor,
```
Hi @pacman100 @sayakpaul, I have just one last suggestion. It would be ideal to set the default weights for TIES to all ones, because that's a setting we experimented with in our paper and it works reasonably well if people do not want to try out different values. You can look at the results marked with the red cross in this table; they are with default weights of all ones. This makes TIES as simple to use as basic averaging by avoiding the need to select the weights.
So by default, TIES should have 20% of the parameters remaining and a mixing weight of 1 for each PEFT module.
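A hedged usage sketch of that suggested default (all-ones weights and 20% density), using the add_weighted_adapter API this PR extends; the adapter names are illustrative and `model` is assumed to be a PeftModel with two LoRA adapters already loaded:

```python
model.add_weighted_adapter(
    adapters=["adapter_a", "adapter_b"],   # illustrative adapter names
    weights=[1.0, 1.0],                    # suggested TIES default: all ones
    adapter_name="ties_merged",
    combination_type="ties",
    density=0.2,                           # keep roughly 20% of the parameters
)
model.set_adapter("ties_merged")
```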
* add code
* update docstring
* quality
* fix test
* fix test
* fix svd embedding layer merging
* fixes
* fixes
* Update model.py
* Add test and example
* quality
* fix tests
* update the example
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* address comments
* address comments and add co-authors
Co-Authored-By: Prateek Yadav <15224633+prateeky2806@users.noreply.github.com>
Co-Authored-By: Yu Le <55241218+yule-buaa@users.noreply.github.com>
Co-Authored-By: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* quality
* Update merge_utils.py
* revert
* address comments
* address comment

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Prateek Yadav <15224633+prateeky2806@users.noreply.github.com>
Co-authored-by: Yu Le <55241218+yule-buaa@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
What does this PR do?
This PR adds new merging methods: ties, dare_linear, dare_ties, ties_svd, dare_linear_svd, dare_ties_svd.

An example of ties_svd is shown below (https://github.com/pacman100/peft-dreambooth-ui/blob/main/lora_merging.ipynb).

LLM LoRA merging example:
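Since the notebook example is not reproduced inline here, below is a minimal sketch of how the new combination types might be invoked; the model and adapter identifiers, as well as the density value, are assumptions for illustration:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Illustrative checkpoints; replace with real model and adapter IDs.
base = AutoModelForCausalLM.from_pretrained("base-model-id")
model = PeftModel.from_pretrained(base, "lora-adapter-a", adapter_name="adapter_a")
model.load_adapter("lora-adapter-b", adapter_name="adapter_b")

# Merge the two adapters with one of the new methods, e.g. ties_svd.
model.add_weighted_adapter(
    adapters=["adapter_a", "adapter_b"],
    weights=[1.0, 1.0],
    adapter_name="merged",
    combination_type="ties_svd",
    density=0.2,
)
model.set_adapter("merged")
```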
To do: