Fix for "leaf Variable that requires grad" Error in In-Place Operation #1372
Conversation
Thanks, the fix sounds good!
Can you propagate the fix to the other LoRA types (Conv, etc.)? Also, can you share a small reproducible snippet of the bug?
@younesbelkada I tried CAUSAL_LM with a Mixtral model. My target modules include the Embedding layer, so this code revision was made in the Embedding class of the LoRA layer; my PR only touches the Embedding class.
@younesbelkada I also changed lora/bnb.py to fix the error described below.
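For context, here is a simplified, self-contained sketch of where this kind of out-of-place update sits in a LoRA-style embedding forward pass. It is a toy module for illustration only, not PEFT's actual Embedding implementation; the rank, initialization, and scaling are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyLoraEmbedding(nn.Module):
    """Toy illustration of a LoRA-augmented embedding layer (not PEFT's implementation)."""

    def __init__(self, num_embeddings: int, embedding_dim: int, r: int = 8, scaling: float = 1.0):
        super().__init__()
        self.base_layer = nn.Embedding(num_embeddings, embedding_dim)
        self.embedding_A = nn.Parameter(torch.randn(num_embeddings, r))
        self.embedding_B = nn.Parameter(torch.randn(r, embedding_dim))
        self.scaling = scaling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        result = self.base_layer(x)
        after_A = F.embedding(x, self.embedding_A)  # LoRA A applied as an embedding lookup
        # Out-of-place update, as in this PR, instead of `result += ...`
        result = result + (after_A @ self.embedding_B) * self.scaling
        return result
```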
Thank you @DopeorNope-Lee for the fixes, they make sense. I agree with Younes that it would be better to make this change across all LoRA layers. However, that could be part of a separate PR in case the issue is only noticed for the layers changed here.
@pacman100 I will fix all of the layers and mention it again. Thanks for your recommendation!
Thanks @DopeorNope-Lee! Let us know if you need any help.
Thanks @DopeorNope-Lee for providing this PR. Do you have a minimal code example that demonstrates the error with the in-place operation? Edit: Is this related to #1425?
Thanks @BenjaminBossan, I saw the PR you mentioned, but there is a difference. The code there relates to full fine-tuning (some call it continued pre-training), whereas the in-place error occurs in fine-tuning cases.
@younesbelkada Hi, I revised all of the operators in the LoRA layers. However, I saw 'conflicts that must be resolved'. I think the previous code used the copy method, but the recent version uses the clone method, so I updated that as well. If there are further improvements or other issues, let me know and I will follow up.
Thanks for explaining. I wrote a small test to check full fine-tuning and didn't encounter any error when training. Could you please provide a minimal example? This is also important to have as a unit test so that we can prevent regressions in the future.
@BenjaminBossan Did you add embed_tokens to the target modules? Details follow:
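As an illustration of what "adding embed_tokens to the target modules" means here, a minimal sketch of such a configuration might look like the following; the rank, alpha, and the other target module names are assumptions, not taken from the thread.

```python
from peft import LoraConfig

# Hypothetical LoRA config that includes the embedding layer among the target modules.
config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,                 # assumed rank
    lora_alpha=32,        # assumed scaling
    target_modules=["q_proj", "v_proj", "embed_tokens"],  # embed_tokens targets the embedding layer
)
```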
I added this test:

```python
@parameterized.expand(TEST_CASES)
def test_training_full_finetuning(self, test_name, model_id, config_cls, config_kwargs):
    # check if training the full model works, no error is raised; only check custom models since they are small
    # so full finetuning shouldn't take too long
    model = self.transformers_class.from_pretrained(model_id)
    config = config_cls(
        base_model_name_or_path=model_id,
        **config_kwargs,
    )
    model = get_peft_model(model, config)
    model = model.to(self.torch_device)
    model.train()
    model.requires_grad_(True)  # make all parameters trainable
    optim = torch.optim.SGD(model.parameters(), lr=0.1)
    inputs = self.prepare_inputs_for_testing()
    for _ in range(5):
        optim.zero_grad()
        output = model(**inputs)[0]
        loss = output.sum()
        loss.backward()
        optim.step()
```

These tests include examples with embedding layers in the target modules.
@BenjaminBossan How about trying the Mixtral model with LoRA layers? I think this error does not occur with full fine-tuning, only with LoRA fine-tuning.
I see, thanks. I don't have a machine available to test full Mixtral, so I couldn't test it.
@BenjaminBossan Then may I help you with testing LoRA fine-tuning of Mixtral?
Thanks, that's not necessary. My main concern is to have some kind of test to ensure that we don't have regressions in the future, but maybe that's not easily possible here? I tried a tiny Mixtral model I found on HF but that didn't trigger any error. Running full Mixtral won't work on our CI.
@BenjaminBossan Have you tried fine-tuning with LoRA? I attempted fine-tuning using LoRA; it's instruct tuning with LoRA, not full fine-tuning. But your previous code looks like full fine-tuning, I think.
Well, we have a bunch of tests for fine-tuning with LoRA. I wrote the test for full fine-tuning because you had said earlier:
Maybe I misunderstood what you meant.
@BenjaminBossan However, the issue I'm referring to here is instruct tuning (fine-tuning) using LoRA adapters with load_in_4bit. After hearing your comments, I've also started a training test in a fresh environment. It would be great if we could run this together and share related information. I'm very grateful for your contribution to this open-source project.
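For reference, a minimal sketch of the kind of 4-bit LoRA instruct-tuning setup being described; the model id, dtype, and hyperparameters are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit (the model id here is a placeholder).
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1", quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters, including the embedding layer among the targets (assumed modules).
config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "embed_tokens"],
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```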
These things happen, glad we're now on the same page.
Thanks. If you have something to share, let me know. At the end of the day, if we have confirmation that this PR fixes a real world issue, even if we cannot add it to a unit test, it's fine with me. Maybe we can upload the script somewhere and add a link to it as a comment.
@BenjaminBossan Sure! Moreover, I'm sharing a recent test result using Mixtral LoRA fine-tuning. For that, I removed the latest version of peft and installed the library with my modifications (this PR).
Now it runs really well!
@DopeorNope-Lee Could you please run make style?
@BenjaminBossan I ran make style and pushed it!
Thanks for the fixes and the fruitful discussion. LGTM.
> Moreover, I'm sharing a recent test result using Mixtral LoRA fine-tuning.
Is it possible for you to share the script?
@BenjaminBossan Sure, I usually use the Platypus code with a bash file.
I think this PR should be ready to be merged, right? @DopeorNope-Lee could you please fix the small merge conflict? @younesbelkada @pacman100 do you have further comments?
@BenjaminBossan Sure, I fixed it!
@pacman100 @younesbelkada @BenjaminBossan Hi, could you approve my PR?
Thank you @DopeorNope-Lee for the fixes, LGTM!
@DopeorNope-Lee I think the last merge with main resulted in incorrect code, with weights being merged twice, which is causing the failing CI. Could you please take a look?
@BenjaminBossan
Thanks @DopeorNope-Lee for correcting the resolution to work as intended. This LGTM now.
(huggingface#1372) Avoid in-place operations for LoRA forward and merging.
Issue:
The current implementation in the file peft/tuners/lora/layer.py encounters a runtime error due to an in-place operation on a leaf variable that requires gradient computation. Specifically, the error is triggered by the following line of code:
result += (after_A @ embedding_B) * scaling
This line uses the += operator, which modifies the result tensor in-place. When result is a leaf variable with requires_grad=True, such in-place operations are incompatible with PyTorch's autograd system, leading to a "RuntimeError: a leaf Variable that requires grad is being used in an in-place operation".
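A standalone PyTorch snippet reproducing this failure mode in isolation (not the PEFT code itself), assuming a leaf tensor that requires gradients:

```python
import torch

result = torch.zeros(4, 8, requires_grad=True)  # leaf tensor that requires grad
update = torch.randn(4, 8)

# In-place addition on a leaf tensor that requires grad raises:
# RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
result += update
```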
Solution:
To resolve this issue, I propose modifying the in-place operation to a regular operation that creates a new tensor. This change ensures that the original value of result is not altered, thereby maintaining compatibility with the autograd system. The updated line of code is as follows:
result = result + (after_A @ embedding_B) * scaling
This modification allows the computation to proceed without altering the original result tensor, thus avoiding the RuntimeError and ensuring proper gradient calculation during backpropagation.
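The same toy setup with the out-of-place form used in this fix; a new tensor is created, so autograd can track the addition and gradients still reach the original leaf:

```python
import torch

weight = torch.zeros(4, 8, requires_grad=True)  # leaf tensor that requires grad
update = torch.randn(4, 8)

result = weight + update   # out-of-place: builds a new tensor, no RuntimeError
result.sum().backward()
print(weight.grad)         # gradients flow back to the leaf as expected
```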
Therefore, before the modification I encountered
"RuntimeError: a leaf Variable that requires grad is being used in an in-place operation."
After revising it as described above, the error is resolved!
Best