Memory usage spike when using merge_and_unload
#1939
Comments
Do you have the error message? Could you also share the PEFT config and what base model you use?
Hey @BenjaminBossan, there's no error message per se because it's a CPU OOM -- the job just crashes. I'm using
Looking at
Thanks for providing further information.
No, the model does not do that. However, during intermediate steps, it will allocate additional tensors that are used to update the weights (see peft/src/peft/tuners/lora/layer.py, lines 925 to 927 at commit 6472061)
(assuming you use normal LoRA, not some form of quantization). In the grand scheme of things, this should not occupy a lot of extra memory compared to the model as a whole, but if you're already very close to the total memory, this could be the difference that results in OOM.

We cannot avoid allocating some new memory for merging, but the last line quoted above could be changed to an in-place update:

```diff
- base_layer.weight.data = base_layer.weight.data + delta_weight
+ base_layer.weight.data += delta_weight
```
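For intuition, here is a minimal stand-alone PyTorch sketch (not the PEFT source; shapes and names are illustrative) of why the out-of-place form briefly needs an extra weight-sized allocation while the in-place form does not:

```python
import torch

# Stand-ins for a base layer weight and the low-rank LoRA factors.
out_features, in_features, r = 2048, 2048, 8
weight = torch.zeros(out_features, in_features)
lora_A = torch.randn(r, in_features)
lora_B = torch.randn(out_features, r)
scaling = 2.0

# Merging first materializes the full-size delta (same shape as the weight),
# so one weight-sized temporary is unavoidable.
delta_weight = (lora_B @ lora_A) * scaling

# Out-of-place add: the right-hand side creates yet another full-size tensor
# before it is assigned back, so peak memory briefly holds weight, delta, and result.
merged = weight + delta_weight

# In-place add: writes into the existing storage, no additional full-size tensor.
ptr_before = weight.data_ptr()
weight += delta_weight
assert weight.data_ptr() == ptr_before  # same storage, nothing new allocated
```

With the in-place form, the only extra full-size allocation during the merge is the delta itself.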
Thank you @BenjaminBossan -- I got to that same point in the code too. That change does seem to help at a very small scale (I'm tracking memory usage as LoRA adapters get merged on a tiny opt-125m model) -- will validate at larger scale as well. Are there any other points where we might be allocating some extra tensor we don't need?
Please report back once you have the results.
At first glance, I don't think so. Not saying there is no possibility at all, but at least no low-hanging fruit.
Hey @BenjaminBossan -- confirming that this simple change does indeed address the extra memory usage. And it's not insignificant either; see the graph below showing % system memory utilization, where the orange line is without the fix (the run stops early because of the OOM) and the green line is with the fix (there are some operations we do after we ...).

Submitting a PR to peft asap. Would it be possible to do a patch release? Otherwise we'll have to patch this in ourselves.
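For reference, a hedged sketch of one way such per-merge memory tracking could be done -- this is not the script that produced the measurements described above; psutil is assumed to be installed, and `model` is assumed to already be a PeftModel loaded on CPU:

```python
import threading
import time

import psutil


class PeakMemorySampler:
    """Samples system memory usage in a background thread and records the peak."""

    def __init__(self, interval: float = 0.05):
        self.interval = interval
        self.peak_percent = 0.0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self.peak_percent = max(self.peak_percent, psutil.virtual_memory().percent)
            time.sleep(self.interval)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()


# Usage, assuming `model` is a PeftModel already loaded on CPU:
# with PeakMemorySampler() as sampler:
#     merged = model.merge_and_unload()
# print(f"peak system memory during merge: {sampler.peak_percent:.1f}%")
```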
Resolved via #1944. |
System Info
peft version 0.23.4
transformers version 4.42.3
Who can help?
@BenjaminBossan @sayakpaul
Information
Tasks
An officially supported task in the examples folder
Reproduction
Expected behavior
Hi, I'm trying to merge LoRA adapters into the model with the merge_and_unload function. My model is on cpu. I'm near the cpu memory limit on my machine, so I would like the merge_and_unload operation to happen in-place, but there seems to be a spike in memory usage when this function is called, causing cpu OOM. Is there an in-place version of this function / what makes merge_and_unload increase cpu memory usage significantly?
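A minimal sketch of the setup described above (the base model name and adapter path are placeholders, not the actual ones from this issue; psutil is only used here to print before/after readings):

```python
import psutil
from transformers import AutoModelForCausalLM
from peft import PeftModel


def rss_gb() -> float:
    """Resident set size of the current process, in GB."""
    return psutil.Process().memory_info().rss / 1e9


base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # stays on CPU by default
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")   # placeholder adapter path

print(f"RSS before merge: {rss_gb():.2f} GB")
merged = model.merge_and_unload()  # the transient spike reported above happens here
print(f"RSS after merge:  {rss_gb():.2f} GB")
```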