Memory usage spike when using merge_and_unload #1939

Closed

snarayan21 opened this issue Jul 22, 2024 · 7 comments

@snarayan21
Contributor

System Info

peft version 0.23.4
transformers version 4.42.3

Who can help?

@BenjaminBossan @sayakpaul

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

model = model.merge_and_unload()

Expected behavior

Hi, I'm trying to merge LoRA adapters into the model with the merge_and_unload function. My model is on CPU, and I'm near the CPU memory limit on my machine, so I would like the merge_and_unload operation to happen in-place. However, there seems to be a spike in memory usage when this function is called, causing a CPU OOM. Is there an in-place version of this function, and what makes merge_and_unload increase CPU memory usage so significantly?
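
For context, a minimal sketch of the setup (the adapter path is a placeholder and the psutil-based logging is just illustrative, not my exact script):

import psutil
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

def rss_gb():
    # resident set size of this process, in GB
    return psutil.Process().memory_info().rss / 1e9

# Everything stays on CPU; no device_map or quantization involved.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # placeholder path

print(f"RSS before merge: {rss_gb():.1f} GB")
model = model.merge_and_unload()  # CPU memory spikes here and the job OOMs
print(f"RSS after merge:  {rss_gb():.1f} GB")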

@BenjaminBossan
Member

Do you have the error message? Could you also share the PEFT config and what base model you use?

@snarayan21
Contributor Author

Hey @BenjaminBossan, there's no error message per se because it's a CPU OOM -- the job just crashes. I'm using meta-llama/Meta-Llama-3-70B with the peft config below:

peft_config:
      r: 16
      peft_type: LORA
      task_type: CAUSAL_LM
      lora_alpha: 32
      lora_dropout: 0
      target_modules:
      - q_proj
      - k_proj
      - v_proj
      - o_proj
      - down_proj
      - up_proj
      - gate_proj
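
For reference, my understanding is that this YAML maps onto roughly the following LoraConfig in Python -- included here just to make it easy to reproduce:

from peft import LoraConfig

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "down_proj", "up_proj", "gate_proj",
    ],
)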

Looking at _unload_and_optionally_merge, is the LoRAModel holding on to additional unmerged parameters that it shouldn't be during the merge process?

@BenjaminBossan
Member

Thanks for providing further information.

is the LoRAModel holding on to additional unmerged parameters that it shouldn't be during the merge process?

No, the model does not do that. However, during intermediary steps, it will allocate additional tensors that are used to update the weights:

delta_weight = self.get_delta_weight(active_adapter)
if not self.use_dora[active_adapter]:
    base_layer.weight.data = base_layer.weight.data + delta_weight

(assuming you use normal LoRA, not some form of quantization)

In the grand scheme of things, this should not occupy a lot of extra memory compared to the model as a whole, but if you're already very close to the total memory, this could be the difference that results in OOM.

We cannot avoid allocating some new memory for merging. But the last line quoted above could be changed to use +=, which I think should avoid allocating one additional tensor. Not sure if that would be enough to tip the scales. If possible, could you check whether your script passes with that change?

- base_layer.weight.data = base_layer.weight.data + delta_weight
+ base_layer.weight.data += delta_weight
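
To illustrate the difference with plain torch (nothing PEFT-specific, just a sketch of why the in-place form saves one allocation): a = a + b materializes a new tensor for the sum before rebinding the name, while a += b writes into the existing storage.

import torch

a = torch.zeros(4096, 4096)  # ~64 MiB in float32
b = torch.ones(4096, 4096)

ptr = a.data_ptr()
a = a + b                    # allocates a second ~64 MiB tensor for the result
print(a.data_ptr() == ptr)   # False: the name now points at new storage

a = torch.zeros(4096, 4096)
ptr = a.data_ptr()
a += b                       # in-place add, reuses the original storage
print(a.data_ptr() == ptr)   # True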

@snarayan21
Contributor Author


Thank you @BenjaminBossan -- I got to that same point in the code too. That change does seem to help at a very small scale (I'm tracking memory usage as LoRA adapters get merged on a tiny opt-125m model) -- will validate at larger scale as well.
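
For reference, the small-scale check looks roughly like this (a sketch: it uses freshly initialized adapters rather than our trained ones, peft's default target modules for OPT, and psutil for the memory readings):

import psutil
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

def rss_mb():
    # resident set size of this process, in MB
    return psutil.Process().memory_info().rss / 1e6

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32))

print(f"RSS before merge: {rss_mb():.0f} MB")
model = model.merge_and_unload()
print(f"RSS after merge:  {rss_mb():.0f} MB")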

Are there any other points where we might be allocating some extra tensor we don't need?

@BenjaminBossan
Member

will validate at larger scale as well.

Please report back once you have the results.

Are there any other points where we might be allocating some extra tensor we don't need?

At first glance, I don't think so. Not saying there is no possibility at all, but at least no low-hanging fruit.

@snarayan21
Contributor Author

Hey @BenjaminBossan -- confirming that this simple change does indeed address the extra memory usage. And the difference is not insignificant either: see the graph below showing % system memory utilization, where the orange line is without the fix (the run stops early because of the OOM) and the green line is with the fix (there are some operations we run after merge_and_unload).

Submitting a PR to peft asap. Would it be possible to do a patch release? Otherwise we'll have to patch this in ourselves.
[Graph: % system memory utilization over time, with the fix (green) and without it (orange)]

@BenjaminBossan
Member

Resolved via #1944.
