Memory usage spike when using merge_and_unload #1939

Closed

snarayan21 opened this issue Jul 22, 2024 · 7 comments

@snarayan21
Contributor

System Info

peft version 0.23.4
transformers version 4.42.3

Who can help?

@BenjaminBossan @sayakpaul

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

model = model.merge_and_unload()

Expected behavior

Hi, I'm trying to merge LoRA adapters into the model with the merge_and_unload function. My model is on CPU, and I'm near the CPU memory limit on my machine, so I would like the merge_and_unload operation to happen in-place. However, there seems to be a spike in memory usage when this function is called, causing a CPU OOM. Is there an in-place version of this function, and what makes merge_and_unload increase CPU memory usage so significantly?
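
For context, a minimal sketch of the setup (the adapter path is a placeholder and the psutil-based logging is just illustrative, not my exact script):

import psutil
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

def rss_gb():
    # resident set size of this process, in GB
    return psutil.Process().memory_info().rss / 1e9

# Everything stays on CPU; no device_map or quantization involved.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # placeholder path

print(f"RSS before merge: {rss_gb():.1f} GB")
model = model.merge_and_unload()  # CPU memory spikes here and the job OOMs
print(f"RSS after merge:  {rss_gb():.1f} GB")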

@BenjaminBossan
Member

Do you have the error message? Could you also share the PEFT config and what base model you use?

@snarayan21
Contributor Author

Hey @BenjaminBossan, there's no error message per se because it's a CPU OOM -- the job just crashes. I'm using meta-llama/Meta-Llama-3-70B with the peft config below:

peft_config:
      r: 16
      peft_type: LORA
      task_type: CAUSAL_LM
      lora_alpha: 32
      lora_dropout: 0
      target_modules:
      - q_proj
      - k_proj
      - v_proj
      - o_proj
      - down_proj
      - up_proj
      - gate_proj
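
For reference, my understanding is that this YAML maps onto roughly the following LoraConfig in Python -- included here just to make it easy to reproduce:

from peft import LoraConfig

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "down_proj", "up_proj", "gate_proj",
    ],
)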

Looking at _unload_and_optionally_merge, is the LoRAModel holding on to additional unmerged parameters that it shouldn't be during the merge process?

@BenjaminBossan
Member

Thanks for providing further information.

is the LoRAModel holding on to additional unmerged parameters that it shouldn't be during the merge process?

No, the model does not do that. However, during intermediary steps, it will allocate additional tensors that are used to update the weights:

delta_weight = self.get_delta_weight(active_adapter)
if not self.use_dora[active_adapter]:
    base_layer.weight.data = base_layer.weight.data + delta_weight

(assuming you use normal LoRA, not some form of quantization)

In the grand scheme of things, this should not occupy a lot of extra memory compared to the model as a whole, but if you're already very close to the total memory, this could be the difference that results in OOM.

We cannot avoid allocating some new memory for merging. But the last line quoted above could be changed to use +=, which I think should avoid allocating one additional tensor. Not sure if that would be enough to tip the scales. If possible, could you check whether your script passes with that change?

- base_layer.weight.data = base_layer.weight.data + delta_weight
+ base_layer.weight.data += delta_weight
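
To illustrate the difference with plain torch (nothing PEFT-specific, just a sketch of why the in-place form saves one allocation): a = a + b materializes a new tensor for the sum before rebinding the name, while a += b writes into the existing storage.

import torch

a = torch.zeros(4096, 4096)  # ~64 MiB in float32
b = torch.ones(4096, 4096)

ptr = a.data_ptr()
a = a + b                    # allocates a second ~64 MiB tensor for the result
print(a.data_ptr() == ptr)   # False: the name now points at new storage

a = torch.zeros(4096, 4096)
ptr = a.data_ptr()
a += b                       # in-place add, reuses the original storage
print(a.data_ptr() == ptr)   # True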

@snarayan21
Contributor Author


Thank you @BenjaminBossan -- I got to that same point in the code too. That change does seem to help at a very small scale (I'm tracking memory usage as LoRA adapters get merged on a tiny opt-125m model) -- will validate at larger scale as well.
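
For reference, the small-scale check looks roughly like this (a sketch: it uses freshly initialized adapters rather than our trained ones, peft's default target modules for OPT, and psutil for the memory readings):

import psutil
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

def rss_mb():
    # resident set size of this process, in MB
    return psutil.Process().memory_info().rss / 1e6

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32))

print(f"RSS before merge: {rss_mb():.0f} MB")
model = model.merge_and_unload()
print(f"RSS after merge:  {rss_mb():.0f} MB")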

Are there any other points where we might be allocating some extra tensor we don't need?

@BenjaminBossan
Member

will validate at larger scale as well.

Please report back once you have the results.

Are there any other points where we might be allocating some extra tensor we don't need?

At first glance, I don't think so. Not saying there is no possibility at all, but at least no low-hanging fruit.

@snarayan21
Contributor Author

Hey @BenjaminBossan -- confirming that this simple change does indeed address the extra memory usage. And the difference is not insignificant either: see the graph below showing % system memory utilization, where the orange line is without the fix (the run stops early because of the OOM) and the green line is with the fix (there are some operations we run after merge_and_unload).

Submitting a PR to peft asap. Would it be possible to do a patch release? Otherwise we'll have to patch this in ourselves.
[Graph: % system memory utilization over time, with the fix (green) and without it (orange)]

@BenjaminBossan
Member

Resolved via #1944.
