
FIX: Error with OLoRA init when using bnb #2011

Merged
8 commits merged into huggingface:main from fix-olora-bnb on Sep 3, 2024

Conversation

BenjaminBossan
Member

@BenjaminBossan BenjaminBossan commented Aug 16, 2024

Resolves #1999

  • the dtype check for bnb-quantized weights can be misleading when bnb_4bit_quant_storage is used
  • the updated base weights must be re-quantized if the original weight is bnb-quantized
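
For illustration, here is a minimal sketch (not the PR code; the helper name is made up) of detecting the bnb parameter type from the class name instead of the dtype. With 4-bit weights, weight.dtype reflects the storage dtype chosen via bnb_4bit_quant_storage, so a plain dtype check can be misleading:

from typing import Optional

import torch


def get_bnb_param_type(weight: torch.nn.Parameter) -> Optional[str]:
    # Checking the class name avoids importing bitsandbytes, which is only an
    # optional dependency of PEFT (see the discussion below).
    if weight.__class__.__name__ == "Params4bit":
        return "4bit"
    if weight.__class__.__name__ == "Int8Params":
        return "8bit"
    # a regular torch.nn.Parameter: fall back to the usual float dtype check
    return None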

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan
Member Author

@tokenizer-decode Would be great if you could review.

@BenjaminBossan BenjaminBossan marked this pull request as ready for review August 21, 2024 08:11
@tokenizer-decode
Contributor

Been very busy lately. Is this urgent? I'll take a look at it this week.

@BenjaminBossan
Member Author

Been very busy lately. Is this urgent? I'll take a look at it this week.

That's more than enough, thanks.

@tokenizer-decode
Contributor

tokenizer-decode commented Aug 23, 2024

Interesting that this happens for specific models. I get the point that we need to create the 4bit and 8bit objects ourselves. I guess bnb is not a peft requirement and that's why you are conditionally importing it? The implementation looks good. But besides the actual problem, I think olora_init is becoming very complex and hard to maintain at this point. How about doing something like this:

def transform_if_necessary(weights: torch.nn.Parameter) -> torch.nn.Parameter:
    if weights.__class__.__name__ in ["Params4bit", "Int8Params"]:
        return weights.__class__(dequantize_module_weight(weights), quant_type=weights.quant_type).to(weights.device)
    return weights

And we would do:

    def olora_init(self, adapter_name):
        base_layer = self.get_base_layer()
        orig_weight = base_layer.weight
        dtype = orig_weight.dtype

        if dtype in [torch.float32, torch.float16, torch.bfloat16, "Actual_Possible_BNB_Types"]:
            weight_tensor = orig_weight
        else:
            raise TypeError(f"Unsupported data type for the base layer. Got {dtype}.")

        scale_factor = self.scaling[adapter_name]
        r = self.r[adapter_name]
        weight_tensor = weight_tensor.to(torch.float32)
        Q, R = torch.linalg.qr(weight_tensor.data)

        Qr, Rr = Q[:, :r], R[:r]
        self.lora_A[adapter_name].weight.data = Rr.contiguous()
        self.lora_B[adapter_name].weight.data = Qr.contiguous()

        weight_tensor.data -= scale_factor * self.lora_B[adapter_name].weight @ self.lora_A[adapter_name].weight
        base_layer.weight = transform_if_necessary(orig_weight)

Haven't tested this. It may need adjustments. I might be being too pedantic here. Just a recommendation. Otherwise it looks good.

@BenjaminBossan
Member Author

I guess bnb is not a peft requirement and that's why you are conditionally importing it?

Exactly.

I think olora_init is becoming very complex and hard to maintain at this point.

This is unfortunately the nature of the beast. As usage grows, more edge cases are discovered that need to be taken care of. I don't see a big advantage in moving the layer creation into a separate function, it could actually make the code harder to understand (I'd do it if we foresee it being used elsewhere too). But I see a point in using weight.__class__ to avoid the bnb import. WDYT?

@tokenizer-decode
Contributor

tokenizer-decode commented Aug 23, 2024

it could actually make the code harder to understand

Fair enough. But I think there is an advantage in eliminating the repetition with a single weight.__class__ call

But I see a point in using weight.__class__ to avoid the bnb import. WDYT?

Yeah I saw that after sending the comment. Nice touch. Wouldn't hurt imo.

@BenjaminBossan
Member Author

But I think there is an advantage in eliminating the repetition with a single weight.__class__ call

The issue is that the init of the 4bit vs 8bit params class is a bit different, so we cannot make it the same call, or do you mean something else?

@tokenizer-decode
Contributor

tokenizer-decode commented Aug 23, 2024

I mean we make a single call when we do this:
if weights.__class__.__name__ in ["Params4bit", "Int8Params"]: return weights.__class__(dequantize_module_weight(weights), quant_type=weights.quant_type).to(weights.device)

@BenjaminBossan
Member Author

Hmm, can we though? The constructors of Params4bit and Int8Params are different, e.g. the former has requires_grad=False and the latter requires_grad=True (not sure why).

@tokenizer-decode
Contributor

tokenizer-decode commented Aug 23, 2024

Didn't notice that. Wouldn't that mean Int8Params will not be updated? Maybe it should be set to True for inference. Have you tried both with requires_grad=True?

@BenjaminBossan
Member Author

Wouldn't that mean Int8Params will not be updated?

Well, the point of QLoRA is exactly that the quantized base weights are not updated ;-) Not sure if it's even possible tbh but in any case we don't want that for PEFT.

@tokenizer-decode
Contributor

tokenizer-decode commented Aug 23, 2024

But Int8Params is set to True by default. Weird. I don't know much about QLoRA tbh. Anyway, this still wouldn't invalidate our approach. You would instead do:
weights.__class__(dequantize_module_weight(weights), quant_type=weights.quant_type, requires_grad=False).to(weights.device)

@BenjaminBossan
Member Author

You would instead do:
weights.__class__(dequantize_module_weight(weights), quant_type=weights.quant_type, requires_grad=False).to(weights.device)

This still does not quite work, as 8bit params don't have the quant_type attribute. I simplified the code to use __class__ now, but I don't think more than that is possible. Please take a look.

@tokenizer-decode
Contributor

Okay, at least we don't import bnb. LGTM.

@BenjaminBossan BenjaminBossan requested a review from SunMarc August 26, 2024 10:10
Member

@SunMarc SunMarc left a comment


Thanks for the PR @BenjaminBossan! I left a small question to better understand what's happening!

Comment on lines 194 to 204
if bnb_param_type == "4bit":
    weight_tensor = orig_weight.__class__(weight_tensor, quant_type=orig_weight.quant_type).to(
        orig_weight.device
    )
    base_layer.weight = weight_tensor
elif bnb_param_type == "8bit":
    weight_tensor = orig_weight.__class__(weight_tensor, requires_grad=False).to(orig_weight.device)
    base_layer.weight = weight_tensor
else:
    weight_tensor = weight_tensor.to(dtype)
    base_layer.weight.data = weight_tensor
Member


Why are we quantizing the weights this time?

Member Author

@BenjaminBossan BenjaminBossan Aug 30, 2024


Normally for bnb weights, the tensors are flattened, e.g. shape [64, 1]. But after dequantizing, the weight_tensor that we assign here is not flat anymore, e.g. shape [16, 16]. My reasoning was that we should get back a "correct" tensor, so better to re-initialize it.

I tried what happens when I remove this and just do base_layer.weight.data = weight_tensor, and curiously, this seems to work too and the test passes, even though the shape is now wrong. This makes me wonder if bnb somehow handles this automatically and we should not re-initialize (which could cause its own problems)? Not sure, any suggestion?
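
For reference, a small illustrative snippet of the shape mismatch (assuming bitsandbytes with a CUDA device is available; this is not part of the PR):

import torch
import bitsandbytes as bnb
from peft.utils.integrations import dequantize_module_weight

# a toy 4-bit linear layer; quantization happens when moving it to the GPU
linear = bnb.nn.Linear4bit(16, 16, compute_dtype=torch.float16, quant_type="nf4").to("cuda")

print(linear.weight.shape)                     # flattened packed storage, e.g. torch.Size([128, 1])
print(linear.weight.quant_state.shape)         # original shape tracked by bnb: torch.Size([16, 16])
print(dequantize_module_weight(linear).shape)  # dequantized weight: torch.Size([16, 16])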

Member

@SunMarc SunMarc Aug 30, 2024


I tried what happens when I remove this and just do base_layer.weight.data = weight_tensor

Wow that's really strange indeed. I tried to check the code in bnb and it doesn't look like they handle this. cc @matthewdouglas

This makes me wonder if bnb somehow handles this automatically and we should not re-initialize (which could cause its own problems)?

I think that's fine as long as you pass the relevant kwargs that you can get from orig_weight. However, make sure not to pass the bnb_quantized arg for the 4-bit case. Then, with to(orig_weight.device), it should quantize the weights properly.

Member Author


Great, thanks for the additional info.

However, make sure to not pass bnb_quantized arg for the 4-bit case. Then, with to(orig_weight.device), it should quantize the weights properly.

To clarify, is the present code in alignment with what you suggest or do I need to call to(orig_weight.device) too?

Wow that's really strange indeed. I tried to check the code in bnb and it doesn't look like they handle this.

Okay, then it's probably better to get Matthew's opinion before merging this.

Member

@matthewdouglas matthewdouglas Aug 30, 2024


As far as the shapes go, both 4bit and 8bit have some mechanisms in place to track the original shapes, but it's different for each. The Linear8bitLt has a state.SB and for 4bit that information is part of quant_state. The main expectation is that it is all stored in a contiguous row-major format.

That said, it's not really clear to me that dequantize_module_weight() is doing all that it would need to do. Maybe it would pass the test here but I would think the updated weights would not be quantized properly afterwards, so re-initializing it is probably the way to go.

To clarify, is the present code in alignment with what you suggest or do I need to call to(orig_weight.device) too?

You'd want to have .to(orig_weight.device) in addition to the other kwargs as @SunMarc mentioned.
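
Putting both suggestions together, a rough sketch (reusing the variable names from the diff above; the final merged code passes additional constructor arguments, see below):

if bnb_param_type == "4bit":
    # Rebuild the Params4bit from the updated float tensor. bnb_quantized is not
    # passed, so the .to(...) call re-quantizes the data on the original device.
    base_layer.weight = orig_weight.__class__(
        weight_tensor, requires_grad=False, quant_type=orig_weight.quant_type
    ).to(orig_weight.device)
elif bnb_param_type == "8bit":
    # Int8Params has no quant_type argument, hence the separate branch.
    base_layer.weight = orig_weight.__class__(weight_tensor, requires_grad=False).to(orig_weight.device)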

Member Author


I updated the inits to take all arguments into account. Unfortunately, this may get out of date if bnb is updated, but I don't think there is a method like bnb.create_param_like(tensor) that we could offload this work to.

It would be great if you could do a final pass over the change.

Member


Can't we just pass orig_weight.__dict__ as the kwargs? This is how we did it in transformers.

Member Author


Hmm, I wonder if that's really more robust. If a new attribute is added that is not an __init__ argument, this would fail, right?

class Foo:
    def __init__(self, x):
        self.x = x
        self.y = 123

foo = Foo("hi")
Foo(**foo.__dict__)  # TypeError: Foo.__init__() got an unexpected keyword argument 'y'

So no matter what, this code may break if there is some change to the __init__ code in bnb.

Member


Oh yeah, that's right :/

Member Author


Okay, I merged it as is then. The code is eventually going to break one way or the other :D

Member

@SunMarc SunMarc left a comment


LGTM! Thanks for iterating!

@BenjaminBossan BenjaminBossan merged commit 37b9c5c into huggingface:main Sep 3, 2024
14 checks passed
@BenjaminBossan BenjaminBossan deleted the fix-olora-bnb branch September 3, 2024 12:08
Linked issue: Lora initialisation with olora and pissa not working with quantisation.