@AugustRush I can reproduce this error. I think it happens because of how and when we quantise the model and load the LoRA weights. I will look into this in more detail later...
In the non-local-path (quantize-at-load-time) case: since the LoRA weights are assumed to be non-quantized, we first merge them into the yet-to-be-quantized model and then quantize the whole thing. But when the original model is already quantized, we cannot merge the non-quantized LoRA weights into it the same way.
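One conceptual direction (an untested sketch, not mflux's actual code; the function and argument names are illustrative) would be to dequantize each affected weight, apply the LoRA delta in full precision, and then quantize again:

```python
import mlx.core as mx

def merge_lora_into_quantized(w_q, scales, biases, lora_a, lora_b,
                              lora_scale=1.0, group_size=64, bits=8):
    # Recover the full-precision weight from its packed representation.
    w = mx.dequantize(w_q, scales, biases, group_size=group_size, bits=bits)
    # Apply the LoRA update in full precision, where the shapes line up.
    w = w + lora_scale * (lora_b @ lora_a)
    # Re-quantize the merged weight; returns (w_q, scales, biases).
    return mx.quantize(w, group_size=group_size, bits=bits)
```

The round trip through dequantize/quantize adds a little extra quantization error on top of the already-quantized base weights, but for a baked-in LoRA that is probably acceptable.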
At the moment I am not fully sure how easy that is to fix in practice, but one option I can add (which is nice to have regardless) is for save.py to support the --lora-paths and --lora-scales arguments. That way you could save a merged version of the weights with the LoRA baked in, and when you later run the model you would not need to specify the LoRA files at all. One obvious downside is of course that you cannot easily swap LoRAs afterwards.
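For illustration, the flow could look something like this once save.py supports those arguments (hypothetical; only --lora-paths and --lora-scales are part of the proposal, the other flags and paths are placeholders):

```bash
# 1. Save a quantized model with the LoRA merged into the weights
python save.py --model dev --quantize 8 \
    --lora-paths /path/to/my_lora.safetensors --lora-scales 1.0 \
    --path /path/to/merged_model

# 2. Later runs load the merged model directly, with no --lora-paths needed
```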
weight = transWeight + lora_scale * (lora_b @ lora_a)
~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ValueError: Shapes (3072,768) and (3072,3072) cannot be broadcast.
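For context, the (3072, 768) shape in the error is the packed quantized weight, while the LoRA delta is still the full (3072, 3072) matrix, so the addition cannot broadcast. A small standalone sketch (assuming MLX 8-bit quantization, which packs four 8-bit values into each uint32 along the last axis) reproduces the mismatch:

```python
import mlx.core as mx

w = mx.random.normal((3072, 3072))            # full-precision weight
w_q, scales, biases = mx.quantize(w, bits=8)  # packed quantized weight
print(w_q.shape, w_q.dtype)                   # (3072, 768) uint32

lora_a = mx.random.normal((16, 3072))         # illustrative rank-16 LoRA
lora_b = mx.random.normal((3072, 16))
delta = lora_b @ lora_a
print(delta.shape)                            # (3072, 3072): cannot be added to w_q
```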