get_base_model() is returning the base model with the LoRA still applied. #430
Comments
This issue still needs to be addressed. |
I don't think disable_adapter() works... I made an extension for webui and I can switch LoRAs fine, but disable_adapter() does nothing: it doesn't disable the LoRA, which is still applied even after calling it. |
Why is that? |
https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora.py#L672-L675 You can see the effect of disabling LoRA in this example notebook: https://github.com/huggingface/peft/blob/main/examples/multi_adapter_examples/PEFT_Multi_LoRA_Inference.ipynb I don't understand what the issue is. There should be a difference in the outputs when enabling and disabling LoRA adapters. |
Using When loading an adapter, should the original base model be permanently changed as well? I'm loading adapters with code like this:
If I do inference with
The first base model result matches the base model output within the The first peft model result matches the last base model output (without This feels wrong. I would expect to be able to do inference on I filed a similar bug in #515 and this doesn't seem to be fixed. This issue also prevents anyone from loading two Peft models that share the same base model at the same time. When I add in
I get the same result from both models. |
Is anyone looking into this? It still feels very unexpected for the base model to be modified when creating a peft model on top of it. |
After reading the code very closely, I don't think this is going to get fixed. The only way to access the base model without any adapter replacements is to do

    with model.disable_adapter():
        # stuff with base model

When loading LoRA adapter weights, it walks through the base model's modules and swaps out some of them with LoRA replacements. This is why, when trying to access the base model directly, or even when keeping the base model around like in my example, inference always gets the adapter's influence. My guess is that this is too big of an architectural change. My request to the Peft authors, however, is to be much more explicit that this is how LoRA adapter inference is implemented. I found this behavior very surprising and it's not explicitly written about anywhere I saw. |
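To make the behavior described above concrete, here is a minimal pure-Python sketch. It is a toy analogy, not PEFT's actual code: the classes Linear, ToyLoraLinear, ToyModel and ToyPeftModel are hypothetical names invented for illustration. The only things taken from the discussion are the idea that layers are swapped inside the original model object and that a disable_adapter() context manager bypasses the adapter.

```python
from contextlib import contextmanager


class Linear:
    """Toy stand-in for a base layer: multiplies its input by a weight."""

    def __init__(self, weight):
        self.weight = weight

    def __call__(self, x):
        return self.weight * x


class ToyLoraLinear:
    """Toy wrapper that adds an adapter delta on top of a wrapped base layer."""

    def __init__(self, base, delta):
        self.base = base
        self.delta = delta
        self.disable_adapters = False  # when True, only the base layer runs

    def __call__(self, x):
        out = self.base(x)
        if not self.disable_adapters:
            out += self.delta * x
        return out


class ToyModel:
    def __init__(self):
        self.proj = Linear(2.0)

    def __call__(self, x):
        return self.proj(x)


class ToyPeftModel:
    """Hypothetical wrapper that replaces layers in place (no copy is made)."""

    def __init__(self, model, delta):
        # This assignment mutates the caller's model object.
        model.proj = ToyLoraLinear(model.proj, delta)
        self.model = model

    def __call__(self, x):
        return self.model(x)

    @contextmanager
    def disable_adapter(self):
        # Temporarily bypass the adapter, then restore it on exit.
        self.model.proj.disable_adapters = True
        try:
            yield
        finally:
            self.model.proj.disable_adapters = False


base = ToyModel()
before = base(3.0)                      # plain base output
peft_model = ToyPeftModel(base, delta=0.5)
after = base(3.0)                       # same `base` variable, adapter now included
with peft_model.disable_adapter():
    disabled = peft_model(3.0)          # adapter bypassed inside the block
```

In this sketch, `before` and `disabled` agree while `after` differs, even though `after` was computed through the original `base` variable. That mirrors the surprise reported in this thread: holding on to a reference to the base model does not protect it from the swap.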
Related, given how baked this architecture is in Peft, I've made a request to vllm to try and implement this in a way that you can load multiple LoRA adapters independently while still being able to access the base model. |
@oobabooga Something to be aware of, if you aren't already |
Yes, this is correct, the base model is mutated when it is converted into a peft model. If you need the original model, at the moment you would have to create a copy of it before passing it to peft. If you have some suggestion where we could update the docs to make this more obvious, please let us know. |
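The copy-first workaround can be sketched with the standard library's copy.deepcopy. This is a toy illustration under stated assumptions: ToyLayer, ToyAdapterLayer, ToyModel and wrap_in_place are hypothetical stand-ins for a real model and for a wrapper that mutates the model it receives, not PEFT's API.

```python
import copy


class ToyLayer:
    """Toy base layer: multiplies its input by a weight."""

    def __init__(self, weight):
        self.weight = weight

    def __call__(self, x):
        return self.weight * x


class ToyAdapterLayer:
    """Toy adapter layer: base output plus a delta term."""

    def __init__(self, base, delta):
        self.base = base
        self.delta = delta

    def __call__(self, x):
        return self.base(x) + self.delta * x


class ToyModel:
    def __init__(self):
        self.proj = ToyLayer(2.0)

    def __call__(self, x):
        return self.proj(x)


def wrap_in_place(model, delta):
    """Hypothetical wrapper that mutates the model it is given and returns it."""
    model.proj = ToyAdapterLayer(model.proj, delta)
    return model


original = ToyModel()
pristine = copy.deepcopy(original)       # snapshot taken BEFORE wrapping

# First adapter mutates `original`; the snapshot is unaffected.
adapted_a = wrap_in_place(original, delta=0.5)

# A second, independent adapter gets its own copy of the pristine model.
adapted_b = wrap_in_place(copy.deepcopy(pristine), delta=-1.0)
```

Here `pristine` keeps producing base outputs, and the two adapted models no longer interfere with each other. The trade-off is that every copy holds its own weights, so with a real model this multiplies memory use by the number of copies.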
Hi! The |
I'd probably put this in the pydoc for |
1. Addresses huggingface#430 (comment) 2. Reword docstring to not be LoRA-specific
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. |
@fozziethebeat Does peft support vLLM speedup? |
I think you mean: does vLLM support LoRAs? I think vLLM now supports running models with LoRAs, but I haven't tried it personally. |
Also, Predibase's LoRAX project is basically the right solution to this problem now. |
@fozziethebeat Oh yes. model = PeftModel.from_pretrained(model, adapter_to_resume, is_trainable=False) |
Technically, I'm just grabbing the .base_model.model directly, rather than using get_base_model(), but that should have the same effect, since that's all get_base_model() does if the active_peft_config is not PromptLearningConfig, as seen here.

After loading a llama model with a LoRA, like so:
The PeftModel loads fine and everything is working as expected. However, I can not figure out how to get the original model back without a LoRA still being active when I do an inference.
The code I'm using is from here:
This gives me the model back as a LlamaForCausalLM, but when I go to inference, the LoRA is still applied. I made a couple of test LoRAs so that there would be no question as to whether the LoRA is still loaded. They can be found here: https://huggingface.co/clayshoaf/AB-Lora-Test

I am digging around right now, and I see this line:
if isinstance(module, LoraLayer):
from:

So I checked in the program and if I load a LoRA and do
it returns a bunch of modules that are of the type Linear8bitLt (if loaded in 8bit) or Linear4bitLt (if loaded in 4bit).

Would it work to set the modules' disable_adapters value to false? I don't want to hack around too much in the code, because I don't have a deep enough understanding to be sure that I won't mess something else up in the process.

If that won't work, is there something else that I should be doing?