
Support LoRA for clip text encoder in diffusers #21770

Closed
wants to merge 5 commits

Conversation

haofanwang

What does this PR do?

This PR supports the feature requested in huggingface/diffusers#2469. Stable Diffusion uses CLIPTextModel as its text encoder, which currently does not support adding LoRA layers. The changes here closely mirror what UNet2DConditionModel already does.

What to expect after this PR?

import torch
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers.models.cross_attention import LoRACrossAttnProcessor

tokenizer = CLIPTokenizer.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="text_encoder")
text_encoder.requires_grad_(False)

# add LoRA layers; attn_processors / set_attn_processor are the accessors
# this PR adds to CLIPTextModel, mirroring UNet2DConditionModel
lora_attn_procs = {}
for name in text_encoder.attn_processors.keys():
    # CLIP blocks use self-attention only, so cross_attention_dim stays None there
    cross_attention_dim = None if name.endswith("self_attn.processor") else text_encoder.config.hidden_size
    hidden_size = text_encoder.config.hidden_size
    lora_attn_procs[name] = LoRACrossAttnProcessor(
        hidden_size=hidden_size, cross_attention_dim=cross_attention_dim
    )
text_encoder.set_attn_processor(lora_attn_procs)

inputs = tokenizer(["a photo of a cat", "a photo of a dog"], padding=True, return_tensors="pt")
outputs = text_encoder(**inputs)

# only added LoRA weights require gradients
for name, param in text_encoder.named_parameters():
    print(name, param.requires_grad) 
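The snippet above wires a `LoRACrossAttnProcessor` into each attention block while the base weights stay frozen. The underlying update rule is just a low-rank correction to the frozen weight, `W + (alpha / r) * B @ A`. A dependency-free sketch of that rule (all names here are illustrative, not diffusers API):

```python
# Minimal sketch of the LoRA update rule: the frozen weight W is augmented
# with a low-rank product (alpha / r) * B @ A. Matrices are lists of rows.

def matmul(M, N):
    """Multiply two matrices given as lists of rows."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def lora_forward(W, A, B, x, alpha, r):
    """Compute (W + (alpha / r) * B @ A) @ x without materialising the sum."""
    base = matmul(W, x)
    low_rank = matmul(B, matmul(A, x))  # rank-r path, cheap when r is small
    scale = alpha / r
    return [[base[i][j] + scale * low_rank[i][j]
             for j in range(len(base[0]))] for i in range(len(base))]

# 2x2 frozen weight, rank-1 adapter
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]             # r x in_features
B = [[0.5], [0.5]]           # out_features x r
x = [[2.0], [3.0]]           # input column vector

print(lora_forward(W, A, B, x, alpha=1.0, r=1))  # → [[4.5], [5.5]]
```

Because only `A` and `B` receive gradients, training touches `r * (in + out)` parameters per layer instead of `in * out`, which is why the last loop in the example prints `True` only for the LoRA weights.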

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Feb 23, 2023

The documentation is not available anymore as the PR was closed or merged.

@sgugger
Collaborator

sgugger commented Feb 24, 2023

The support for LoRA should be done using our new peft library. We won't change Transformers models directly. cc @pacman100 @patrickvonplaten

@haofanwang
Author

Sure, that makes sense to me, and good to know. I will open a new PR directly in diffusers instead.
