Using Peft LoRA for better, simpler UNet fine-tuning? #9102
Replies: 4 comments
-
@AbraarArique Hi, I've recently been fine-tuning a UNet with LoRA. My baseline is very similar to the code you gave above (get_peft_model), but the results are not good. Did you solve this problem, or do you have a better solution? I would really appreciate an answer.
-
@lyb369 What do you mean when you say the effect is not good? I ran some UNet LoRA fine-tuning runs using PEFT and it seemed to work fine. Is there a particular problem you're having? Do note that the code example I gave above is just for the UNet, whereas the HF script also allows fine-tuning the text encoders...
-
Thanks for your reply! I use code like this to add LoRA layers to the UNet. The layers are added successfully, but when I freeze the original UNet parameters and run gradient descent only on the LoRA parameters, the LoRA parameters do not change at all.

    config = LoraConfig(r=16, target_modules=[...])
    unet = UNet2DConditionModel.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        subfolder="unet",
    )
    fake_score = LoraModel(unet, config, adapter_name="default")

My code for the gradient descent step is below:

    with misc.ddp_sync(fake_score_ddp, (round_idx == num_accumulation_rounds - 1)):
        # Denoised fake images (stop generator gradient) under fake score network, using guidance scale: kappa1=cfg_eval_train
        noise_fake = sid_sd_denoise(unet=fake_score_ddp, images=images, noise=noise, contexts=contexts, timesteps=timesteps,
                                    noise_scheduler=noise_scheduler,
                                    text_encoder=text_encoder, tokenizer=tokenizer,
                                    resolution=resolution, dtype=dtype, predict_x0=False, guidance_scale=cfg_train_fake)
        nan_mask = torch.isnan(noise_fake).flatten(start_dim=1).any(dim=1)
        if noise_scheduler.config.prediction_type == "v_prediction":
            target = noise_scheduler.get_velocity(images, noise, timesteps)
            nan_mask = nan_mask | torch.isnan(target).flatten(start_dim=1).any(dim=1)
        # Check if there are any NaN values present
        if nan_mask.any():
            # Invert the nan_mask to get a mask of samples without NaNs
            non_nan_mask = ~nan_mask
            # Filter out samples with NaNs from noise_fake, noise, and target
            noise_fake = noise_fake[non_nan_mask]
            noise = noise[non_nan_mask]
            if noise_scheduler.config.prediction_type == "v_prediction":
                target = target[non_nan_mask]
        if noise_scheduler.config.prediction_type == "v_prediction":
            loss = (noise_fake - target) ** 2
            snr = compute_snr(noise_scheduler, timesteps)
            loss = loss * snr / (snr + 1)
        else:
            loss = (noise_fake - noise) ** 2
        loss = loss.sum().mul(loss_scaling / batch_gpu_total)
        del images
        if len(noise) > 0:
            loss.backward()

Can you or someone else help me find where the error is? I would really appreciate it!
-
@lyb369 In my code, I used If so, then do the added LoRA layers have

    params = list(filter(lambda p: p.requires_grad, unet.parameters()))
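To make the `requires_grad` check above concrete, here is a minimal sketch of how one might verify that LoRA parameters are trainable, registered with the optimizer, and actually updated by a step. It uses plain PyTorch only. `TinyLoRALinear` is a hypothetical stand-in written for this illustration (it is not PEFT's implementation and not the UNet), and the SGD optimizer and learning rate are arbitrary assumptions.

```python
import torch
from torch import nn


class TinyLoRALinear(nn.Module):
    """Hypothetical stand-in for a LoRA-wrapped layer (not PEFT's code)."""

    def __init__(self, dim: int = 8, rank: int = 2):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.lora_A = nn.Linear(dim, rank, bias=False)
        self.lora_B = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # usual LoRA init: B starts at zero
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the original weights

    def forward(self, x):
        return self.base(x) + self.lora_B(self.lora_A(x))


torch.manual_seed(0)
model = TinyLoRALinear()

# 1. Only the LoRA parameters should report requires_grad=True.
trainable = {n: p for n, p in model.named_parameters() if p.requires_grad}

# 2. The optimizer must hold those same tensors (build it after LoRA is added).
optimizer = torch.optim.SGD(list(trainable.values()), lr=0.1)
optimized_ids = {id(p) for g in optimizer.param_groups for p in g["params"]}
missing = [n for n, p in trainable.items() if id(p) not in optimized_ids]

# 3. Snapshot, take one step, and compare.
before = {n: p.detach().clone() for n, p in trainable.items()}
x = torch.randn(4, 8)
loss = (model(x) ** 2).sum()
optimizer.zero_grad()
loss.backward()
no_grad = [n for n, p in trainable.items() if p.grad is None]
optimizer.step()
changed = {n: not torch.equal(before[n], p.detach()) for n, p in trainable.items()}

print("trainable:", sorted(trainable))
print("missing from optimizer:", missing)
print("grad is None:", no_grad)
print("changed after one step:", changed)
```

One detail worth keeping in mind when reading the result: because `lora_B` starts at zero, the gradient reaching `lora_A` is exactly zero on the first step, so only `lora_B` moves at first. Comparing `lora_A` alone after a single step can therefore look like "nothing changed". If a parameter shows up under `grad is None` or `missing from optimizer`, possible causes to check (not a diagnosis of the code above) include building the optimizer before the LoRA layers were added, or running the forward pass under `torch.no_grad()`.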
-
I was looking at the Stable Diffusion XL LoRA fine-tuning script:
It seems that, while adding LoRA to the UNet is simple and intuitive enough, saving and loading the models/checkpoints is quite complicated (uses internal methods, implementations, and heuristics).
So I'm wondering if it's possible to use Hugging Face PEFT's standard API to do this instead? Like:
I tested this, and the PEFT-wrapped UNet seems to work fine for inference.
So my question is: will this work for training as well, or will the PEFT wrapper cause any problems or incompatibilities?
If this is indeed a simpler approach, would it be better to use this in the example training script as well? (I can update it)
For reference, here's how it's done now: