Using Peft LoRA for better, simpler UNet fine-tuning? #9102

AbraarArique · 2024-08-06T16:55:22Z

I was looking at the Stable Diffusion XL LoRA fine-tuning script.

It seems that, while adding LoRA to the UNet is simple and intuitive enough, saving and loading the models/checkpoints is quite complicated: it relies on internal methods, custom implementations, and heuristics.

So I'm wondering: is it possible to use Hugging Face PEFT's standard API for this instead? For example:

from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel
from peft import LoraConfig, PeftModel, get_peft_model

config = LoraConfig(r=16, target_modules=[...])

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="unet",
)

# Train this PEFT-wrapped UNet
lora = get_peft_model(unet, config)

# For saving/checkpointing
lora.save_pretrained("lora")

# For resuming from a checkpoint or inference
# (in a fresh run, `unet` is a newly loaded base UNet without adapters)
lora = PeftModel.from_pretrained(unet, "lora", is_trainable=True)

# Finally, build the pipeline with the LoRA UNet
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    unet=lora,
)

I tested this, and the PEFT-wrapped UNet seems to work fine for inference.

So my question is: will this also work for training, or will the PEFT wrapper cause any problems or incompatibilities?

If this is indeed a simpler approach, would it be better to use it in the example training script as well? (I can make the update.)

For reference, here's how it's done now:

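from diffusers import StableDiffusionXLPipeline
from diffusers.loaders import StableDiffusionLoraLoaderMixin
from diffusers.training_utils import _set_state_dict_into_text_encoder, cast_training_params
from diffusers.utils import convert_state_dict_to_diffusers, convert_unet_state_dict_to_peft
from peft.utils import get_peft_model_state_dict, set_peft_model_state_dict
import torch

# accelerator, unwrap_model, unet, text_encoder_one, text_encoder_two, args and logger
# are defined elsewhere in the training script.

def save_model_hook(models, weights, output_dir):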
    if accelerator.is_main_process:
        # there are only two options here. Either are just the unet attn processor layers
        # or there are the unet and text encoder attn layers
        unet_lora_layers_to_save = None
        text_encoder_one_lora_layers_to_save = None
        text_encoder_two_lora_layers_to_save = None

        for model in models:
            if isinstance(unwrap_model(model), type(unwrap_model(unet))):
                unet_lora_layers_to_save = convert_state_dict_to_diffusers(
                    get_peft_model_state_dict(model)
                )
            elif isinstance(unwrap_model(model), type(unwrap_model(text_encoder_one))):
                text_encoder_one_lora_layers_to_save = convert_state_dict_to_diffusers(
                    get_peft_model_state_dict(model)
                )
            elif isinstance(unwrap_model(model), type(unwrap_model(text_encoder_two))):
                text_encoder_two_lora_layers_to_save = convert_state_dict_to_diffusers(
                    get_peft_model_state_dict(model)
                )
            else:
                raise ValueError(f"unexpected save model: {model.__class__}")

            # make sure to pop weight so that corresponding model is not saved again
            if weights:
                weights.pop()

        StableDiffusionXLPipeline.save_lora_weights(
            output_dir,
            unet_lora_layers=unet_lora_layers_to_save,
            text_encoder_lora_layers=text_encoder_one_lora_layers_to_save,
            text_encoder_2_lora_layers=text_encoder_two_lora_layers_to_save,
        )


def load_model_hook(models, input_dir):
    unet_ = None
    text_encoder_one_ = None
    text_encoder_two_ = None

    while len(models) > 0:
        model = models.pop()

        if isinstance(model, type(unwrap_model(unet))):
            unet_ = model
        elif isinstance(model, type(unwrap_model(text_encoder_one))):
            text_encoder_one_ = model
        elif isinstance(model, type(unwrap_model(text_encoder_two))):
            text_encoder_two_ = model
        else:
            raise ValueError(f"unexpected save model: {model.__class__}")

    lora_state_dict, _ = StableDiffusionLoraLoaderMixin.lora_state_dict(input_dir)
    unet_state_dict = {
        f'{k.replace("unet.", "")}': v
        for k, v in lora_state_dict.items()
        if k.startswith("unet.")
    }
    unet_state_dict = convert_unet_state_dict_to_peft(unet_state_dict)
    incompatible_keys = set_peft_model_state_dict(
        unet_, unet_state_dict, adapter_name="default"
    )
    if incompatible_keys is not None:
        # check only for unexpected keys
        unexpected_keys = getattr(incompatible_keys, "unexpected_keys", None)
        if unexpected_keys:
            logger.warning(
                f"Loading adapter weights from state_dict led to unexpected keys not found in the model: "
                f" {unexpected_keys}. "
            )

    if args.train_text_encoder:
        _set_state_dict_into_text_encoder(
            lora_state_dict, prefix="text_encoder.", text_encoder=text_encoder_one_
        )

        _set_state_dict_into_text_encoder(
            lora_state_dict,
            prefix="text_encoder_2.",
            text_encoder=text_encoder_two_,
        )

    # Make sure the trainable params are in float32. This is again needed since the base models
    # are in `weight_dtype`. More details:
    # https://github.com/huggingface/diffusers/pull/6514#discussion_r1449796804
    if args.mixed_precision == "fp16":
        models = [unet_]
        if args.train_text_encoder:
            models.extend([text_encoder_one_, text_encoder_two_])
        cast_training_params(models, dtype=torch.float32)
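In the load hook above, the UNet entries are picked out of the combined LoRA state dict by their `unet.` prefix, and the text-encoder entries by `text_encoder.` and `text_encoder_2.`. A minimal pure-Python sketch of that routing step follows; the helper `split_by_prefix` and the example keys are hypothetical, for illustration only. One design choice: it strips only the leading prefix by slicing, whereas `k.replace("unet.", "")` would also remove any later occurrence of `unet.` inside a key.

```python
def split_by_prefix(state_dict, prefixes=("unet.", "text_encoder.", "text_encoder_2.")):
    """Route a flat LoRA state dict into one sub-dict per component.

    Keys are matched on their leading prefix, and only that leading prefix is
    removed (slicing, not str.replace). Keys matching no prefix are collected
    under "unmatched".
    """
    routed = {prefix: {} for prefix in prefixes}
    routed["unmatched"] = {}
    for key, value in state_dict.items():
        for prefix in prefixes:
            if key.startswith(prefix):
                routed[prefix][key[len(prefix):]] = value
                break
        else:
            routed["unmatched"][key] = value
    return routed


# Hypothetical keys; a real LoRA file contains many more entries.
example = {
    "unet.down_blocks.0.attn.to_q.lora.down.weight": "A0",
    "text_encoder.layers.0.q_proj.lora.down.weight": "A1",
    "text_encoder_2.layers.0.q_proj.lora.down.weight": "A2",
    "vae.decoder.weight": "X",
}
parts = split_by_prefix(example)
print(sorted(parts["unet."]))
```

Note that `"text_encoder_2."` does not start with `"text_encoder."` (the character after `text_encoder` is `_`, not `.`), so the two text-encoder prefixes never collide.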

lyb369 · 2024-10-30T09:27:30Z

@AbraarArique Hi, I've recently been fine-tuning the UNet with LoRA. My baseline is quite similar to the code you gave above (get_peft_model), but the effect is not good. Did you solve this problem, or do you have a better solution? I would really appreciate an answer.

AbraarArique (author) · 2024-10-30T17:32:44Z

@lyb369 What do you mean by "the effect is not good"?

I did a few UNet LoRA fine-tuning runs with PEFT, and it seemed to work fine. Is there a particular problem you're having?

Note that the code example I gave above covers only the UNet, whereas the HF script also supports fine-tuning the text encoders.

lyb369 · 2024-11-01T10:59:16Z

Thanks for your reply! I use code like this to add LoRA layers to the UNet. The LoRA layers are added successfully, but when I freeze the original UNet parameters and run gradient descent on only the LoRA layer parameters, the LoRA parameters do not change at all.

from diffusers import UNet2DConditionModel
from peft import LoraConfig, LoraModel

config = LoraConfig(r=16, target_modules=[...])

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="unet",
)

fake_score = LoraModel(unet, config, adapter_name="default")

Here is the code I use for the gradient-descent step:

# misc.ddp_sync, sid_sd_denoise, compute_snr and the other variables come from the rest of the training script
with misc.ddp_sync(fake_score_ddp, (round_idx == num_accumulation_rounds - 1)):
    # Denoised fake images (stop generator gradient) under the fake score network, using guidance scale: kappa1=cfg_eval_train
    noise_fake = sid_sd_denoise(
        unet=fake_score_ddp, images=images, noise=noise, contexts=contexts, timesteps=timesteps,
        noise_scheduler=noise_scheduler,
        text_encoder=text_encoder, tokenizer=tokenizer,
        resolution=resolution, dtype=dtype, predict_x0=False, guidance_scale=cfg_train_fake,
    )
    nan_mask = torch.isnan(noise_fake).flatten(start_dim=1).any(dim=1)
    if noise_scheduler.config.prediction_type == "v_prediction":
        target = noise_scheduler.get_velocity(images, noise, timesteps)
        nan_mask = nan_mask | torch.isnan(target).flatten(start_dim=1).any(dim=1)
    # Check if there are any NaN values present
    if nan_mask.any():
        # Invert the nan_mask to get a mask of samples without NaNs
        non_nan_mask = ~nan_mask
        # Drop samples with NaNs (timesteps too, so the SNR weights stay aligned with the loss)
        noise_fake = noise_fake[non_nan_mask]
        noise = noise[non_nan_mask]
        timesteps = timesteps[non_nan_mask]
        if noise_scheduler.config.prediction_type == "v_prediction":
            target = target[non_nan_mask]
    if noise_scheduler.config.prediction_type == "v_prediction":
        loss = (noise_fake - target) ** 2
        snr = compute_snr(noise_scheduler, timesteps)
        # snr has shape [B]; reshape it so it broadcasts over the [B, C, H, W] loss
        loss = loss * (snr / (snr + 1)).view(-1, 1, 1, 1)
    else:
        loss = (noise_fake - noise) ** 2
    loss = loss.sum().mul(loss_scaling / batch_gpu_total)
    del images
    if len(noise) > 0:
        loss.backward()
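In the v-prediction branch, `snr / (snr + 1)` acts as a per-sample loss weight. Taking SNR as ᾱ_t / (1 - ᾱ_t), which is what diffusers' `compute_snr` returns from `alphas_cumprod`, the weight reduces to ᾱ_t itself: (ᾱ/(1-ᾱ)) / (1/(1-ᾱ)) = ᾱ. Because it is one scalar per sample, it has to scale each sample's whole loss; with a `[B, C, H, W]` loss tensor that means reshaping the `[B]` weight to `[B, 1, 1, 1]` before multiplying. The pure-Python sketch below checks the identity. The `scaled_linear` schedule constants (`beta_start=0.00085`, `beta_end=0.012`, 1000 steps) are an assumption modelled on common Stable Diffusion scheduler configs, and the timesteps and losses are made up.

```python
def scaled_linear_alphas_cumprod(beta_start=0.00085, beta_end=0.012, num_steps=1000):
    """alpha_bar_t for a 'scaled_linear' schedule (betas linear in sqrt-space)."""
    s0, s1 = beta_start ** 0.5, beta_end ** 0.5
    alphas_cumprod, running = [], 1.0
    for i in range(num_steps):
        beta = (s0 + (s1 - s0) * i / (num_steps - 1)) ** 2
        running *= 1.0 - beta
        alphas_cumprod.append(running)
    return alphas_cumprod


def snr_weight(alphas_cumprod, t):
    """snr / (snr + 1) with snr = alpha_bar_t / (1 - alpha_bar_t)."""
    a = alphas_cumprod[t]
    snr = a / (1.0 - a)
    return snr / (snr + 1.0)


alphas_cumprod = scaled_linear_alphas_cumprod()
timesteps = [0, 250, 999]          # one (hypothetical) timestep per sample
per_sample_loss = [2.0, 2.0, 2.0]  # hypothetical per-sample squared errors

weights = [snr_weight(alphas_cumprod, t) for t in timesteps]
# One weight per sample: it scales that sample's whole loss.
weighted_loss = [w * l for w, l in zip(weights, per_sample_loss)]

for t, w in zip(timesteps, weights):
    print(f"t={t}: weight={w:.6f}, alpha_bar={alphas_cumprod[t]:.6f}")
```

Since the weight equals ᾱ_t, it shrinks toward zero at high-noise timesteps, so those samples contribute little to the v-prediction loss.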

Could you or someone else help me find where the error is? I would really appreciate it!

AbraarArique (author) · 2024-11-01T13:28:48Z

@lyb369 In my code, I used model = get_peft_model(unet, config). Are you using the same method?

If so, then do the added LoRA layers have requires_grad = True? Try printing the params like this:

params = [name for name, p in unet.named_parameters() if p.requires_grad]
print(len(params), params[:10])
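One more thing worth ruling out when the LoRA weights look frozen: with PEFT's default initialization (`init_lora_weights=True`), `lora_B` starts at zero, so on the very first step the gradient reaching `lora_A` is exactly zero and only `lora_B` receives a gradient. The pure-Python sketch below works through the gradients for a single LoRA-adapted linear layer, h = W x + s·B(A x) with s = alpha/r, a frozen W, a squared-error loss, and plain SGD. The shapes, values, and helper names are hypothetical, for illustration only. If `lora_B` never changes either, the cause lies elsewhere, such as the `requires_grad` check suggested above.

```python
def matvec(M, v):
    """Matrix (list of rows) times vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def outer(a, b):
    """Outer product a b^T as a list of rows."""
    return [[ai * bj for bj in b] for ai in a]


def transpose(M):
    return [list(col) for col in zip(*M)]


def lora_grads(W, A, B, x, y, scale):
    """Gradients of 0.5 * ||W x + scale * B (A x) - y||^2 w.r.t. A and B.

    W is the frozen base weight, so no gradient is computed for it.
    """
    u = matvec(A, x)  # rank-r bottleneck activation A x
    h = [wx + scale * bu for wx, bu in zip(matvec(W, x), matvec(B, u))]
    e = [hi - yi for hi, yi in zip(h, y)]  # dL/dh
    grad_B = [[scale * g for g in row] for row in outer(e, u)]
    grad_A = [[scale * g for g in row] for row in outer(matvec(transpose(B), e), x)]
    return grad_A, grad_B


def sgd(P, G, lr):
    """One plain SGD step on a matrix parameter."""
    return [[p - lr * g for p, g in zip(prow, grow)] for prow, grow in zip(P, G)]


# Hypothetical tiny layer: d_out = 2, d_in = 3, rank r = 1, scale = alpha / r = 1.0
W = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]
A = [[0.7, -0.1, 0.2]]  # nonzero init, shape r x d_in
B = [[0.0], [0.0]]      # zero init, shape d_out x r (PEFT's default for lora_B)
x, y, scale, lr = [1.0, 2.0, -1.0], [1.0, -1.0], 1.0, 0.1

A_init = [row[:] for row in A]
max_grad_A = []
for step in range(2):
    grad_A, grad_B = lora_grads(W, A, B, x, y, scale)
    max_grad_A.append(max(abs(g) for row in grad_A for g in row))
    A, B = sgd(A, grad_A, lr), sgd(B, grad_B, lr)
    print(f"step {step}: max |grad_A| = {max_grad_A[-1]:.4f}, B = {B}")
```

From the second step on, B is nonzero and A starts receiving gradient as well. With AdamW, decoupled weight decay can still nudge A on step one, so this sketch only makes a claim about the gradient.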
