Support for runwayml In-painting SD model. #3140

Closed
wants to merge 1,581 commits into from

Conversation

@random-thoughtss (Contributor) commented Oct 19, 2022

A simple addition to support the new in-painting model released here:
https://github.com/runwayml/stable-diffusion

We update the stable-diffusion dependency to point to the new repo and pass in the additional inputs the model requires: an extra masked image and a mask, which act as visual conditioning. Setting the mask to all 1s also works for txt2img generation.
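
For illustration, here is a minimal sketch of how that hybrid conditioning is laid out, mirroring the get_input logic in the RunwayML repo; build_hybrid_cond and its argument conventions are illustrative, not code from this PR:

    import torch

    # Hypothetical helper sketching the hybrid conditioning layout.
    # image: BCHW in [-1, 1]; mask: BCHW in {0, 1}, where 1 marks the
    # region to repaint (all 1s behaves like txt2img).
    def build_hybrid_cond(model, image, mask, text_cond):
        masked_image = image * (1.0 - mask)              # hide the region to repaint
        image_latent = model.get_first_stage_encoding(
            model.encode_first_stage(masked_image))      # 4 latent channels
        mask_small = torch.nn.functional.interpolate(
            mask, size=image_latent.shape[-2:])          # 1 channel at latent resolution
        c_concat = torch.cat([mask_small, image_latent], dim=1)
        # The UNet then sees 4 (noised latent) + 5 (c_concat) = 9 input channels.
        return {"c_concat": [c_concat], "c_crossattn": [text_cond]}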

Implemented

  1. K-Diffusion txt2img
  2. K-Diffusion img2img
  3. K-Diffusion inpaint

TODO

  1. VanillaStableDiffusionSampler updates
  2. Add a flag to detect whether we need to create the masked tensors, to save some memory.
  3. Fix the use_ema: False config option. Currently you need to add use_ema: False to sd-v1-5-inpainting.yaml, otherwise the checkpoint will not load.

AUTOMATIC1111 and others added 30 commits October 12, 2022 09:00
edit attention key handler: return early when weight parse returns NaN
The directory for the images saved with the Save button may still not exist, so it needs to be created prior to opening the log.csv file.
remake train interface to use tabs
Add option to store TI embeddings in png chunks, and load from same.
train: make it possible to make text files with prompts
train: rework scheduler so that there's less repeating code in textual inversion and hypernets
train: move epochs setting to options
deepbooru: added option to quote (\) in tags
deepbooru/BLIP: write caption to file instead of image filename
deepbooru/BLIP: now possible to use both for captions
deepbooru: process is stopped even if an exception occurs
@random-thoughtss (Contributor, Author)

> Have you tested the vanilla 1.4 model with this PR?

Yes, I observe seed parity with the CompVis stable-diffusion repo. The only code path the visual conditioning is used in is the new hybrid conditioning, so it shouldn't affect any crossattn models. Although it might be worth creating the masks only when they are actually needed:
https://github.com/runwayml/stable-diffusion/blob/main/ldm/models/diffusion/ddpm.py#L1431
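
Since the hybrid path is keyed off the model's conditioning mode, a hedged sketch of that optimization (assuming the DiffusionWrapper at model.model exposes conditioning_key, and reusing the hypothetical build_hybrid_cond from above):

    # Skip building the extra tensors for ordinary crossattn models.
    if getattr(model.model, "conditioning_key", None) == "hybrid":
        cond = build_hybrid_cond(model, image, mask, text_cond)
    else:
        cond = {"c_crossattn": [text_cond]}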

> If the config .yaml needs to be changed, you can ship a config and use shared.cmd_opts.config to use that new config when loading the Runway model.

Ideally the config should not need to be changed. I originally misattributed the bug: LatentInpaintDiffusion in the yaml is fine, but the original sd-v1-5-inpainting.yaml is missing use_ema: False. This causes the checkpoint to be loaded incorrectly, effectively not loading the checkpoint at all.

> what is that extra masked-image?

It provides the network with contextual information about the original image. Presumably this allows it to better fine-tune the in-painting, creating a more coherent image.

@C43H66N12O12S2 (Collaborator)

@random-thoughtss You can do sd_config.model.params.use_ema = False in sd_models.py after OmegaConf.load
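
In context, that suggestion amounts to the following in load_model (a sketch; the surrounding lines match the snippets later in this thread):

    from omegaconf import OmegaConf
    from ldm.util import instantiate_from_config

    sd_config = OmegaConf.load(checkpoint_info.config)
    sd_config.model.params.use_ema = False  # the inpainting checkpoint ships no EMA weights
    sd_model = instantiate_from_config(sd_config.model)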

@cornpo commented Oct 19, 2022

I'm on the random-thoughtss branch and monkey-patched sd_config.model.params.use_ema = False into sd_models.py. 1.4 loads now, but the size mismatch persists for the "1.5" inpainting model.

Caveat: torch 1.12.1+rocm5.1, but it usually doesn't matter.

File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/gradio/routes.py", line 275, in run_predict
    output = await app.blocks.process_api(
  File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/gradio/blocks.py", line 787, in process_api
    result = await self.call_function(fn_index, inputs, iterator)
  File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/gradio/blocks.py", line 694, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/cornpop/ml/stable-diffusion-webui/modules/ui.py", line 1633, in <lambda>
    fn=lambda value, k=k: run_settings_single(value, key=k),
  File "/home/cornpop/ml/stable-diffusion-webui/modules/ui.py", line 1488, in run_settings_single
    opts.data_labels[key].onchange()
  File "/home/cornpop/ml/stable-diffusion-webui/webui.py", line 40, in f
    res = func(*args, **kwargs)
  File "/home/cornpop/ml/stable-diffusion-webui/webui.py", line 85, in <lambda>
    shared.opts.onchange("sd_model_checkpoint", wrap_queued_call(lambda: modules.sd_models.reload_model_weights(shared.sd_model)))
  File "/home/cornpop/ml/stable-diffusion-webui/modules/sd_models.py", line 252, in reload_model_weights
    load_model_weights(sd_model, checkpoint_info)
  File "/home/cornpop/ml/stable-diffusion-webui/modules/sd_models.py", line 169, in load_model_weights
    missing, extra = model.load_state_dict(sd, strict=False)
  File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
	size mismatch for model.diffusion_model.input_blocks.0.0.weight: copying a param with shape torch.Size([320, 9, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3]).

@C43H66N12O12S2 (Collaborator)

That’s most likely due to our repo using the CompVis config. Try also adding:
sd_config.model.params.conditioning_key = hybrid

@C43H66N12O12S2 (Collaborator)

I think this model could also be used for outpainting with great effect.

@cornpo commented Oct 19, 2022

    sd_config = OmegaConf.load(checkpoint_info.config)
    ### monkeypatch
    sd_config.model.params.use_ema = False
    sd_config.model.params.conditioning_key = hybrid
    ###
    sd_model = instantiate_from_config(sd_config.model)

Vanilla python webui.py

Traceback (most recent call last):
  File "/home/cornpop/ml/stable-diffusion-webui/webui.py", line 161, in <module>
    webui(cmd_opts.api)
  File "/home/cornpop/ml/stable-diffusion-webui/webui.py", line 122, in webui
    initialize()
  File "/home/cornpop/ml/stable-diffusion-webui/webui.py", line 84, in initialize
    shared.sd_model = modules.sd_models.load_model()
  File "/home/cornpop/ml/stable-diffusion-webui/modules/sd_models.py", line 215, in load_model
    sd_config.model.params.conditioning_key = hybrid
NameError: name 'hybrid' is not defined

@C43H66N12O12S2 (Collaborator)

Change hybrid to "hybrid"

@cornpo commented Oct 19, 2022

size mismatch for model.diffusion_model.input_blocks.0.0.weight: copying a param with shape torch.Size([320, 9, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3]).

I give up for now. A non-programmer trashing up the collab isn't going to do any good.

@C43H66N12O12S2 (Collaborator)

Actually, that shouldn't happen. @random-thoughtss When you tested 1.4, did you change the model dimensions to match 1.4 inside the config?

We shouldn't break compatibility with 1.4, as 1.5 (which will release very soon now) uses the same dimensions.

@C43H66N12O12S2 (Collaborator)

@AUTOMATIC1111 Curious to hear your thoughts on this model.

My thinking is like this:

  1. Load the normal model at all times (whether that's vanilla 1.4, 1.5, WD, or whatever).
  2. Add a checkbox to outpainting & inpainting.
  3. If the user checks this checkbox, load the RunwayML model, run inference, then unload (maybe dependent on a user setting). A sketch of this flow is below.
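
A minimal sketch of that flow; run_inpaint, load_checkpoint, and process_images are hypothetical stand-ins for whatever the webui exposes, not existing API:

    # Hypothetical flow for the checkbox proposal above.
    def run_inpaint(p, use_runway_model: bool, unload_after: bool = True):
        if use_runway_model:
            load_checkpoint("sd-v1-5-inpainting.ckpt")  # swap in the 9-channel model
        try:
            return process_images(p)                    # normal generation path
        finally:
            if use_runway_model and unload_after:
                load_checkpoint("sd-v1-4.ckpt")         # restore the normal model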

@C43H66N12O12S2 (Collaborator) commented Oct 19, 2022

    sd_config.model.target = "ldm.models.diffusion.ddpm.LatentInpaintDiffusion"
    sd_config.model.params.use_ema = False
    sd_config.model.params.conditioning_key = "hybrid"
    sd_config.model.params.unet_config.params.in_channels = 9

This is all that's needed to load it as-is. I've had better results outpainting with this model than inpainting, but that's probably a skill issue. (Hilariously, poor man's outpainting seems to work better than outpainting mk2 with this model.)

We also don't need to switch to the RunwayML repo for this. We can continue our proud tradition of hijacking the CompVis repo. I wrote some working code doing just that.

@AUTOMATIC1111 (Owner)

oxy: switching to a different repo is a big step. I need to grab his branch and check if it really is a lot better; then there can be some considerations.

@AUTOMATIC1111 (Owner)

Also, is SD 1.5 the finetuned 1.5 model that Emad keeps from being released?

@C43H66N12O12S2 (Collaborator) commented Oct 19, 2022

We don’t need to switch repos. I wrote working hijacking code for this.

1.5 is (much like 1.4) just 1.2 but further along training.

1.4 is resumed from 1.2 and trained for ~270k steps I think, and 1.5 ~600k

> that emad keeps from being released?

Yes.

@AUTOMATIC1111

+modules/sd_hijack_loading.py
import math
import os
import sys
import traceback
import torch
import numpy as np
from einops import rearrange, repeat
from omegaconf import ListConfig
from modules import shared

import ldm.models.diffusion.ddpm
from ldm.models.diffusion.ddpm import LatentDiffusion
from ldm.util import exists

# The conditioning logic below is adapted from the RunwayML repo:
# https://github.com/runwayml/stable-diffusion (ldm/models/diffusion/ddpm.py)


@torch.no_grad()
def get_unconditional_conditioning(self, batch_size, null_label=None):
    # Patched onto ldm's DDPM: encode the given null label and repeat it
    # across the batch to build the unconditional conditioning.
    if null_label is not None:
        xc = null_label
        if isinstance(xc, ListConfig):
            xc = list(xc)
        if isinstance(xc, dict) or isinstance(xc, list):
            c = self.get_learned_conditioning(xc)
        else:
            if hasattr(xc, "to"):
                xc = xc.to(self.device)
            c = self.get_learned_conditioning(xc)
    else:
        # todo: get null label from cond_stage_model
        raise NotImplementedError()
    c = repeat(c, "1 ... -> b ...", b=batch_size).to(self.device)
    return c

class LatentInpaintDiffusion(LatentDiffusion):
    def __init__(
        self,
        concat_keys=("mask", "masked_image"),
        masked_image_key="masked_image",
        *args,
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        self.masked_image_key = masked_image_key
        assert self.masked_image_key in concat_keys
        self.concat_keys = concat_keys


    @torch.no_grad()
    def get_input(
        self, batch, k, cond_key=None, bs=None, return_first_stage_outputs=False
    ):
        # note: restricted to non-trainable encoders currently
        assert (
            not self.cond_stage_trainable
        ), "trainable cond stages not yet supported for inpainting"
        z, c, x, xrec, xc = super().get_input(
            batch,
            self.first_stage_key,
            return_first_stage_outputs=True,
            force_c_encode=True,
            return_original_cond=True,
            bs=bs,
        )

        assert exists(self.concat_keys)
        c_cat = list()
        for ck in self.concat_keys:
            cc = (
                rearrange(batch[ck], "b h w c -> b c h w")
                .to(memory_format=torch.contiguous_format)
                .float()
            )
            if bs is not None:
                cc = cc[:bs]
                cc = cc.to(self.device)
            bchw = z.shape
            if ck != self.masked_image_key:
                cc = torch.nn.functional.interpolate(cc, size=bchw[-2:])
            else:
                cc = self.get_first_stage_encoding(self.encode_first_stage(cc))
            c_cat.append(cc)
        c_cat = torch.cat(c_cat, dim=1)
        all_conds = {"c_concat": [c_cat], "c_crossattn": [c]}
        if return_first_stage_outputs:
            return z, all_conds, x, xrec, xc
        return z, all_conds

def do_hijack():
    ldm.models.diffusion.ddpm.get_unconditional_conditioning = get_unconditional_conditioning
    ldm.models.diffusion.ddpm.LatentInpaintDiffusion = LatentInpaintDiffusion

In sd_models.py, add from modules.sd_hijack_loading import do_hijack and, inside load_model:

    if str(checkpoint_info.filename).endswith("inpainting.ckpt"):
        do_hijack()
        sd_config.model.target = "ldm.models.diffusion.ddpm.LatentInpaintDiffusion"
        sd_config.model.params.use_ema = False
        sd_config.model.params.conditioning_key = "hybrid"
        sd_config.model.params.unet_config.params.in_channels = 9

@AUTOMATIC1111 (Owner)

Since you researched it, do you mind writing a paragraph or so about what it does differently, apart from using a new model?

@C43H66N12O12S2 (Collaborator) commented Oct 19, 2022

I haven't researched this model for very long. As far as I can see, it adds 5 (1 mask + 4 masked-image latent) new input channels for inpainting and finetunes for that.

Personally, I think it's a big improvement for outpainting, at least.

Oh, do you mean the code?
Not much; the star of the show is the model. The code is almost entirely enablement code.

@C43H66N12O12S2 (Collaborator) commented Oct 19, 2022

Here's an outpainting result (poor man's outpainting, 100 steps):

[images: "tortoise", "tortoise relaxing in a beautiful forest, natural lighting"]

It can even outpaint twice without breaking down, something I've never been able to do with raw SD.

[image: "tortoise relaxing in a beautiful forest, natural lighting", outpainted a second time]

@random-thoughtss (Contributor, Author)

I should probably have mentioned that the original config for the in-painting model was not released alongside the checkpoint, but it can be found here:
https://raw.githubusercontent.com/runwayml/stable-diffusion/main/configs/stable-diffusion/v1-inpainting-inference.yaml

This config works with the current repo once use_ema: False is added.

sd_config.model.target = "ldm.models.diffusion.ddpm.LatentInpaintDiffusion"
sd_config.model.params.use_ema = False
sd_config.model.params.conditioning_key = "hybrid"
sd_config.model.params.unet_config.params.in_channels = 9

These manual changes by @C43H66N12O12S2 replicate all of the changes RunwayML made to their config. Would it be better to

  1. hard-code these changes in the monkey patch?
  2. provide instructions on how to change the RunwayML config (see the YAML sketch below)?
  3. force just use_ema: False and let the user figure out the rest of the config?
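
For reference, option 2 would leave users with a config along these lines; a sketch condensing the overrides above into YAML, not the full RunwayML file:

    # Fragment of v1-inpainting-inference.yaml after the edits; everything
    # not shown is unchanged from the stock v1 config.
    model:
      target: ldm.models.diffusion.ddpm.LatentInpaintDiffusion
      params:
        use_ema: False            # must be added by hand; missing upstream
        conditioning_key: hybrid
        unet_config:
          params:
            in_channels: 9        # 4 latent + 4 masked-image latent + 1 mask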

Alexander Shmakov added 2 commits October 19, 2022 11:31
@C43H66N12O12S2 (Collaborator) commented Oct 19, 2022

Just a sidenote: reload_model_weights needs to be modified as well, or switching won't work if the initial model is a "normal" model. The easiest, if not most elegant, way to achieve that would be if sd_model.sd_checkpoint_info.config != checkpoint_info.config or checkpoint_info.filename.endswith("inpainting.ckpt"):

Actually, the reverse will fail as well (switching from the Runway model to any other 4-channel model); see the sketch below.
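
A sketch of a guard covering both directions, following the sd_models.py naming used in the traceback above; the helper name is illustrative:

    def needs_full_reload(sd_model, checkpoint_info) -> bool:
        # Rebuild the whole model when configs differ or when either the old
        # or the new checkpoint is the 9-channel inpainting model, since the
        # UNet input shapes are incompatible in both directions.
        return (
            sd_model.sd_checkpoint_info.config != checkpoint_info.config
            or checkpoint_info.filename.endswith("inpainting.ckpt")
            or sd_model.sd_checkpoint_info.filename.endswith("inpainting.ckpt")
        )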

Also, we should add credit to the RunwayML repo in sd_hijack_loading.py.

Aside from those minor adjustments, this PR is close to ready. It just needs to support the vanilla samplers.

It also seems not to work with the txt2img highres fix, but that's not the use case for this model anyway.

@nagolinc

Hmm... if I check out

c6f4a873d7c8a916814e3201044b84b72e09769a

and save https://raw.githubusercontent.com/runwayml/stable-diffusion/main/configs/stable-diffusion/v1-inpainting-inference.yaml (with the additional use_ema: False parameter)

as {models}/sd-v1-5-inpainting.yaml

I get the error

return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [320, 9, 3, 3], expected input[2, 4, 64, 64] to have 9 channels, but got 4 channels instead

Were there other changes needed to get this working?

@acheong08

Why was this closed? Is there another version in the works?

@Doppeey commented Oct 20, 2022

bump, we need this outpainting quality, it's crazy good

@ZeroCool22

[screenshot]

@nicolasnoble

> Why was this closed? Is there another version in the works?

Because the merge was totally botched. This needs a deep cleanup.

@nicolasnoble

Follow #3192 for the proper PR.

@random-thoughtss (Contributor, Author)

Yup, this repo got messed up. The new PR continues the work.

@AUTOMATIC1111 GitHub support says they can remove the dead commits from the PR and keep the discussion if you permit it.
