Support for runwayml In-painting SD model. #3140

Closed
wants to merge 1,581 commits into from

Conversation

@random-thoughtss (Contributor) commented Oct 19, 2022

A simple addition to support the new in-painting model released here:
https://github.com/runwayml/stable-diffusion

We update the stable-diffusion dependency to point to the new repo and pass in the additional inputs the model requires: an extra masked image and a mask, which act as visual conditioning. Setting the mask to all 1s also works for txt2img generation.
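
For illustration, here is a minimal sketch of how that hybrid conditioning is laid out, mirroring the get_input logic in the RunwayML repo; build_hybrid_cond and its argument conventions are illustrative, not code from this PR:

    import torch

    # Hypothetical helper sketching the hybrid conditioning layout.
    # image: BCHW in [-1, 1]; mask: BCHW in {0, 1}, where 1 marks the
    # region to repaint (all 1s behaves like txt2img).
    def build_hybrid_cond(model, image, mask, text_cond):
        masked_image = image * (1.0 - mask)              # hide the region to repaint
        image_latent = model.get_first_stage_encoding(
            model.encode_first_stage(masked_image))      # 4 latent channels
        mask_small = torch.nn.functional.interpolate(
            mask, size=image_latent.shape[-2:])          # 1 channel at latent resolution
        c_concat = torch.cat([mask_small, image_latent], dim=1)
        # The UNet then sees 4 (noised latent) + 5 (c_concat) = 9 input channels.
        return {"c_concat": [c_concat], "c_crossattn": [text_cond]}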

Implemented

  1. K-Diffusion txt2img
  2. K-Diffusion img2img
  3. K-Diffusion inpaint

TODO

  1. VanillaStableDiffusionSampler updates
  2. Add a flag to detect whether we need to create the masked tensors, to save some memory.
  3. Fix the use_ema: False config option. Currently you need to add use_ema: False to sd-v1-5-inpainting.yaml, otherwise the checkpoint will not load.

AUTOMATIC1111 and others added 30 commits October 12, 2022 09:00
edit attention key handler: return early when weight parse returns NaN
The directory for the images saved with the Save button may still not exist, so it needs to be created prior to opening the log.csv file.
remake train interface to use tabs
Add option to store TI embeddings in png chunks, and load from same.
train: make it possible to make text files with prompts
train: rework scheduler so that there's less repeating code in textual inversion and hypernets
train: move epochs setting to options
deepbooru: added option to quote (\) in tags
deepbooru/BLIP: write caption to file instead of image filename
deepbooru/BLIP: now possible to use both for captions
deepbooru: process is stopped even if an exception occurs
@random-thoughtss (Contributor, Author)

> Have you tested the vanilla 1.4 model with this PR?

Yes, I observe seed parity with the CompVis stable-diffusion repo. The only code path the visual conditioning is used in is the new hybrid conditioning, so it shouldn't affect any crossattn models. Although it might be worth creating the masks only when they are actually needed:
https://github.com/runwayml/stable-diffusion/blob/main/ldm/models/diffusion/ddpm.py#L1431
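
Since the hybrid path is keyed off the model's conditioning mode, a hedged sketch of that optimization (assuming the DiffusionWrapper at model.model exposes conditioning_key, and reusing the hypothetical build_hybrid_cond from above):

    # Skip building the extra tensors for ordinary crossattn models.
    if getattr(model.model, "conditioning_key", None) == "hybrid":
        cond = build_hybrid_cond(model, image, mask, text_cond)
    else:
        cond = {"c_crossattn": [text_cond]}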

> If the config .yaml needs to be changed, you can ship a config and use shared.cmd_opts.config to use that new config when loading the Runway model.

Ideally the config should not need to be changed. I originally misattributed the bug: LatentInpaintDiffusion in the yaml is fine, but the original sd-v1-5-inpainting.yaml is missing use_ema: False. This causes the checkpoint to be loaded incorrectly, effectively not loading the checkpoint at all.

> what is that extra masked-image?

It provides the network with contextual information about the original image. Presumably this allows it to better fine-tune the in-painting, creating a more coherent image.

@C43H66N12O12S2 (Collaborator)

@random-thoughtss You can do sd_config.model.params.use_ema = False in sd_models.py after OmegaConf.load
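
In context, that suggestion amounts to the following in load_model (a sketch; the surrounding lines match the snippets later in this thread):

    from omegaconf import OmegaConf
    from ldm.util import instantiate_from_config

    sd_config = OmegaConf.load(checkpoint_info.config)
    sd_config.model.params.use_ema = False  # the inpainting checkpoint ships no EMA weights
    sd_model = instantiate_from_config(sd_config.model)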

@cornpo commented Oct 19, 2022

I'm on the random-thoughtss branch and monkey-patched sd_config.model.params.use_ema = False into sd_models.py. 1.4 loads now, but the size mismatch persists for the "1.5" inpainting model.

Caveat: torch 1.12.1+rocm5.1, but it usually doesn't matter.

File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/gradio/routes.py", line 275, in run_predict
    output = await app.blocks.process_api(
  File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/gradio/blocks.py", line 787, in process_api
    result = await self.call_function(fn_index, inputs, iterator)
  File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/gradio/blocks.py", line 694, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/cornpop/ml/stable-diffusion-webui/modules/ui.py", line 1633, in <lambda>
    fn=lambda value, k=k: run_settings_single(value, key=k),
  File "/home/cornpop/ml/stable-diffusion-webui/modules/ui.py", line 1488, in run_settings_single
    opts.data_labels[key].onchange()
  File "/home/cornpop/ml/stable-diffusion-webui/webui.py", line 40, in f
    res = func(*args, **kwargs)
  File "/home/cornpop/ml/stable-diffusion-webui/webui.py", line 85, in <lambda>
    shared.opts.onchange("sd_model_checkpoint", wrap_queued_call(lambda: modules.sd_models.reload_model_weights(shared.sd_model)))
  File "/home/cornpop/ml/stable-diffusion-webui/modules/sd_models.py", line 252, in reload_model_weights
    load_model_weights(sd_model, checkpoint_info)
  File "/home/cornpop/ml/stable-diffusion-webui/modules/sd_models.py", line 169, in load_model_weights
    missing, extra = model.load_state_dict(sd, strict=False)
  File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
	size mismatch for model.diffusion_model.input_blocks.0.0.weight: copying a param with shape torch.Size([320, 9, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3]).

@C43H66N12O12S2 (Collaborator)

That’s most likely due to our repo using the CompVis config. Try also adding:
sd_config.model.params.conditioning_key = hybrid

@C43H66N12O12S2 (Collaborator)

I think this model could also be used for outpainting with great effect.

@cornpo commented Oct 19, 2022

    sd_config = OmegaConf.load(checkpoint_info.config)
    ### monkeypatch
    sd_config.model.params.use_ema = False
    sd_config.model.params.conditioning_key = hybrid
    ###
    sd_model = instantiate_from_config(sd_config.model)

Vanilla python webui.py

Traceback (most recent call last):
  File "/home/cornpop/ml/stable-diffusion-webui/webui.py", line 161, in <module>
    webui(cmd_opts.api)
  File "/home/cornpop/ml/stable-diffusion-webui/webui.py", line 122, in webui
    initialize()
  File "/home/cornpop/ml/stable-diffusion-webui/webui.py", line 84, in initialize
    shared.sd_model = modules.sd_models.load_model()
  File "/home/cornpop/ml/stable-diffusion-webui/modules/sd_models.py", line 215, in load_model
    sd_config.model.params.conditioning_key = hybrid
NameError: name 'hybrid' is not defined

@C43H66N12O12S2 (Collaborator)

Change hybrid to "hybrid"

@cornpo commented Oct 19, 2022

size mismatch for model.diffusion_model.input_blocks.0.0.weight: copying a param with shape torch.Size([320, 9, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3]).

I give up for now. A non-programmer trashing up the collab isn't going to do any good.

@C43H66N12O12S2 (Collaborator)

Actually, that shouldn't happen. @random-thoughtss When you tested 1.4, did you change the model dimensions to match 1.4 inside the config?

We shouldn't break compatibility with 1.4, as 1.5 (which will release very soon now) uses the same dimensions.

@C43H66N12O12S2 (Collaborator)

@AUTOMATIC1111 Curious to hear your thoughts on this model.

My thinking is like this:

  1. Load the normal model at all times (whether that's vanilla 1.4, 1.5, WD, or whatever).
  2. Add a checkbox to outpainting & inpainting.
  3. If the user checks this checkbox, load the RunwayML model, run inference, then unload (maybe dependent on a user setting). A sketch of this flow is below.
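
A minimal sketch of that flow; run_inpaint, load_checkpoint, and process_images are hypothetical stand-ins for whatever the webui exposes, not existing API:

    # Hypothetical flow for the checkbox proposal above.
    def run_inpaint(p, use_runway_model: bool, unload_after: bool = True):
        if use_runway_model:
            load_checkpoint("sd-v1-5-inpainting.ckpt")  # swap in the 9-channel model
        try:
            return process_images(p)                    # normal generation path
        finally:
            if use_runway_model and unload_after:
                load_checkpoint("sd-v1-4.ckpt")         # restore the normal model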

@C43H66N12O12S2 (Collaborator) commented Oct 19, 2022

    sd_config.model.target = "ldm.models.diffusion.ddpm.LatentInpaintDiffusion"
    sd_config.model.params.use_ema = False
    sd_config.model.params.conditioning_key = "hybrid"
    sd_config.model.params.unet_config.params.in_channels = 9

This is all that's needed to load it as-is. I've had better results outpainting with this model than inpainting, but that's probably a skill issue. (Hilariously, poor man's outpainting seems to work better than outpainting mk2 with this model.)

We also don't need to switch to the RunwayML repo for this. We can continue our proud tradition of hijacking the CompVis repo. I wrote some working code doing just that.

@AUTOMATIC1111 (Owner)

oxy: switching to a different repo is a big step. I need to grab his branch and check if it really is a lot better; then there can be some considerations.

@AUTOMATIC1111 (Owner)

Also, is SD 1.5 the finetuned 1.5 model that Emad keeps from being released?

@C43H66N12O12S2 (Collaborator) commented Oct 19, 2022

We don’t need to switch repos. I wrote working hijacking code for this.

1.5 is (much like 1.4) just 1.2 but further along training.

1.4 is resumed from 1.2 and trained for ~270k steps I think, and 1.5 ~600k

> that emad keeps from being released?

Yes.

@AUTOMATIC1111

+modules/sd_hijack_loading.py
import math
import os
import sys
import traceback
import torch
import numpy as np
from einops import rearrange, repeat
from omegaconf import ListConfig
from modules import shared

import ldm.models.diffusion.ddpm
from ldm.models.diffusion.ddpm import LatentDiffusion
from ldm.util import exists

# The conditioning logic below is adapted from the RunwayML repo:
# https://github.com/runwayml/stable-diffusion (ldm/models/diffusion/ddpm.py)


@torch.no_grad()
def get_unconditional_conditioning(self, batch_size, null_label=None):
    # Patched onto ldm's DDPM: encode the given null label and repeat it
    # across the batch to build the unconditional conditioning.
    if null_label is not None:
        xc = null_label
        if isinstance(xc, ListConfig):
            xc = list(xc)
        if isinstance(xc, dict) or isinstance(xc, list):
            c = self.get_learned_conditioning(xc)
        else:
            if hasattr(xc, "to"):
                xc = xc.to(self.device)
            c = self.get_learned_conditioning(xc)
    else:
        # todo: get null label from cond_stage_model
        raise NotImplementedError()
    c = repeat(c, "1 ... -> b ...", b=batch_size).to(self.device)
    return c

class LatentInpaintDiffusion(LatentDiffusion):
    def __init__(
        self,
        concat_keys=("mask", "masked_image"),
        masked_image_key="masked_image",
        *args,
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        self.masked_image_key = masked_image_key
        assert self.masked_image_key in concat_keys
        self.concat_keys = concat_keys


    @torch.no_grad()
    def get_input(
        self, batch, k, cond_key=None, bs=None, return_first_stage_outputs=False
    ):
        # note: restricted to non-trainable encoders currently
        assert (
            not self.cond_stage_trainable
        ), "trainable cond stages not yet supported for inpainting"
        z, c, x, xrec, xc = super().get_input(
            batch,
            self.first_stage_key,
            return_first_stage_outputs=True,
            force_c_encode=True,
            return_original_cond=True,
            bs=bs,
        )

        assert exists(self.concat_keys)
        c_cat = list()
        for ck in self.concat_keys:
            cc = (
                rearrange(batch[ck], "b h w c -> b c h w")
                .to(memory_format=torch.contiguous_format)
                .float()
            )
            if bs is not None:
                cc = cc[:bs]
                cc = cc.to(self.device)
            bchw = z.shape
            if ck != self.masked_image_key:
                cc = torch.nn.functional.interpolate(cc, size=bchw[-2:])
            else:
                cc = self.get_first_stage_encoding(self.encode_first_stage(cc))
            c_cat.append(cc)
        c_cat = torch.cat(c_cat, dim=1)
        all_conds = {"c_concat": [c_cat], "c_crossattn": [c]}
        if return_first_stage_outputs:
            return z, all_conds, x, xrec, xc
        return z, all_conds

def do_hijack():
    ldm.models.diffusion.ddpm.get_unconditional_conditioning = get_unconditional_conditioning
    ldm.models.diffusion.ddpm.LatentInpaintDiffusion = LatentInpaintDiffusion

In sd_models.py, add from modules.sd_hijack_loading import do_hijack and, inside load_model:

    if str(checkpoint_info.filename).endswith("inpainting.ckpt"):
        do_hijack()
        sd_config.model.target = "ldm.models.diffusion.ddpm.LatentInpaintDiffusion"
        sd_config.model.params.use_ema = False
        sd_config.model.params.conditioning_key = "hybrid"
        sd_config.model.params.unet_config.params.in_channels = 9

@AUTOMATIC1111 (Owner)

Since you researched it, do you mind writing a paragraph or so about what it does differently, apart from using a new model?

@C43H66N12O12S2 (Collaborator) commented Oct 19, 2022

I haven't researched this model for very long. As far as I can see, it adds 5 (1 mask + 4 masked-image latent) new input channels for inpainting and finetunes for that.

Personally, I think it's a big improvement for outpainting, at least.

Oh, do you mean the code?
Not much; the star of the show is the model. The code is almost entirely enablement code.

@C43H66N12O12S2 (Collaborator) commented Oct 19, 2022

Here's an outpainting result (poor man's outpainting, 100 steps):

[images: "tortoise", "tortoise relaxing in a beautiful forest, natural lighting"]

It can even outpaint twice without breaking down, something I've never been able to do with raw SD.

[image: "tortoise relaxing in a beautiful forest, natural lighting", outpainted a second time]

@random-thoughtss (Contributor, Author)

I should probably have mentioned that the original config for the in-painting model was not released alongside the checkpoint, but it can be found here:
https://raw.githubusercontent.com/runwayml/stable-diffusion/main/configs/stable-diffusion/v1-inpainting-inference.yaml

This config works with the current repo once use_ema: False is added.

sd_config.model.target = "ldm.models.diffusion.ddpm.LatentInpaintDiffusion"
sd_config.model.params.use_ema = False
sd_config.model.params.conditioning_key = "hybrid"
sd_config.model.params.unet_config.params.in_channels = 9

These manual changes by @C43H66N12O12S2 replicate all of the changes RunwayML made to their config. Would it be better to

  1. hard-code these changes in the monkey patch?
  2. provide instructions on how to change the RunwayML config (see the YAML sketch below)?
  3. force just use_ema: False and let the user figure out the rest of the config?
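
For reference, option 2 would leave users with a config along these lines; a sketch condensing the overrides above into YAML, not the full RunwayML file:

    # Fragment of v1-inpainting-inference.yaml after the edits; everything
    # not shown is unchanged from the stock v1 config.
    model:
      target: ldm.models.diffusion.ddpm.LatentInpaintDiffusion
      params:
        use_ema: False            # must be added by hand; missing upstream
        conditioning_key: hybrid
        unet_config:
          params:
            in_channels: 9        # 4 latent + 4 masked-image latent + 1 mask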

Alexander Shmakov added 2 commits October 19, 2022 11:31
@C43H66N12O12S2 (Collaborator) commented Oct 19, 2022

Just a sidenote: reload_model_weights needs to be modified as well, or switching won't work if the initial model is a "normal" model. The easiest, if not most elegant, way to achieve that would be if sd_model.sd_checkpoint_info.config != checkpoint_info.config or checkpoint_info.filename.endswith("inpainting.ckpt"):

Actually, the reverse will fail as well (switching from the Runway model to any other 4-channel model); see the sketch below.
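
A sketch of a guard covering both directions, following the sd_models.py naming used in the traceback above; the helper name is illustrative:

    def needs_full_reload(sd_model, checkpoint_info) -> bool:
        # Rebuild the whole model when configs differ or when either the old
        # or the new checkpoint is the 9-channel inpainting model, since the
        # UNet input shapes are incompatible in both directions.
        return (
            sd_model.sd_checkpoint_info.config != checkpoint_info.config
            or checkpoint_info.filename.endswith("inpainting.ckpt")
            or sd_model.sd_checkpoint_info.filename.endswith("inpainting.ckpt")
        )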

Also, we should add credit to the RunwayML repo in sd_hijack_loading.py.

Aside from those minor adjustments, this PR is close to ready. It just needs to support the vanilla samplers.

It also seems not to work with the txt2img highres fix, but that's not the use case for this model anyway.

@nagolinc

Hmm... if I check out

c6f4a873d7c8a916814e3201044b84b72e09769a

and save https://raw.githubusercontent.com/runwayml/stable-diffusion/main/configs/stable-diffusion/v1-inpainting-inference.yaml (with the additional use_ema: False parameter)

as {models}/sd-v1-5-inpainting.yaml

I get the error

return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [320, 9, 3, 3], expected input[2, 4, 64, 64] to have 9 channels, but got 4 channels instead

Were there other changes needed to get this working?

@acheong08

Why was this closed? Is there another version in the works?

@Doppeey commented Oct 20, 2022

bump, we need this outpainting quality, it's crazy good

@ZeroCool22

[screenshot]

@nicolasnoble

> Why was this closed? Is there another version in the works?

Because the merge was totally botched. This needs a deep cleanup.

@nicolasnoble

Follow #3192 for the proper PR.

@random-thoughtss (Contributor, Author)

Yup, this repo got messed up. The new PR continues the work.

@AUTOMATIC1111 GitHub support says they can remove the dead commits from the PR and keep the discussion if you permit it.
