
Soft Inpainting #14208

Merged · 36 commits · Dec 14, 2023
Conversation

CodeHatchling
Contributor

@CodeHatchling commented Dec 5, 2023

Description

Soft inpainting allows the denoiser to work directly with soft-edged (i.e. non-binary) masks, whereby unmasked content blends seamlessly into inpainted content through gradual transitions. It is conceptually similar to per-pixel denoising strength.

Code changes

  • Soft inpainting can be enabled in the UI, with parameters that affect how the original and inpainted content are blended.
  • When soft inpainting is enabled, an alternate path through code is taken where latent masks are NOT rounded to 0 or 1.
  • During denoising, the original latent vectors are interpolated with the denoised latent vectors in a way that balances the influence of both according to the mask (see the sketch after this list).
  • The final pixel composite uses the differences between the original and inpainted latent vectors to calculate a unique blending mask for each image in the batch. This effectively eliminates ghosting (where objects appear to fade) and harsh transition boundaries, while preserving as much of the original content as possible.
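
For illustration, a minimal sketch of the per-pixel latent interpolation described above (the names are illustrative, not the actual code in processing.py):

    import torch

    def blend_latents(original: torch.Tensor,
                      denoised: torch.Tensor,
                      nmask: torch.Tensor) -> torch.Tensor:
        # nmask holds continuous values in [0, 1] instead of being rounded:
        # 0 keeps the original latent, 1 keeps the denoised latent, and
        # in-between values yield a gradual transition rather than a seam.
        return original * (1.0 - nmask) + denoised * nmask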

Fixes

#14024
Summary:

  • When inpainting, even with a very high mask blur, a seam will appear at the 50% opacity threshold.
  • When inpaint-sketching, with any amount of mask blur, the colors of the sketch will bleed into regions of the image that do not receive denoising. (Without mask blur, the results are full of seams.)
  • Inpaint sketching with 50% mask transparency or more is pointless as nothing is inpainted.
  • It is difficult to inpaint objects with indefinite boundaries like dust clouds, or in any situation where some kind of gradual seamless transition in texture is needed. In these cases, the original texture is destroyed when it should be partially preserved.

Screenshots/videos:

Example Comparison
[comparison images: vanilla inpainting vs. soft inpainting]

Notes:

Why not an extension?

Implementing this required integrating directly with processing.py and the denoising process. There are no "hooks" that let extensions intervene and modify that behaviour in the way that is needed. For example:

  • A few different places in the code forced the inpainting mask to be binary (0 or 1) with no option to bypass it.
  • Balancing the blend between inpainted and original content required modifying the blending math.
  • The vanilla inpainting had issues that made it difficult to work with.

Implementing this as an extension would require duplicating a large chunk of the code, and would likely provide a suboptimal user experience.

Mild concerns

  • One of the two attributes mask_for_overlay and masks_for_overlay is used, depending on whether soft inpainting is enabled. The intent was to minimize differences in behaviour for extensions already built around the vanilla inpainting. However, this could lead to confusion in the future; ideally the attributes should be unified, as there was already a high number of attributes containing versions of the mask at different processing stages.
  • I couldn't test all of the different branches processing.py could take. For example, I don't know of a case where samples_ddim would have the condition already_decoded, but I did attempt to handle that case nonetheless.
  • The stage that generates composite masks based on latent vector diffs uses a somewhat expensive CPU-based filter (found in modules/images.py). It could potentially be implemented in Torch and run on the GPU (a sketch follows this list).
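
As a rough sketch of what a GPU port of such a filter might look like, assuming the expensive part is essentially a large blur (illustrative only, not the actual code in modules/images.py):

    import torch
    import torch.nn.functional as F

    def gaussian_blur_gpu(img: torch.Tensor, sigma: float) -> torch.Tensor:
        # Separable Gaussian blur on a (B, C, H, W) tensor; runs wherever
        # the tensor lives (CPU or GPU), avoiding the CPU round-trip.
        radius = max(1, int(3 * sigma))
        x = torch.arange(-radius, radius + 1, dtype=img.dtype, device=img.device)
        kernel = torch.exp(-(x ** 2) / (2 * sigma ** 2))
        kernel = kernel / kernel.sum()
        c = img.shape[1]
        kh = kernel.view(1, 1, 1, -1).repeat(c, 1, 1, 1)  # horizontal pass
        kv = kernel.view(1, 1, -1, 1).repeat(c, 1, 1, 1)  # vertical pass
        img = F.conv2d(img, kh, padding=(0, radius), groups=c)
        img = F.conv2d(img, kv, padding=(radius, 0), groups=c)
        return img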

Planned work

  • I intend on adding parameters to control the mask generation parameters used for the composite stage (where we are working with pixels instead of latent vectors). Specifically, sensitivity to latent vector diffs, and the option to scale this sensitivity based on the original mask.
  • I am considering using something akin to Poisson blending to blend the original and denoised image content. It could improve the ability to preserve the details of both the original and inpainted content without compromising as much on either.

@cmp-nct

cmp-nct commented Dec 5, 2023

Looks like something that should have been there since the start!
Nice cherry pick on the sample btw :)

Given that change, it would also make sense to have a soft brush in the UI now; that might do wonders for blending some inpainting in. Currently it's quite a pain, with tons of low-intensity steps needed.

@CodeHatchling
Contributor Author

Nice cherry pick on the sample btw :)

I'll have you know that when I set out to generate these particular examples, they were all first attempts, no cherry picking. Want me to generate a grid? ;;}

@cmp-nct

cmp-nct commented Dec 6, 2023

I meant that you can do that kind of masking properly by using a smaller cfg value and more steps, so it gradually inpaints.
I've had runs with almost a thousand sampling steps during inpainting; with your PR it looks like that can be done in a fraction of the time.

Your solution is what I wanted from the beginning. Very important for great inpainting :)

@AUTOMATIC1111 merged commit 8c32594 into AUTOMATIC1111:dev on Dec 14, 2023
3 checks passed
@CodeHatchling
Contributor Author

Thank you for merging! ::>

@AUTOMATIC1111
Owner

@CodeHatchling are you the sole author of the approach or did you use ideas from someone's paper?

@arafatx

arafatx commented Feb 21, 2024

Can anyone summarize this feature for me? Does this mean that with this feature, we no longer need a separate inpainting checkpoint? For example, normally I have two separate models: one is Abcmodel.safetensor as a normal checkpoint, and the other is Abcmodel_inpainting.safetensor as an inpainting checkpoint.

  • Arafat

Thank you.

@light-and-ray
Contributor

You can turn any checkpoint into an inpainting one in the Checkpoint Merger tab: https://github.com/light-and-ray/sd-webui-replacer?tab=readme-ov-file#how-to-get-an-inpainting-model

This soft inpainting, as I understand it, can avoid sharp contours, but still requires an inpainting model.

@CodeHatchling
Contributor Author

This soft inpainting...still requires an inpainting model

Nope! Works with any model. It kind of acts as an alternative to an inpainting model or a ControlNet inpainting module - at least, from what others have told me (I'm not that familiar with those two features).

@CodeHatchling
Contributor Author

CodeHatchling commented Feb 22, 2024

Can anyone summarize this feature for me?

It was designed to work with any model. I tested it with the built-in SD 1.5 model that A1111 comes with, and have tried it with a variety of others.

I'm not sure how well it will work with an inpainting model specifically, but the extension also passes in a non-binary mask to them as conditioning.

To use it, just provide a mask. Black (0) pixels will not be changed, white (1) pixels will be processed by the diffuser, and any in-between shades of grey (0 to 1) will define the transition region in which the image is only partially denoised.

Where it shines is in helping the features imagined by the denoiser blend naturally with the unmasked original content.
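
For example, a soft-edged mask of the kind described above could be prepared like this (an illustrative helper, not part of the webui):

    from PIL import Image, ImageDraw, ImageFilter

    def make_soft_mask(size, box, feather=48):
        # Black (0) = keep original, white (255) = fully denoise,
        # and the blurred rim becomes the partial-denoising transition.
        mask = Image.new("L", size, 0)
        ImageDraw.Draw(mask).ellipse(box, fill=255)
        return mask.filter(ImageFilter.GaussianBlur(feather))

    mask = make_soft_mask((512, 512), (128, 128, 384, 384))
    mask.save("soft_mask.png")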

@LindezaBlue

What settings do you use to get this to work? I've tested it with high denoising ranging from 0.5-1 and even tested it with a mask blur range of 8-64. I am on the inpainting tab and have also tried inpaint sketch. Is there a step I am missing? I've looked all over for more info and haven't found anything that gives clear instructions on its usage.

Any help is appreciated!~ Thanks! <3

@psykokwak-com

Hi. Nice work.
Is there a way to use it through the API?
Thanks.

@BNP1111

BNP1111 commented Feb 26, 2024

I got this error:
D:\stable-diffusion-webui-forge\extensions-builtin\soft-inpainting\scripts\soft_inpainting.py:161: RuntimeWarning: divide by zero encountered in divide
converted_mask = converted_mask / half_weighted_distance

why?

@LindezaBlue

I got this error: D:\stable-diffusion-webui-forge\extensions-builtin\soft-inpainting\scripts\soft_inpainting.py:161: RuntimeWarning: divide by zero encountered in divide converted_mask = converted_mask / half_weighted_distance

why?

The error message you're encountering indicates a runtime warning in the file soft_inpainting.py at line 161. Specifically, it's warning about a division by zero encountered in the expression converted_mask / half_weighted_distance.

This warning typically occurs when attempting to divide by zero, which can happen if half_weighted_distance is zero or near zero. That might be due to an input incorrectly set to zero, or perhaps there is no inpaint mask? A screenshot of your inpainting settings would help answer this question better.

Anyone else can chime in if they know what it is.

@Woisek

Woisek commented Mar 2, 2024

Out of curiosity: Can this 'soft inpainting' also be used for upscaling to steer the denoising? It would be awesome to have precise control over what and where a change should be made when upscaling.

@LindezaBlue

Out of curiosity: Can this 'soft inpainting' also be used for upscaling to steer the denoising? It would be awesome to have precise control over what and where a change should be made when upscaling.

It's not really an "upscaler", but you can increase the resolution of your inpainting by adjusting the Width x Height settings to 768x768 or higher. Just note that render time goes up the higher you set the resolution.

[screenshot of the width/height settings]

Hope that helps.

@psykokwak-com

Hi all,
Is it possible to use our own progressive (greyscale) mask image instead of just a binary image?

@hjj-lmx

hjj-lmx commented Mar 12, 2024

Is there a way to use it through the API?
Thanks.

@psykokwak-com

Yes, look here: #15138
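
For anyone looking for a starting point, a request might look roughly like the sketch below. The script name and the order/meaning of the args are assumptions on my part; confirm them against #15138 before relying on this.

    import base64
    import requests

    def b64(path):
        with open(path, "rb") as f:
            return base64.b64encode(f.read()).decode()

    payload = {
        "init_images": [b64("input.png")],
        "mask": b64("soft_mask.png"),   # greyscale mask; grey = soft transition
        "denoising_strength": 0.75,
        "alwayson_scripts": {
            # Script name and argument order are assumed; verify via #15138.
            "soft inpainting": {"args": [True, 1.0, 0.5, 4.0, 0.0, 0.5, 2.0]},
        },
    }
    r = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
    r.raise_for_status()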

@CodeHatchling
Contributor Author

CodeHatchling commented Mar 13, 2024

i got this error: D:\stable-diffusion-webui-forge\extensions-builtin\soft-inpainting\scripts\soft_inpainting.py:161: RuntimeWarning: divide by zero encountered in divide converted_mask = converted_mask / half_weighted_distance

why?

Oh, this is likely just due to the difference threshold value being set to 0, I believe. The intended outcome for a threshold of 0 was that if a given latent pixel changed at all, the changed latent would be included in the resulting composite at full opacity.
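
A conventional guard for that case might look like the following (illustrative only, not the actual fix in soft_inpainting.py):

    import numpy as np

    def normalize_mask(converted_mask, half_weighted_distance, eps=1e-8):
        # If the threshold-derived denominator is (near) zero, treat any
        # nonzero difference as fully opaque instead of dividing by zero.
        if half_weighted_distance < eps:
            return (converted_mask > 0).astype(converted_mask.dtype)
        return converted_mask / half_weighted_distance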

@aulerius

Hello.
Could this be treated as analogous to something like Differential Diffusion? As seen here.

I too want to use this through an API, which I see from this reply is possible. However, can I supply my own blurry mask? That is, treat the mask as a ready "map" with grey values that I prepare, and skip any additional blurring that the code would usually perform.
Or is this exactly how it would work through the API anyway?

Thanks!

@ely1113

ely1113 commented Mar 26, 2024

May I ask if it can be used in ComfyUI?

@Zhangyangrui916

The current description of mask_blend_power ("Schedule bias") is "Shifts when preservation of original content occurs during denoising."
It might be better described as: "Shifts toward more preservation of the original content at the start of denoising and less preservation at the end."
I experimented with the output of get_modified_nmask at each iteration with different mask_blend_power values: 0, 0.5, 1, and 2 respectively.

[comparison grid of get_modified_nmask outputs]
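
To make the proposed wording concrete, the schedule bias can be thought of as raising the mask to a noise-level-dependent power. The exact formula in soft_inpainting.py may differ, so treat this as a sketch of the qualitative behaviour only:

    import torch

    def modified_nmask(nmask: torch.Tensor, sigma: float, power: float,
                       scale: float = 1.0) -> torch.Tensor:
        # With power > 0 the exponent grows with the current noise level,
        # so early (high-sigma) steps preserve more of the original content
        # and late (low-sigma) steps preserve less; power = 0 keeps the
        # blend constant across steps.
        return torch.pow(nmask, (sigma ** power) * scale)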
