
Scene-to-Image Prompt Layering System #1179

Merged: 60 commits into Sygil-Dev:dev, Oct 2, 2022

Conversation

@xaedes (Contributor) commented Sep 16, 2022

Summary of the change

  • new Scene-to-Image tab
  • new scn2img function
  • functions for loading and running monocular_depth_estimation with tensorflow

Description


Related to discussion #925

Would it be possible to have a layers system where we could have foreground, mid, and background objects which relate to one another and share the style? So we could generate a landscape, on another layer generate a castle, and on another layer generate a crowd of people.

To make this work I made a prompt-based layering system in a new "Scene-to-Image" tab.
You write a multi-line prompt that looks like markdown, where each section declares one layer.
It is hierarchical, so each layer can have its own child layers.

Examples: https://imgur.com/a/eUxd5qn

In the frontend you can find brief documentation of the syntax, examples, and a reference for the various arguments.

Here is a short summary:

Sections with a "prompt" and child layers are img2img; without child layers they are txt2img.
Without a "prompt" they are just images, useful for mask selection, image composition, etc.
Images can be initialized with "color", resized with "resize", and positioned with "pos".
Rotation and rotation center are set with "rotation" and "center".

Masks can be selected automatically by color or by estimated depth, based on https://huggingface.co/spaces/atsantiago/Monocular_Depth_Filter.
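
For illustration, a scene prompt might look roughly like this (layer names and values are made up; the exact syntax and full argument reference are in the frontend documentation):

```
# scene
size: 512,512

## background
prompt: wide fantasy landscape at sunset

## castle
prompt: a castle on a hill
pos: 128,64
rotation: 0
```

Here "background" and "castle" have a prompt but no children, so they are txt2img layers; "scene" has no prompt, so it only composes its children into one image.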

Additional dependencies that are required for this change

For mask selection by monocular depth estimation, TensorFlow is required and the model must be cloned to ./src/monocular_depth_estimation/.
Changes in environment.yaml:

  • einops>=0.3.0
  • tensorflow>=2.10.0

einops must be allowed to be newer for TensorFlow to work.

Checklist:

- [x] I have changed the base branch to dev
- [x] I have performed a self-review of my own code
- [x] I have commented my code in hard-to-understand areas
- [x] I have made corresponding changes to the documentation

…ing Sygil-Dev#947.

Also fixes a bug in image display when using masked image restoration with RealESRGAN.

When the image is upscaled using RealESRGAN, the image restoration cannot use the
original image because it has the wrong resolution. In this case the image restoration
will restore the non-regenerated parts of the image with a RealESRGAN-upscaled
version of the original input image.

Modifications from GFPGAN or color correction in (un)masked parts are also restored
to the original image by mask blending.
The scene is described by a prompt scene description language.
Prompts and image generation parameters can be specified in code.
The scene is based on image layers that are blended into a final image.
Layers can be recursive.
Layer content can be generated from txt2img prompts, or from other user content.
Layer blending can use common blending methods (like additive,
multiplicative, etc.), optionally together with img2img.
Other user content could be retrieved from URLs, or we could provide a few
code-accessible gradio image slots where users can upload and manipulate images.
This allows deep trees while avoiding a growing number of '#' characters: a '#> example title' heading bumps the depth of the sections after it by the number of matched '>' characters.
… size

It uses the default size from img2img_defaults.
Now allows the common pattern where you can toggle which of two sections is commented out.

Example, "Section A" is active and "Section B" is commented out:

```
//*
Section A
/*/
Section B
//*/
```

Example, "Section B" is active and "Section A" is commented out:

```
/*
Section A
/*/
Section B
//*/
```
The cache is stored in RAM, not in VRAM.
It is cleared automatically when generating with a different seed in the frontend.
This allows automatically masking objects, foreground, or background in generated images.

The depth image is estimated from images using monocular-depth-estimation:

https://huggingface.co/keras-io/monocular-depth-estimation
https://huggingface.co/spaces/atsantiago/Monocular_Depth_Filter
Headings in the format "#< title" reduce the depth of the following sections by the number of "<" characters.
This is useful to get back out of a deep bump caused by a "#> title" heading.
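
For illustration (the effective depths shown here are illustrative; exact semantics may differ):

```
# a     depth 1
## b    depth 2
#> c    bumps following sections by 1
## d    effective depth 3
## e    effective depth 3
#< f    reduces following sections by 1
## g    depth 2 again
```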

Bug fix in scn2img function argument definitions:
- fix missing intermediates output
- variation=0 will not modify the seed
- improved caching by using a hash function with explicit contents instead of relying on str(obj) as hash
- move function kwarg preparation out of the render functions into separate functions

The `None` attribute key is excluded from the hash because it only contains the free text (like newlines) of the corresponding scn2img-prompt section.
Seed generators can be initialized with "initial_seed" and will be used for the section and its children.
They can be used recursively, i.e. children can do the same and the seed generator will only apply to them
and their children.
The first seed generator is initialized with the seed argument supplied to the scn2img function.

This makes it possible to save initial seeds in the scn2img-prompt and reuse prompts with (nearly) deterministic
results.
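
For example, a section could pin the seeds of its subtree like this (layer name and value are made up; the "initial_seed" key is from the text above):

```
## castle
initial_seed: 1234
prompt: a castle on a hill
```
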
… mask value

Mask generation now also works directly on txt2img and img2img output (previously only in image sections).
New options in sections (see the sketch after the list):

- mask_invert
- mask_value
- mask_by_color           # reference color as an RGB tuple; sets mask to 0 where the image color is too different
- mask_by_color_space     # colorspace to use for the selection (defaults to LAB)
- mask_by_color_threshold # threshold to use for the color selection (defaults to 15)
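
A rough sketch of the idea behind the mask_by_color* options above (not the PR's actual code; assumes uint8 RGB input, OpenCV, and numpy):

```python
import cv2
import numpy as np

def mask_by_color(img_rgb, ref_rgb, threshold=15.0):
    """Set the mask to 0 where the image color differs too much from
    ref_rgb, comparing in LAB space (the documented defaults above)."""
    lab = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2LAB).astype(np.float32)
    ref = np.uint8([[ref_rgb]])  # 1x1 "image" holding the reference color
    ref_lab = cv2.cvtColor(ref, cv2.COLOR_RGB2LAB).astype(np.float32)[0, 0]
    dist = np.linalg.norm(lab - ref_lab, axis=-1)  # per-pixel color distance
    return np.where(dist <= threshold, 255, 0).astype(np.uint8)
```
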
…morphological operations

new arguments:

- mask_by_color_at
- mask_open
- mask_close
- mask_grow
- mask_shrink
- mask_blur
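
A sketch of what these operations typically amount to with OpenCV (function name, argument names, units, and order here are assumptions, not the PR's code):

```python
import cv2
import numpy as np

def postprocess_mask(mask, open_px=0, close_px=0, grow_px=0, shrink_px=0, blur_px=0):
    def kernel(px):
        return np.ones((2 * px + 1, 2 * px + 1), np.uint8)
    if open_px:    # mask_open: remove small speckles
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel(open_px))
    if close_px:   # mask_close: fill small holes
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel(close_px))
    if grow_px:    # mask_grow: dilate the selection outward
        mask = cv2.dilate(mask, kernel(grow_px))
    if shrink_px:  # mask_shrink: erode the selection inward
        mask = cv2.erode(mask, kernel(shrink_px))
    if blur_px:    # mask_blur: soften the mask edge
        mask = cv2.GaussianBlur(mask, (2 * blur_px + 1, 2 * blur_px + 1), 0)
    return mask
```
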
TensorFlow is used for depth-from-image estimation, which is used for automatic mask selection in scn2img.
TensorFlow needs einops>=0.3.0; otherwise "AttributeError: module 'keras.backend' has no attribute 'is_tensor'" is thrown when loading the Keras model.
Now uses the same code for parameter copying as txt2img and scn2img.
scn2img needs some functions, variables, etc. from webui.py.
To avoid a circular import, the scn2img function is wrapped
with get_scn2img: it takes the necessary variables as arguments
and returns a usable scn2img function.

Monocular depth estimation model loading is now delayed until it
is actually used by scn2img.
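
The wrapper pattern might look roughly like this (a sketch; everything except the get_scn2img/scn2img names is hypothetical):

```python
def get_scn2img(save_sample, img2img, txt2img, opt):
    # webui.py calls this once at startup and passes its own helpers in,
    # so the scn2img module never imports webui.py (no circular import).
    depth_model = None  # not loaded at import time

    def scn2img(prompt, seed, **kwargs):
        nonlocal depth_model
        if depth_model is None:
            depth_model = load_depth_model()  # hypothetical lazy loader,
                                              # deferred until first use
        # ...parse the scene prompt and render the layers, calling the
        # captured txt2img / img2img / save_sample as needed...
        return []

    return scn2img
```
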
Gives better depth results most of the time, so it is used by default.

Adds a new dependency, "timm" ((Unofficial) PyTorch Image Models), which is used by MiDaS.

add new scn2img arguments:
- mask_depth_normalize: bool // normalizes depth between 0 and 1 (on by default)
- mask_is_depth: bool        // directly use the depth as mask, rescaling depth at mask_depth_min to mask=0 and mask_depth_max to mask=255
- mask_depth_model: int      // 0=monocular depth estimation, 1=MiDaS (default)
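
Loading MiDaS through torch.hub would look roughly like this (consistent with the reviewer's note further down that the weights land in PyTorch's cache; the PR's actual loading code may differ):

```python
import torch

midas = torch.hub.load("intel-isl/MiDaS", "MiDaS")  # pulls weights into torch's cache
midas.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

def estimate_depth(img_rgb):
    """img_rgb: HxWx3 uint8 array; returns a relative (inverse) depth map."""
    batch = transforms.default_transform(img_rgb)  # resize + normalize to a 1x3xhxw batch
    with torch.no_grad():
        pred = midas(batch)  # relative inverse depth, model-dependent scale
    # real code would resize pred back to the input resolution here
    return pred.squeeze().cpu().numpy()
```
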
Image depth is estimated, which gives the distance of each image pixel in 3D space.
The transformed image is then rendered from another camera position and orientation.

New scn2img arguments to transform the output of image, txt2img, and img2img sections (a rough sketch of the reprojection follows the list):

Enable it with
- transform3d : bool

Only draw pixels where mask is in these bounds:
- transform3d_min_mask: int
- transform3d_max_mask: int

Select depth estimation model and scale
- transform3d_depth_model: int    // 0 = monocular depth estimation, 1 = MiDaS depth estimation (default)
- transform3d_depth_scale: float  // camera position must reflect this scale

Camera parameters from where the input image was taken
- transform3d_from_hfov: float     // horizontal field of view in degrees, default 45
- transform3d_from_pose: int_tuple // x,y,z,rotatex,rotatey,rotatez in degrees; default 0,0,0,0,0,0

Camera parameters to where we want to transform the image
- transform3d_to_hfov: float
- transform3d_to_pose: tuple

Invert mask of unknown pixels due to 3d transformation
- transform3d_mask_invert: bool // default is False

Enable inpainting with cv2.inpaint; it is enabled by default.
- transform3d_inpaint: bool
- transform3d_inpaint_radius: int
- transform3d_inpaint_method: int // 0=cv2.INPAINT_TELEA, 1=cv2.INPAINT_NS
- transform3d_inpaint_restore_mask: bool // restore mask of unknown pixels due to transformation after inpainting
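
The underlying idea, as a very rough sketch (not the PR's code: it ignores the mask bounds, uses no z-buffer for occlusion, and treats the rotation triple as a Rodrigues vector rather than proper Euler angles):

```python
import cv2
import numpy as np

def transform3d(img, depth, hfov_deg, pose_to, inpaint_radius=3):
    """img: HxWx3 uint8; depth: HxW float distance per pixel;
    pose_to: (x, y, z, rotatex, rotatey, rotatez), rotations in degrees."""
    h, w = depth.shape
    f = (w / 2) / np.tan(np.radians(hfov_deg) / 2)  # focal length in pixels
    cx, cy = w / 2, h / 2

    # Unproject every pixel into 3D camera space using its depth.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pts = np.stack([(u - cx) * depth / f,
                    (v - cy) * depth / f,
                    depth], axis=-1).reshape(-1, 3)

    # Move the points into the target camera's frame.
    tx, ty, tz, rx, ry, rz = pose_to
    rvec = np.radians(np.array([rx, ry, rz], np.float64)).reshape(3, 1)
    R, _ = cv2.Rodrigues(rvec)
    pts = pts @ R.T - np.array([tx, ty, tz])

    # Project back to pixels; keep only points in front of the camera and in frame.
    z = np.maximum(pts[:, 2], 1e-6)
    u2 = np.round(f * pts[:, 0] / z + cx).astype(int)
    v2 = np.round(f * pts[:, 1] / z + cy).astype(int)
    ok = (pts[:, 2] > 0) & (u2 >= 0) & (u2 < w) & (v2 >= 0) & (v2 < h)

    # Splat the colors; pixels nothing landed on stay marked as holes.
    out = np.zeros_like(img)
    hole = np.full((h, w), 255, np.uint8)
    out[v2[ok], u2[ok]] = img.reshape(-1, 3)[ok]
    hole[v2[ok], u2[ok]] = 0

    # Fill unknown pixels (what transform3d_inpaint enables).
    return cv2.inpaint(out, hole, inpaint_radius, cv2.INPAINT_TELEA)
```
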
new scn2img argument: `blend`

`blend` is one of alpha, mask, add, add_modulo, darker, difference, lighter, logical_and, logical_or, logical_xor, multiply, soft_light, hard_light, overlay, screen, subtract, subtract_modulo

alpha & mask are both the default alpha compositing

the other ones are blend operations from PIL.ImageChops
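
A sketch of how such a blend argument can be dispatched; the listed names match functions in PIL.ImageChops, though this is not necessarily the PR's code:

```python
from PIL import Image, ImageChops

def blend_images(bottom, top, mode="alpha"):
    """Blend two PIL images by mode name."""
    if mode in ("alpha", "mask"):
        # Both names mean plain alpha compositing.
        return Image.alpha_composite(bottom.convert("RGBA"),
                                     top.convert("RGBA"))
    op = getattr(ImageChops, mode)  # e.g. ImageChops.multiply, ImageChops.screen
    # Note: the logical_* operations additionally require mode "1" images.
    return op(bottom.convert("RGB"), top.convert("RGB"))
```
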
@xaedes (Contributor, Author) commented Sep 26, 2022

@codedealer I added output info and batch view progress.

@hlky I moved the code, pinned the dependency versions, and reverted changes to code not related to scn2img (save_sample). For the remaining changes, i.e. the added default values for img2img and txt2img, I think my #1179 (comment) explains the reasoning.

@Dekker3D
I also added MiDaS based depth estimation and 3D image transformation. Not perfect, but actually usable.

Images can now be blended in other modes (full list) like multiply or screen. I used it mainly for better mask selection.

Screenshot

@xaedes requested review from codedealer and hlky and removed the request for codedealer, Sep 26, 2022 01:31
add new scn2img argument:

- transform3d_depth_near: float // specifies how near the closest pixel is, i.e. the distance corresponding to depth=0

Without this, pixels with very low depth were previously strongly distorted after the transformation.
@codedealer (Collaborator) left a comment


Couple points:

  1. @hlky this currently pulls pickles for MiDaS from GitHub into PyTorch's cache. Should we pack them inside the repo instead?
  2. @xaedes during intermediate renders you call webui's save_sample with skip_metadata hardcoded to False. In your example with the waterfall scene, saving empty masks results in a warning thrown in the console:

Couldn't find metadata on image

It's not critical but can we do something about it?

@hlky (Contributor) commented Oct 2, 2022

Note the approved files. I haven't tested this yet, but I have resolved the conflicts in requirements.txt and environment.yaml; there shouldn't be any issues with the accompanying data files.

@codedealer @xaedes what is the status of this?

@xaedes (Contributor, Author) commented Oct 2, 2022

@hlky Tested it and it does work with the updated environment.yaml. requirements.txt also looks good, but on my machine I can't test if it works with Docker.

@codedealer I will do something about the image metadata.

@hlky merged commit 33b896d into Sygil-Dev:dev, Oct 2, 2022
xaedes added a commit to xaedes/stable-diffusion-webui that referenced this pull request Oct 2, 2022
The following metadata is written:
- "prompt" contains the representation of the SceneObject corresponding to the intermediate image
- "seed" contains the seed at the start of the function that generated this intermediate image
- "width" and "height" contain the size of the image.

To get the seed at the start of the render function without using it, a class SeedGenerator is added and
used instead of the Python generator functions.

Fixes warning thrown in console: "> Couldn't find metadata on image", originally reported by @codedealer in Sygil-Dev#1179 (review)
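
The idea behind SeedGenerator, sketched (method names here are assumptions): a plain Python generator can only hand out a seed by consuming it, while a small class also lets the render function read the upcoming seed for metadata without advancing it.

```python
class SeedGenerator:
    def __init__(self, seed):
        self.seed = seed

    def peek_seed(self):
        # Read the upcoming seed (e.g. for metadata) without consuming it,
        # which a plain Python generator cannot do.
        return self.seed

    def next_seed(self):
        # Hand out the current seed and advance; the real implementation
        # may derive the next seed differently than a simple increment.
        seed, self.seed = self.seed, self.seed + 1
        return seed
```
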
hlky pushed a commit that referenced this pull request Oct 2, 2022
# Description

Intermediate image saving in scn2img tries to save metadata which is not
set. This results in a warning thrown in the console: "Couldn't find metadata
on image", originally reported by @codedealer in
#1179 (review)

Metadata for intermediate images is added to fix the warning.

The following metadata is written:
- "prompt" contains the representation of the SceneObject corresponding
to the intermediate image
- "seed" contains the seed at the start of the function that generated
this intermediate image
- "width" and "height" contain the size of the image.

To get the seed at the start of the render function without using it, a
class SeedGenerator is added and used instead of the Python generator
functions.

Fixes the warning thrown in console: "> Couldn't find metadata on image",
originally reported by @codedealer in #1179 (review)

# Checklist:

- [x] I have changed the base branch to `dev`
- [x] I have performed a self-review of my own code
- [x] I have commented my code in hard-to-understand areas
- [x] I have made corresponding changes to the documentation