Scene-to-Image Prompt Layering System (#1179)
# Summary of the change

- new Scene-to-Image tab
- new scn2img function
- functions for loading and running monocular_depth_estimation with tensorflow

# Description (relevant motivation, which issue is fixed)

Related to discussion #925

> Would it be possible to have a layers system where we could do have foreground, mid, and background objects which relate to one another and share the style? So we could say generate a landscape, one another layer generate a castle, and on another layer generate a crowd of people.

To make this work I made a prompt-based layering system in a new "Scene-to-Image" tab. You write a multi-line prompt that looks like markdown, where each section declares one layer. It is hierarchical, so each layer can have its own child layers.

Examples: https://imgur.com/a/eUxd5qn

![](https://i.imgur.com/L61w00Q.png)

In the frontend you can find brief documentation for the syntax, examples, and a reference for the various arguments. Here is a short summary:

- Sections with "prompt" and child layers are img2img; without child layers they are txt2img.
- Sections without "prompt" are just images, useful for mask selection, image composition, etc.
- Images can be initialized with "color", resized with "resize", and positioned with "pos".
- Rotation and rotation center are set with "rotation" and "center".
- Masks can be selected automatically by color or by estimated depth, based on https://huggingface.co/spaces/atsantiago/Monocular_Depth_Filter.

![](https://i.imgur.com/8rMHWmZ.png)

# Additional dependencies that are required for this change

For mask selection by monocular depth estimation, tensorflow is required and the model must be cloned to ./src/monocular_depth_estimation/

Changes in environment.yaml:

- einops>=0.3.0
- tensorflow>=2.10.0

Einops must be allowed to be newer for tensorflow to work.

# Checklist:

- [x] I have changed the base branch to `dev`
- [x] I have performed a self-review of my own code
- [x] I have commented my code in hard-to-understand areas
- [x] I have made corresponding changes to the documentation

Co-authored-by: hlky <106811348+hlky@users.noreply.github.com>
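
The layer rules summarized in the description (sections nest by heading depth; a section with "prompt" and children is img2img, with "prompt" and no children txt2img, and without "prompt" a plain image) can be illustrated with a small parser. This is only a sketch under assumptions: `Layer`, `parse_scene`, and `layer_kind` are hypothetical names, not the `scn2img` code from this commit, and it assumes that `//` starts a comment line and that every other non-heading line is a `key: value` argument.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    depth: int                      # number of leading '#' (0 for the root section)
    title: str = ""
    args: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def parse_scene(text: str) -> Layer:
    """Parse a markdown-like scene prompt into a tree of layers (hypothetical helper)."""
    root = Layer(depth=0)
    stack = [root]                  # stack[-1] is the section currently being filled
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue                # skip blank lines and comment lines
        if line.startswith("#"):
            hashes = len(line) - len(line.lstrip("#"))
            layer = Layer(depth=hashes, title=line[hashes:].strip())
            # climb back up to the nearest section that is shallower than this one
            while stack[-1].depth >= hashes:
                stack.pop()
            stack[-1].children.append(layer)
            stack.append(layer)
        elif ":" in line:
            # split on the first ':' only, so weighted prompts like "red:-0.4" survive
            key, value = line.split(":", 1)
            stack[-1].args[key.strip()] = value.strip()
    return root


def layer_kind(layer: Layer) -> str:
    """Classify a layer following the short summary in the commit message."""
    if "prompt" not in layer.args:
        return "image"
    return "img2img" if layer.children else "txt2img"


scene = """
// blend it together and finish it with some details
prompt: cute corgi at beach, trending on artstation
denoising_strength: 0.5

# put foreground onto background
size: 512, 512

## create foreground
size: 512, 512
mask_depth: True

###
prompt: corgi

## create background
prompt: beach landscape, photographic, beautiful:1 red:-0.4
"""

root = parse_scene(scene)
print(layer_kind(root), [layer_kind(child) for child in root.children])
```

The stack keeps the path from the root to the current section, so a heading with more `#` than its predecessor becomes a child even when a level is skipped, as in the `####` section of the first example file.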
Showing 10 changed files with 2,100 additions and 12 deletions.
@@ -0,0 +1,40 @@
// blend it together and finish it with details
prompt: cute happy orange cat sitting at beach, beach in background, trending on artstation:1 cute happy cat:1
sampler_name:k_euler_a
ddim_steps: 35
denoising_strength: 0.55
variation: 3
initial_seed: 1

# put foreground onto background
size: 512, 512
color: 0,0,0

## create foreground
size:512,512
color:0,0,0,0
resize: 300, 300
pos: 256, 350

// select mask by probing some pixels from the image
mask_by_color_at: 15, 15, 15, 256, 85, 465, 100, 480
mask_by_color_threshold:80
mask_by_color_space: HLS

// some pixels inside the cat may be selected, remove them with mask_open
mask_open: 15

// there is still some background pixels left at the edge between cat and background
// grow the mask to get them as well
mask_grow: 15

// we want to remove whatever is masked:
mask_invert: True

####
prompt: cute happy orange cat, white background
ddim_steps: 25
variation: 1

## create background
prompt:beach landscape, beach with ocean in background, photographic, beautiful:1 red:-0.4
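
The mask steps in the file above (probe colors, open, grow, invert) can be sketched on a tiny image in plain Python. This is an illustration under assumptions, not the commit's implementation: the helper names are hypothetical, colors are compared by Euclidean distance in RGB rather than in the HLS space the example selects, and the radius is 1 pixel instead of 15 so the effect is visible on a 9x9 grid.

```python
def mask_by_color_at(image, probes, threshold):
    """Select pixels whose colour is within `threshold` of any probed pixel.

    image: rows of (r, g, b) tuples; probes: list of (x, y) coordinates.
    """
    refs = [image[y][x] for x, y in probes]

    def close(pixel):
        return any(
            sum((a - b) ** 2 for a, b in zip(pixel, ref)) ** 0.5 <= threshold
            for ref in refs
        )

    return [[close(pixel) for pixel in row] for row in image]


def invert(mask):
    return [[not m for m in row] for row in mask]


def dilate(mask, radius):
    """Grow the mask: a pixel is set if any pixel in its square neighbourhood is set."""
    h, w = len(mask), len(mask[0])
    return [
        [
            any(
                mask[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def erode(mask, radius):
    """Shrink the mask; implemented as the dual of dilation (outside counts as set)."""
    return invert(dilate(invert(mask), radius))


WHITE, ORANGE = (255, 255, 255), (255, 140, 0)
image = [
    [ORANGE if 2 <= x <= 6 and 2 <= y <= 6 else WHITE for x in range(9)]
    for y in range(9)
]
image[4][4] = WHITE  # a stray background-coloured pixel inside the subject

raw = mask_by_color_at(image, probes=[(0, 0)], threshold=80)  # selects the background
opened = dilate(erode(raw, 1), 1)   # like mask_open: drops the isolated pixel at (4, 4)
grown = dilate(opened, 1)           # like mask_grow: also covers the subject's edge
final = invert(grown)               # like mask_invert: what is left is the subject core

for row in final:
    print("".join("#" if m else "." for m in row))
```

Opening is erosion followed by dilation, so specks smaller than the structuring element disappear while larger regions keep their shape; growing afterwards trades a slightly smaller subject for a cleaner edge.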
@@ -0,0 +1,50 @@
initial_seed: 2

// select background and img2img over it
mask_by_color_at: 64, 64
mask_invert: True

prompt: corgi
ddim_steps: 50
seed: 242886303

mask_mode: 0
denoising_strength: 0.8
//cfg_scale: 15
mask_restore: True
image_editor_mode:Mask

# estimate depth and transform the corgi in 3d
transform3d: True
transform3d_depth_near: 0.5
transform3d_depth_scale: 10
transform3d_from_hfov: 45
transform3d_to_hfov: 45
transform3d_from_pose: 0,0,0, 0,0,0
transform3d_to_pose: 0.5,0,0, 0,-5,0
transform3d_min_mask: 0
transform3d_max_mask: 255
transform3d_inpaint_radius: 1
transform3d_inpaint_method: 0

## put foreground onto background
size: 512, 512


### create foreground
size: 512, 512

mask_depth: True
mask_depth_model: 1
mask_depth_min: -0.05
mask_depth_max: 0.5
mask_depth_invert:False

####
prompt: corgi
ddim_steps: 25
seed: 242886303

### background
size: 512,512
color: #9F978D
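
The `transform3d_*` arguments above suggest a depth-based change of viewpoint. As a rough illustration only, the sketch below reprojects a single pixel with a known depth from one pinhole camera to another. Everything about the conventions is an assumption rather than something this commit confirms: the function names are hypothetical, the camera is taken to look along +z, the pose is reduced to a translation plus a yaw angle in degrees about the y axis, and the depth is taken as already expressed in scene units.

```python
import math


def focal_px(width, hfov_deg):
    """Focal length in pixels of a pinhole camera with the given horizontal FOV."""
    return (width / 2) / math.tan(math.radians(hfov_deg) / 2)


def reproject(u, v, depth, width, height, from_hfov, to_hfov, translation, yaw_deg):
    """Map pixel (u, v) with the given depth into a moved camera (assumed conventions).

    translation: (x, y, z) position of the new camera in the old camera's frame.
    yaw_deg: rotation of the new camera about the y axis.
    Returns the new pixel position, or None if the point ends up behind the camera.
    """
    cx, cy = width / 2, height / 2
    f_from = focal_px(width, from_hfov)
    f_to = focal_px(width, to_hfov)

    # 1. Unproject the pixel to a 3D point in the source camera frame.
    x = (u - cx) * depth / f_from
    y = (v - cy) * depth / f_from
    z = depth

    # 2. Express the point relative to the target camera: R^T (p - t).
    dx, dy, dz = x - translation[0], y - translation[1], z - translation[2]
    c, s = math.cos(math.radians(yaw_deg)), math.sin(math.radians(yaw_deg))
    xc = c * dx - s * dz
    zc = s * dx + c * dz
    if zc <= 0:
        return None

    # 3. Project with the target camera's focal length.
    return (cx + f_to * xc / zc, cy + f_to * dy / zc)


same = reproject(100, 200, 3.0, 512, 512, 45, 45, (0, 0, 0), 0)
moved = reproject(256, 256, 10.0, 512, 512, 45, 45, (0.5, 0, 0), 0)
behind = reproject(256, 256, 1.0, 512, 512, 45, 45, (0, 0, 2.0), 0)
print(same, moved, behind)
```

Applying this per pixel to an estimated depth map leaves holes wherever the new view sees surfaces the old one did not, which is presumably what the `transform3d_inpaint_*` arguments address.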
@@ -0,0 +1,25 @@
// blend it together and finish it with some details
prompt: cute corgi at beach, trending on artstation
ddim_steps: 50
denoising_strength: 0.5
initial_seed: 2

# put foreground onto background
size: 512, 512

## create foreground
size: 512, 512

// estimate depth from image and select mask by depth
// https://huggingface.co/spaces/atsantiago/Monocular_Depth_Filter
mask_depth: True
mask_depth_min: -0.05
mask_depth_max: 0.4
mask_depth_invert:False

###
prompt: corgi
ddim_steps: 25

## create background
prompt:beach landscape, beach with ocean in background, photographic, beautiful:1 red:-0.4
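
Selecting a mask by depth, as the `mask_depth_*` arguments above do, can be pictured as thresholding a normalized depth map. The sketch below is an assumption about the semantics, not the commit's code: `mask_by_depth` is a hypothetical helper, the depth map is rescaled to the range 0 to 1 with 0 being nearest, and a pixel is selected when its normalized depth lies between the minimum and maximum.

```python
def mask_by_depth(depth, depth_min, depth_max, invert=False):
    """Select pixels whose normalized depth lies in [depth_min, depth_max].

    depth: rows of floats (any positive scale); returns rows of booleans.
    """
    lo = min(min(row) for row in depth)
    hi = max(max(row) for row in depth)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat depth map
    return [
        [(depth_min <= (d - lo) / span <= depth_max) != invert for d in row]
        for row in depth
    ]


# A toy depth map: a near subject on the left, far background on the right.
depth = [
    [1.0, 1.2, 5.0],
    [1.1, 3.0, 5.0],
]

near = mask_by_depth(depth, depth_min=-0.05, depth_max=0.4)
far = mask_by_depth(depth, depth_min=-0.05, depth_max=0.4, invert=True)
print(near)
print(far)
```

A slightly negative minimum, as in the example file, makes sure the nearest pixels are included even if the comparison is affected by rounding.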
@@ -0,0 +1,34 @@
// give it some polish and details
size: 512, 512
prompt: cute corgi at beach, intricate details, photorealistic, trending on artstation
variation: 0
seed: 1360051694
initial_seed: 5

# blend it together
prompt: beautiful corgi:1.5 cute corgi at beach, trending on artstation:1 photorealistic:1.5
ddim_steps: 50
denoising_strength: 0.5
variation: 0

## put foreground in front of background
size: 512, 512

### select foreground
size: 512, 512

// estimate depth from image and select mask by depth
// https://huggingface.co/spaces/atsantiago/Monocular_Depth_Filter
mask_depth: True
mask_depth_min: -0.05
mask_depth_max: 0.4
mask_depth_invert:False

#### create foreground
prompt: corgi
ddim_steps: 25
seed: 242886303

### create background
prompt:beach landscape, beach with ocean in background, photographic, beautiful:1 red:-0.4
variation: 3
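
Several prompts in these files carry weights, for example `beautiful corgi:1.5 ... photorealistic:1.5` or `red:-0.4`. The sketch below shows one way such a prompt could be split into `(text, weight)` pairs. The grammar is inferred from the examples only (each sub-prompt ends in `:number` followed by whitespace or the end of the prompt), and `split_weighted_prompt` is a hypothetical name, not the function the web UI uses.

```python
import re

# Lazily capture text up to ":<number>" that is followed by whitespace or end of string.
_WEIGHTED = re.compile(r"(.+?):\s*(-?\d+(?:\.\d+)?)(?:\s+|$)")


def split_weighted_prompt(prompt: str):
    """Split 'text:weight text:weight ...' into a list of (text, weight) pairs.

    Text without an explicit weight is returned with weight 1.0.
    """
    parts = []
    end = 0
    for match in _WEIGHTED.finditer(prompt):
        parts.append((match.group(1).strip(), float(match.group(2))))
        end = match.end()
    rest = prompt[end:].strip()
    if rest:
        parts.append((rest, 1.0))  # trailing (or only) text without a weight
    return parts


blend = split_weighted_prompt(
    "beautiful corgi:1.5 cute corgi at beach, trending on artstation:1 photorealistic:1.5"
)
background = split_weighted_prompt(
    "beach landscape, beach with ocean in background, photographic, beautiful:1 red:-0.4"
)
plain = split_weighted_prompt("corgi")
print(blend, background, plain, sep="\n")
```

A negative weight such as `red:-0.4` would then steer the result away from that sub-prompt, assuming the sampler combines the conditionings by weight.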
@@ -0,0 +1,37 @@
size: 512,512
mask_blur: 6

prompt: fantasy landscape with castle and forest and waterfalls, trending on artstation
denoising_strength: 0.6
seed: 1
image_editor_mode: Mask
mask_mode: 0
mask_restore: True

# mask the left which contains artifacts
color: 255,255,255,0
blend:multiply
size: 100,512
pos: 50,256

# mask the top-left which contains lots of artifacts
color: 255,255,255,0
blend:multiply
size: 280,128
pos: 128,64

# go forward and turn head left to look at the left waterfalls
transform3d: True
transform3d_depth_scale: 10000
transform3d_from_hfov: 60
transform3d_to_hfov: 60
transform3d_from_pose: 0,0,0, 0,0,0
transform3d_to_pose: 4000,0,2000, 0,-50,0
transform3d_min_mask: 0
transform3d_max_mask: 255
transform3d_inpaint_radius: 5
transform3d_inpaint_method: 1

##
prompt: fantasy landscape with castle and forest and waterfalls, trending on artstation
seed: 1
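
The two mask layers above combine `size` and `pos` to cover the left edge and the top-left corner of a 512x512 canvas. A small sketch of the arithmetic, assuming that `pos` gives the center of the layer (which matches the comments in the file but is not stated in this diff) and using the hypothetical helper `placement_box`:

```python
def placement_box(size, pos, canvas=(512, 512)):
    """Return (left, top, right, bottom) of a layer centred at `pos`, clipped to the canvas."""
    width, height = size
    cx, cy = pos
    left, top = cx - width // 2, cy - height // 2
    right, bottom = left + width, top + height
    # Clip to the canvas so only the visible part of the layer is reported.
    return (
        max(0, left),
        max(0, top),
        min(canvas[0], right),
        min(canvas[1], bottom),
    )


left_strip = placement_box(size=(100, 512), pos=(50, 256))
top_left = placement_box(size=(280, 128), pos=(128, 64))
print(left_strip, top_left)
```

Under this reading the first layer covers a 100 pixel wide strip along the left edge, and the second extends slightly past the left border before clipping.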