
Expose scale & shift as options #38

Open
telamon opened this issue Sep 26, 2024 · 7 comments
Open

Expose scale & shift as options #38

telamon opened this issue Sep 26, 2024 · 7 comments

Comments


telamon commented Sep 26, 2024

edit 2:
In order to focus the model onto different details, as an inference user I would like to expose two additional variables to tweak:

# Line ~146:  geowizard/run_infer_v2.py
+    parser.add_argument("--e_scale", type=float, default=1.4, help="Post-processing ensemble scale")
+    parser.add_argument("--e_shift", type=float, default=0.4, help="Post-processing ensemble shift")

original text:

Hello, first of all I'd like to say thank you for this model. It's really fun to use and produces convincing results.

But I have an issue: it seems to me that a lot of detail is being lost or misplaced due to the low resolution of the depth map output (8 bits).

So I tried running inference on an image twice.
First pass: the original image.
Second pass: a cropped-out region with high detail (the face). I then stitched both depth maps together.

The results were surprisingly refreshing:
(image: single-pass vs. two-pass depth comparison)

Left) Single pass: most of the foreground detail has been pushed into the upper boundary, and the character appears quite flat.
Right) Second pass: new geometry is revealed and surfaces are smoother, but there is also some more noise.

So my first question is: if geowizard inference is used recursively to refine and inpaint regions with higher detail, what would be the most correct way to merge the two depth maps? My naive approach would be:

# Infer two depth maps and stitch them together

depth1 = pipe(color_image) # pass 1

region = ... # select a region with lacking or incorrect depth detail

d1c = crop(depth1, region) # crop original depth

color_cropped = crop(color_image, region).upscale(1.25)
depth2 = pipe(color_cropped, region) # pass 2

# rescale depth2 values to fit between the original near and far bounds.
output = min(d1c) + depth2 * (max(d1c) - min(d1c)) # <-- is this ok?
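
For what it's worth, a common alternative to min/max rescaling (just a hedged sketch, not GeoWizard's method) is to fit a scale and shift by least squares so that depth2 best matches depth1 over the cropped region; the helper name below is made up for illustration.

# Hypothetical sketch: affine (scale/shift) alignment of depth2 to depth1's crop
import numpy as np

def align_scale_shift(depth2, d1c):
    x = depth2.ravel()
    y = d1c.ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)       # model: y ≈ s * x + t
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares fit of s and t
    return s * depth2 + t                            # depth2 expressed in depth1's range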

Second inquiry: most current depth maps are grayscale PNG files. Wouldn't it make sense to break the norm and start using the green and blue channels to store higher-resolution depth information?
I guess it's possible to keep the red component as-is, but I don't understand why nobody is storing additional fractional bits in the other two bytes.

# Encode & Decode 24bit depth maps

def encode_depth24(x: float):
  x = int(x * (2**24 - 1)) # scale normalized value to 24bit int
  return (
      (x >> 16) & 0xff, # R = Current 8bit depth
      (x >> 8) & 0xff, # G
      x & 0xff # B
   )

def decode_depth24(c: vec3):
  n = (c.r << 16) | (c.g << 8) | c.b
  return float(n) / (2**24 - 1) # scale 8..24bits back to float
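
A quick round-trip sanity check for the two helpers above (illustrative only):

# encode a normalized depth value and recover it within 24-bit quantization error
d = 0.123456
rgb = encode_depth24(d)
assert abs(decode_depth24(rgb) - d) < 1 / (2**24 - 1)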

Lastly, I know little about training, but I wonder if the model was trained to produce 8-bit output, or if by any chance there's an 8-bit bottleneck somewhere in the training pipeline that would prevent the produced depth maps from being as smooth as their sibling normal maps.

output is 32-bit

Thanks for hearing me out!

edit: fixed late night code snippets 🧠 🔫

@fuxiao0719 (Owner)

Hi, thanks for your interesting insight!

Can you provide the input image as well as the crop code?

I'll respond to this issue within one week after the ICLR submission deadline.


telamon commented Sep 26, 2024

The image above is an asset, but I think any high-resolution image should do.
(Like, we use images of 1024x1280 to represent the x and y dimensions, but the depth z-axis is limited to 256 values...)

There's currently no crop code; I used GIMP and Blender to quickly test the idea before hacking together something like an "InpaintDepth" ComfyUI custom node. I'm still unsure if it's possible to blend the two depth maps together with decent results, which is why I'm asking for feedback.

No stress and good luck at the conference.

@fuxiao0719 (Owner)

(1) For the first query:
The initial depth map tends to be flat due to the large invalid background area. A better approach might be to crop the image while maintaining a balanced ratio between the foreground and background. Your proposed two-stage depth fusion is intriguing, but it might encounter challenges in selecting the optimal cropping region, as both depth1 and depth2 could still struggle with incorrect scaling. I have tried several cases; the two-stage fusion is quite unstable, and my proposal is more convenient.

(2) For the second query:
Yes, that makes sense! Storing higher precision depth maps could offer significant advantages. It's possible that researchers often opt for 8-bit precision because it's sufficient for most applications, while 16-bit or 32-bit data might introduce additional complexity, making the training process more difficult or less stable.


telamon commented Oct 10, 2024

Thank you for the reply.
(1) Yes. I realized that depth fusion is quite complex....
I tried to gain insight into the "flatness" using the following test:

| Sample | Background included? | Eye inner | Chin | Neck |
| --- | --- | --- | --- | --- |
| A) Full | yes | 0.049 | 0.043 | 0.139 |
| B) Head | upper corners | 0.092 | 0.058 | 0.192 |
| C) Face | none | 0.278 | 0.190 | 0.505 |

(measurements taken by positioning a parallel plane at the near-most vertex (tip of the nose))

(images: measuring_sanity2, measuring_sanity)

It is as you say: the model is very good at separating the background from objects in the foreground.
But I failed to find the balance - I expected greater separation of detail between samples A and B, but the flatness is fairly similar.
It wasn't until I cropped away all background pixels (C) that attention was diverted to detail and the facial structure emerged.

End note:
Seeing the difficulty of this problem, I am intrigued but also quite demotivated.

When I opened this issue I assumed that enriching detail iteratively would be as simple as inpainting depth the same way we inpaint color.

Problem 1.a) Selecting the crop region.
As an artist, I would manually select a crop region where the inferred depth does not express the detail.
As an algorithm, I'm not sure; searching for flatness could maybe be done by looking for regions with low local variance in depth and comparing against an edge-detection measurement in the color image (a rough sketch follows below).
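
A rough sketch of that idea, assuming numpy arrays and made-up thresholds (every name here is hypothetical, not part of GeoWizard):

import numpy as np

def flatness_candidates(depth, gray, tile=64, depth_var_max=1e-4, edge_min=0.05):
    # flag tiles whose depth is locally flat (low variance) while the
    # grayscale image still shows lots of edges (high gradient magnitude)
    h, w = depth.shape
    candidates = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            d = depth[y:y + tile, x:x + tile]
            g = gray[y:y + tile, x:x + tile].astype(np.float32)
            depth_var = float(np.var(d))
            gy, gx = np.gradient(g)
            edge_density = float(np.mean(np.hypot(gx, gy)))
            if depth_var < depth_var_max and edge_density > edge_min:
                candidates.append((y, x))   # top-left corner of a candidate crop tile
    return candidates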

Problem 2.a) Depth fusion.
The above test showed me that even if I know the depth1 bounds of a cropped area, scaling depth2 into that box would just reproduce the same flatness.

If I attempted to fix it manually with Blender's sculpting tools, I'd have to push some of the surrounding vertices backwards and then pull some detail forward, meaning that depth1 would have to be partially invalidated.

Really starting to regret that I asked... haha

TL;DR: having compared quite a few depth maps, The Flatness does not occur on every input; some images render with great detail and captivating stereo, others not so much. Flat compositions will stay flat, deep compositions get deeper.


(2) Sorry, I barked up the wrong tree - and thank you. (I found my 8-bit problem.)
The encoding scheme I proposed is not as visually appealing as grayscale when viewed by itself, but it's compatible with software that expects 8-bit maps, and I can confirm that GeoWizard infers higher resolution than what can be represented by 8 bits. However, I saw no difference between RGB24 and the somewhat obscure 16-bit grayscale PNG mode in Blender, though other software might fail to decode 16-bit grayscale.
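
For reference, one common way to write such a 16-bit grayscale depth PNG (a generic OpenCV sketch, not tied to this repo; depth is assumed to be a float array normalized to [0, 1]):

import cv2
import numpy as np

depth16 = np.clip(depth * 65535.0, 0, 65535).astype(np.uint16)
cv2.imwrite("depth_16bit.png", depth16)   # OpenCV writes single-channel uint16 PNGs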

Leaving breadcrumbs to my encoding tests and visualizations.

@fuxiao0719 (Owner)

Hi, thank you for your insightful input! We’ve also observed that the cropped region can impact flatness. A straightforward way to mitigate this is by setting the scale to 1.4 and the shift to 0.4 (these values can be adjusted based on visualization needs), and then converting the relative depth back to metric depth. We've found that an appropriate weight group of scale and shift works well for most human face cases. Best regards!
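
A minimal sketch of that conversion, assuming a simple affine mapping (the exact convention inside GeoWizard's post-processing may differ):

# apply the suggested scale/shift to the relative depth prediction
scale, shift = 1.4, 0.4                       # values suggested above; adjust per visualization
depth_metric = depth_relative * scale + shift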


telamon commented Oct 15, 2024

Thank you for the lead, but do you mind reopening the issue and maybe changing the title to "Expose scale & shift as options"?
I will take a look at this myself, but it'll take some time because I just switched assignments.
Regards!

@fuxiao0719 (Owner)

Sure

@fuxiao0719 fuxiao0719 reopened this Oct 16, 2024
@fuxiao0719 fuxiao0719 changed the title Add a second depth pass Expose scale & shift as options Oct 16, 2024
@fuxiao0719 fuxiao0719 pinned this issue Oct 24, 2024