Expose scale & shift as options #38
Hi, thanks for your interesting insight! Could you provide the input image as well as the crop code? I'll respond to this issue within one week after the ICLR submission deadline.
The image above is an asset, but I think any high-resolution image should do. There's currently no crop code; I used GIMP and Blender to quickly test the idea before hacking together something like an "InpaintDepth" ComfyUI custom node. I'm still unsure whether it's possible to blend the two depth maps together with decent results, which is why I'm asking for feedback. No stress, and good luck at the conference.
(1) For the first query:
(2) For the second query:
Thank you for the reply.
(Measurements taken by positioning a parallel plane at the near-most vertex, the tip of the nose.) It is as you say: the model is very good at separating the background from objects in the foreground.
When I opened this issue I assumed that enriching detail iteratively would be as simple as inpainting depth the same way we inpaint color. Two problems surfaced:
Problem 1.a) Selecting the crop region.
Problem 2.a) Depth fusion.
If I attempted to fix it manually with Blender's sculpting tools, I'd have to push some of the surrounding vertices backwards and then pull some detail forward. Really starting to regret I asked.. haha
(2) Sorry, I barked up the wrong tree, and thank you. (I found my 8-bit problem.) Leaving breadcrumbs to my encoding tests and visualizations.
Hi, thank you for your insightful input! We've also observed that the cropped region can impact flatness. A straightforward way to mitigate this is to set the scale to 1.4 and the shift to 0.4 (these values can be adjusted based on visualization needs), and then convert the relative depth back to metric depth. We've found that an appropriate pair of scale and shift values works well for most human-face cases. Best regards!
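A minimal sketch of what that conversion could look like, assuming the model outputs affine-invariant relative depth normalized to [0, 1]; the function name is illustrative and the scale/shift defaults are just the values suggested above, not part of the repository's API:

```python
import numpy as np

def relative_to_metric(depth_rel: np.ndarray, scale: float = 1.4, shift: float = 0.4) -> np.ndarray:
    """Map affine-invariant relative depth to (pseudo-)metric depth.

    Assumes depth_rel is normalized to [0, 1]; scale and shift are the
    hand-tuned values suggested above and may need per-image adjustment.
    """
    return depth_rel * scale + shift

# Toy example: a flat-looking crop gets stretched along the depth axis.
rel = np.linspace(0.0, 1.0, 5)
metric = relative_to_metric(rel)  # spans 0.4 .. 1.8 instead of 0 .. 1
```

Because the mapping is affine, relative ordering of surfaces is preserved; only the depth range is stretched and offset.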
Thank you for the lead, but do you mind reopening the issue and maybe changing the title to "Expose scale & shift as options"?
Sure
edit 2:
In order to focus the model onto different details, as an inference user I would like to expose two additional variables to tweak:
original text:
Hello, first of all I'd like to say thank you for this model. It's really fun to use and produces convincing results.
But I have an issue: it seems to me that a lot of detail is being lost or misplaced due to the low resolution of the depth-map output (8 bits).
So I tried running inference on an image twice.
First pass: the original image.
Second pass: a cropped-out region with high detail (the face), after which I stitched both depth maps together.
The results were surprisingly refreshing:
left) Single pass. Most of the foreground detail has been pushed into the upper boundary, and the character appears quite flat.
right) The second pass reveals new geometry and smoother surfaces, but also some more noise.
So my first question is: if geowizard/inference is used recursively to refine and inpaint regions with higher detail, what would be the most correct way to merge the two depth maps? My naive approach would be:
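The original snippet did not survive the thread; as one hedged illustration (not the author's code), a high-detail crop could be least-squares aligned to the global map with a scale/shift fit and then feather-blended in. All names here are hypothetical:

```python
import numpy as np

def merge_depths(global_depth, crop_depth, y0, x0, feather=16):
    """Hypothetical merge: align a high-detail crop to the global depth map
    with a least-squares scale/shift fit, then feather-blend it in."""
    h, w = crop_depth.shape
    region = global_depth[y0:y0 + h, x0:x0 + w]

    # Solve region ~= a * crop_depth + b in the least-squares sense,
    # so the crop sits at the same depth range as its surroundings.
    A = np.stack([crop_depth.ravel(), np.ones(crop_depth.size)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, region.ravel(), rcond=None)
    aligned = a * crop_depth + b

    # Feathered blend mask: 1 in the interior, ramping to 0 at the edges
    # to avoid a visible seam at the crop boundary.
    ramp = lambda n: np.minimum(np.arange(n), np.arange(n)[::-1]) / max(feather, 1)
    mask = np.clip(np.minimum.outer(ramp(h), ramp(w)), 0.0, 1.0)

    out = global_depth.copy()
    out[y0:y0 + h, x0:x0 + w] = mask * aligned + (1 - mask) * region
    return out
```

A robust fit (e.g. ignoring pixels near depth discontinuities) would likely handle the background/foreground separation problem mentioned above better than a plain least-squares fit.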
Second inquiry: most current depth maps are grayscale PNG files. Wouldn't it make sense to break the norm and start using the green and blue channels to store higher-resolution depth information?
I guess it's possible to keep the red component as is, but I don't understand why nobody is storing additional fractional bits in the other two bytes.
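A hedged sketch of that idea, assuming depth normalized to [0, 1]: pack 24 bits of precision across R, G, B so that the red channel alone still reads as an ordinary 8-bit depth map (function names are illustrative):

```python
import numpy as np

def pack_depth_rgb(depth01: np.ndarray) -> np.ndarray:
    """Pack normalized depth in [0, 1] into 24 bits across R, G, B.

    R holds the usual coarse 8-bit depth; G and B hold progressively
    finer fractional bits, so legacy viewers still see a plausible
    depth map in the red channel.
    """
    q = (np.clip(depth01, 0.0, 1.0) * (2**24 - 1)).astype(np.uint32)
    r = (q >> 16) & 0xFF
    g = (q >> 8) & 0xFF
    b = q & 0xFF
    return np.stack([r, g, b], axis=-1).astype(np.uint8)

def unpack_depth_rgb(rgb: np.ndarray) -> np.ndarray:
    """Inverse of pack_depth_rgb: recombine the three bytes."""
    q = (rgb[..., 0].astype(np.uint32) << 16) \
        | (rgb[..., 1].astype(np.uint32) << 8) \
        | rgb[..., 2].astype(np.uint32)
    return q / (2**24 - 1)
```

One likely reason this isn't the norm: a plain 16-bit grayscale PNG already gives most of the extra precision without breaking existing tooling, and any lossy recompression of a packed RGB image scrambles the low-order channels into noise.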
Lastly, I know little about training, but I wonder if the model was trained to produce 8-bit output, or if by any chance there's an 8-bit bottleneck somewhere in the training pipeline that would prevent the produced depth maps from being as smooth as their sibling normal maps.
output is 32-bit
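If the raw output is indeed 32-bit float, any banding is introduced at save time rather than by the model; a minimal pure-NumPy sketch (no file I/O, names illustrative) of the quantization error that 8-bit versus 16-bit PNG storage would impose:

```python
import numpy as np

def quantize(depth01: np.ndarray, bits: int) -> np.ndarray:
    """Round-trip depth through a bits-deep integer grid, as saving
    to an 8- or 16-bit image file would."""
    levels = 2**bits - 1
    return np.round(depth01 * levels) / levels

depth = np.random.default_rng(1).random((256, 256)).astype(np.float64)
err8 = np.max(np.abs(quantize(depth, 8) - depth))    # up to ~2e-3: visible stair-stepping
err16 = np.max(np.abs(quantize(depth, 16) - depth))  # up to ~8e-6: far below visibility
```

So keeping the pipeline in float and only quantizing to 16 bits at export would preserve the smoothness the normals enjoy.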
Thanks for hearing me out!
edit: fixed late night code snippets 🧠 🔫