
Questions Regarding Optical Flow Supervision and Potential Enhancements #50

Open
linwk20 opened this issue Sep 19, 2024 · 3 comments

@linwk20

linwk20 commented Sep 19, 2024

First of all, thank you for your impressive work. I’ve been searching for methods that can provide accurate dense depth maps (which is why I believe FlowMap is significantly superior to COLMAP). Using optical flow to fine-tune depth networks seems like a great idea. I have the following questions:

  • Why supervise depth with optical flow? Is it because optical flow typically offers higher accuracy and can provide subpixel reprojection errors for the depth network? I ask because the optical flow itself might not be accurate, since it also comes from a DNN, so why not co-optimize it for the scene as well? (A rough sketch of the kind of flow-based reprojection supervision I mean is given below.)
  • Potential improvement with a pretrained MVS model? If we used a pretrained large reconstruction model that takes multiple views as input as the depth estimator, is there a chance of significantly improving the final performance? Or do you think optical flow supervision is already a form of multi-view stereo (MVS), making a pretrained MVS model unnecessary?
  • Can increasing resolution and image count improve depth accuracy? The current training resolution and the number of images supported are limited by GPU memory. However, we know that for models using Layer Norm instead of Batch Norm, we can accumulate gradients to achieve an equivalent large batch size (for example, the ViT used in DepthAnything v2 follows this approach; see the gradient-accumulation sketch below). If we used this method to greatly increase the resolution and image count, do you think it would improve the final depth accuracy?

These are just some speculations, and I look forward to your response. Your thoughts may help us design more reasonable experiments. Thank you!
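
To make the first question concrete, here is a rough, generic sketch of what I mean by flow-based reprojection supervision: given predicted depth, intrinsics, and a relative pose, pixels from frame 0 are projected into frame 1, and the induced correspondence is compared against the precomputed optical flow. This is only an illustration with placeholder shapes and names, not FlowMap's actual implementation:

```python
import torch
import torch.nn.functional as F

def flow_supervision_loss(depth, K, T_0to1, flow_01):
    """Penalize disagreement between depth-induced correspondences and precomputed flow.

    depth:   (B, 1, H, W)  predicted depth for frame 0
    K:       (B, 3, 3)     camera intrinsics
    T_0to1:  (B, 4, 4)     relative pose from frame 0 to frame 1
    flow_01: (B, 2, H, W)  precomputed optical flow from frame 0 to frame 1 (pixels)
    """
    B, _, H, W = depth.shape
    device = depth.device

    # Pixel grid in homogeneous coordinates, shape (B, 3, H*W).
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=depth.dtype),
        torch.arange(W, device=device, dtype=depth.dtype),
        indexing="ij",
    )
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).reshape(1, 3, -1).expand(B, -1, -1)

    # Back-project to 3D with the predicted depth, then transform into frame 1.
    pts0 = torch.linalg.inv(K) @ pix * depth.reshape(B, 1, -1)        # (B, 3, H*W)
    pts0_h = torch.cat([pts0, torch.ones_like(pts0[:, :1])], dim=1)   # (B, 4, H*W)
    pts1 = (T_0to1 @ pts0_h)[:, :3]                                   # (B, 3, H*W)

    # Project into frame 1 and compare with the flow-induced correspondence.
    proj = K @ pts1
    proj = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)                 # (B, 2, H*W)
    target = pix[:, :2] + flow_01.reshape(B, 2, -1)                   # flow correspondence

    return F.l1_loss(proj, target)
```

In this formulation the flow is fixed supervision; co-optimizing it for the scene would mean also back-propagating into `flow_01`.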
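And for the third question, a minimal sketch of the gradient-accumulation idea I have in mind (the `model`, `loss_fn`, and `loader` names here are placeholders, not part of the FlowMap codebase):

```python
import torch

def train_with_grad_accumulation(model, loss_fn, loader, accum_steps=8, lr=1e-4):
    """Simulate a large effective batch by accumulating gradients over several small steps.

    Effective batch size = loader batch size * accum_steps. This matches the true
    large-batch gradient only when the loss averages over the batch and the model has
    no cross-sample statistics (e.g. LayerNorm rather than BatchNorm).
    """
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    optimizer.zero_grad()
    for step, batch in enumerate(loader):
        loss = loss_fn(model(batch)) / accum_steps  # scale so accumulated grads average
        loss.backward()                             # gradients add up in .grad buffers
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```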

@booker-max


Hi, I am currently looking for an MVS model that has been trained on a large-scale dataset. Do you have any recommendations?

@linwk20
Author

linwk20 commented Nov 21, 2024

There are plenty of such MVS models. One I know of that runs in real time is Spann3r (https://hengyiwang.github.io/projects/spanner), but it is more of an academic paper and is not trained on a large dataset.

@michaelyuancb

michaelyuancb commented Nov 27, 2024

I think the optical flow error may be alleviated by the weight map predicted by the FlowMap model (a rough sketch of what such a confidence-weighted flow loss could look like is below). Co-training flow & depth is possible, and I think it is a direction worth exploring, but the key issue may be the data resources.
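
A minimal sketch of a confidence-weighted flow loss of this kind (all names here are placeholders, not FlowMap's actual code):

```python
import torch

def weighted_flow_loss(pred_corr, flow_corr, weights, eps=1e-6):
    """Down-weight unreliable flow with a predicted per-pixel confidence map.

    pred_corr: (B, 2, H, W) correspondences induced by predicted depth/pose
    flow_corr: (B, 2, H, W) correspondences from the precomputed optical flow
    weights:   (B, 1, H, W) per-pixel confidence in [0, 1]
    """
    err = (pred_corr - flow_corr).abs().sum(dim=1, keepdim=True)  # per-pixel L1 error
    return (weights * err).sum() / (weights.sum() + eps)          # confidence-weighted mean
```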

And I've extended it to large-scale dataset training & dynamic scenes, which shows promising results. However, I only focus on hand-object-interaction videos (since I'm focusing on Embodied AI & robot learning research). The project is called UniHOI and the repository is https://github.com/michaelyuancb/unihoi. I'm still improving the model; code & weights will be released in the future.
