
Questions Regarding Optical Flow Supervision and Potential Enhancements #50

Open
linwk20 opened this issue Sep 19, 2024 · 3 comments

@linwk20

linwk20 commented Sep 19, 2024

First of all, thank you for your impressive work. I’ve been searching for methods that can provide accurate dense depth maps (which is why I believe FlowMap is significantly superior to COLMAP). Using optical flow to fine-tune depth networks seems like a great idea. I have the following questions:

  • Why supervise depth with optical flow? Is it because optical flow typically offers higher accuracy and can provide subpixel reprojection errors for the depth network? I ask because the optical flow itself might not be accurate, since it also comes from a DNN, so why not co-optimize it for the scene as well? (A rough sketch of the kind of flow-based reprojection supervision I mean is given below.)
  • Potential improvement with a pretrained MVS model? If we used a pretrained large reconstruction model that takes multiple views as input as the depth estimator, is there a chance of significantly improving the final performance? Or do you think optical flow supervision is already a form of multi-view stereo (MVS), making a pretrained MVS model unnecessary?
  • Can increasing resolution and image count improve depth accuracy? The current training resolution and the number of images supported are limited by GPU memory. However, we know that for models using Layer Norm instead of Batch Norm, we can accumulate gradients to achieve an equivalent large batch size (for example, the ViT used in DepthAnything v2 follows this approach; see the gradient-accumulation sketch below). If we used this method to greatly increase the resolution and image count, do you think it would improve the final depth accuracy?

These are just some speculations, and I look forward to your response. Your thoughts may help us design more reasonable experiments. Thank you!
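
To make the first question concrete, here is a rough, generic sketch of what I mean by flow-based reprojection supervision: given predicted depth, intrinsics, and a relative pose, pixels from frame 0 are projected into frame 1, and the induced correspondence is compared against the precomputed optical flow. This is only an illustration with placeholder shapes and names, not FlowMap's actual implementation:

```python
import torch
import torch.nn.functional as F

def flow_supervision_loss(depth, K, T_0to1, flow_01):
    """Penalize disagreement between depth-induced correspondences and precomputed flow.

    depth:   (B, 1, H, W)  predicted depth for frame 0
    K:       (B, 3, 3)     camera intrinsics
    T_0to1:  (B, 4, 4)     relative pose from frame 0 to frame 1
    flow_01: (B, 2, H, W)  precomputed optical flow from frame 0 to frame 1 (pixels)
    """
    B, _, H, W = depth.shape
    device = depth.device

    # Pixel grid in homogeneous coordinates, shape (B, 3, H*W).
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=depth.dtype),
        torch.arange(W, device=device, dtype=depth.dtype),
        indexing="ij",
    )
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).reshape(1, 3, -1).expand(B, -1, -1)

    # Back-project to 3D with the predicted depth, then transform into frame 1.
    pts0 = torch.linalg.inv(K) @ pix * depth.reshape(B, 1, -1)        # (B, 3, H*W)
    pts0_h = torch.cat([pts0, torch.ones_like(pts0[:, :1])], dim=1)   # (B, 4, H*W)
    pts1 = (T_0to1 @ pts0_h)[:, :3]                                   # (B, 3, H*W)

    # Project into frame 1 and compare with the flow-induced correspondence.
    proj = K @ pts1
    proj = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)                 # (B, 2, H*W)
    target = pix[:, :2] + flow_01.reshape(B, 2, -1)                   # flow correspondence

    return F.l1_loss(proj, target)
```

In this formulation the flow is fixed supervision; co-optimizing it for the scene would mean also back-propagating into `flow_01`.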
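And for the third question, a minimal sketch of the gradient-accumulation idea I have in mind (the `model`, `loss_fn`, and `loader` names here are placeholders, not part of the FlowMap codebase):

```python
import torch

def train_with_grad_accumulation(model, loss_fn, loader, accum_steps=8, lr=1e-4):
    """Simulate a large effective batch by accumulating gradients over several small steps.

    Effective batch size = loader batch size * accum_steps. This matches the true
    large-batch gradient only when the loss averages over the batch and the model has
    no cross-sample statistics (e.g. LayerNorm rather than BatchNorm).
    """
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    optimizer.zero_grad()
    for step, batch in enumerate(loader):
        loss = loss_fn(model(batch)) / accum_steps  # scale so accumulated grads average
        loss.backward()                             # gradients add up in .grad buffers
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```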

@booker-max


Hi, I am currently looking for an MVS model that has been trained on a large-scale dataset. Do you have any recommendations?

@linwk20
Author

linwk20 commented Nov 21, 2024

There are plenty of such MVS models. One I know of that runs in real time is Spann3r (https://hengyiwang.github.io/projects/spanner), but it is more of an academic paper and is not trained on a large dataset.

@michaelyuancb

michaelyuancb commented Nov 27, 2024

I think the optical flow error may be alleviated by the weight map predicted by the FlowMap model (a rough sketch of what such a confidence-weighted flow loss could look like is below). Co-training flow & depth is possible, and I think it is a direction worth exploring, but the key issue may be the data resources.
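
A minimal sketch of a confidence-weighted flow loss of this kind (all names here are placeholders, not FlowMap's actual code):

```python
import torch

def weighted_flow_loss(pred_corr, flow_corr, weights, eps=1e-6):
    """Down-weight unreliable flow with a predicted per-pixel confidence map.

    pred_corr: (B, 2, H, W) correspondences induced by predicted depth/pose
    flow_corr: (B, 2, H, W) correspondences from the precomputed optical flow
    weights:   (B, 1, H, W) per-pixel confidence in [0, 1]
    """
    err = (pred_corr - flow_corr).abs().sum(dim=1, keepdim=True)  # per-pixel L1 error
    return (weights * err).sum() / (weights.sum() + eps)          # confidence-weighted mean
```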

And I've extended it to large-scale dataset training & dynamic scenes, which shows promising results. However, I only focus on hand-object-interaction videos (since I'm focusing on Embodied AI & robot learning research). The project is called UniHOI and the repository is https://github.com/michaelyuancb/unihoi. I'm still improving the model; code & weights will be released in the future.
