Why not use neural networks for create depth map? #528

Open
Dok11 opened this issue Jun 29, 2019 · 43 comments
Labels
do not close issue that should stay open (avoid automatically close because stale) feature request feature request from the community

Comments

@Dok11

Dok11 commented Jun 29, 2019

Something like this https://github.com/ialhashim/DenseDepth#results
or this https://github.com/gautam678/Pix2Depth

Using this technology may help to avoid some bugs on flat, mirrored, or shiny areas. And using a neural network would be faster than the current approach.
I wanted to see how the DepthMap node works at the moment, but I can't find the code, except for the node description in meshroom/nodes/aliceVision/

@Baasje85

I think it would be good to collect alternative methods and implement them in new nodes with compatible inputs and outputs. For example, I starred https://github.com/AIBluefisher/EGSfM, but I forgot whether it came from here or from AliceVision. This might also be an alternative method.

@natowi
Member

natowi commented Jun 29, 2019

The problem is not finding alternatives:
https://github.com/timzhang642/3D-Machine-Learning
https://github.com/natowi/3D-Reconstruction-with-Neural-Network

GeoDesc replace SIFT
https://groups.google.com/forum/#!topic/alicevision/HQhqtJjGaQ0

The reconstruction system, named i23dMVS, ranks in the top 10 in tanks and temples dataset.
Here the link to the project: https://github.com/AIBluefisher/GraphSfM
Since GraphSfM is partially based on (an early) version of OpenMVG and licensed under BSD 3-Clause,
I think it should be possible to include this approach in Meshroom to accelerate large scale reconstructions
https://groups.google.com/forum/#!topic/alicevision/_5Eo6hqLBS8

If you are interested in implementing a Machine Learning approach, you are welcome to contribute to Alicevision.

Here is a similar project

@Dok11
DepthMap is used in src/software/pipeline/main_depthMapEstimation.cpp, src/software/pipeline/main_depthMapFiltering.cpp. The includes are <aliceVision/depthMap/RefineRc.hpp> and <aliceVision/depthMap/SemiGlobalMatchingRc.hpp>
From alicevision/AliceVision#439 (comment)

@Baasje85

@natowi could we maybe add this knowledge to https://github.com/alicevision/meshroom/wiki/Good-first-contributions ?

@Dok11
Author

Dok11 commented Jun 30, 2019

https://github.com/timzhang642/3D-Machine-Learning

Wow, it's amazing. Nice topic!

If you are interested in implementing a Machine Learning approach, you are welcome to contribute to Alicevision.

It may be an interesting challenge, if I can figure out how to compile the program and the other steps needed to contribute :)

DepthMap is used in src/software/pipeline/main_depthMapEstimation.cpp

I didn't find this path in the repo. Maybe you can provide a link?

@natowi
Member

natowi commented Jun 30, 2019

@Dok11 also #520 (comment)

@hargrovecompany

This is way over my skill level, but I am interested. I have a 360-degree photo booth, with cameras and lighting facing inward. Light flare and reflection are extremely difficult to manage... I'm wondering if this type of approach would make life easier for me.

@Dok11
Author

Dok11 commented Jul 3, 2019

I'm wondering if this type of approach would make life easier for me

It would work and solve your problem, but I can't say when this neural network workflow will be available in everyday software like Meshroom.
As far as I know, currently only Pix4D uses neural networks in its software, and its results are not the best among the alternatives.

@natowi
Member

natowi commented Jul 3, 2019

@Dok11 I found a script for Metashape (Photoscan) https://github.com/agisoft-llc/metashape-scripts/blob/master/src/model_style_transfer.py and tensorflow... But this is for model style transfer (texture) and not 3d reconstruction

@Dok11
Author

Dok11 commented Jul 3, 2019

I'm still interested in this topic, but outside Meshroom.
For me it is easier to try this technology directly in Blender: it is the easiest way to write code and see results without having to build a whole application. So now I am studying how photogrammetry works across the whole pipeline, from detecting camera positions to meshing. As far as I can tell, almost all of these features can be implemented (and better) by neural networks instead of classic algorithms.

Maybe I will stay motivated and turn these words into something real :)

@Dok11 I found a script for Metashape (Photoscan) https://github.com/agisoft-llc/metashape-scripts/blob/master/src/model_style_transfer.py and tensorflow... But it looks like it is only being used for texturing.

I see it uses style transfer (one type of neural network). Maybe it makes it possible to avoid glitches in textures? But that is just an assumption.

@Dok11
Author

Dok11 commented Sep 6, 2019

@natowi @Baasje85
I think I need your advice.

Currently I am developing a network that can potentially predict camera positions.
I studied papers from arXiv and, based on them, built a neural network that takes two images and returns 7 numbers. The first four are a wxyz quaternion describing the rotation of the first image's camera relative to the second image's camera. This works pretty well, as far as I can see.

The remaining three numbers are currently the xyz position delta between the two images.
This also scores well on the accuracy metrics, but I am sure that it is incorrect and not a stable result.
camera_deltas__pos-xyz

Let me describe this in more detail:
To train the neural network I rendered a large number of synthetic images from Blender demo scenes and took the dimensions from those scenes. This data is passed to the neural network as training data.
image
image
image

But now imagine what happens if I scale the whole scene up 10 times. All dimensions increase, BUT the content of the images does not change at all.

So here is my question: can you suggest a method to describe camera movement in 3D space without absolute values tied to the scene dimensions?
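
One common way to sidestep the scale problem described above is to keep only the direction of the translation and drop its length, since a pair of images on its own cannot reveal the absolute scale of a scene. The following is a minimal sketch of that idea, not the network's actual output format. It assumes each camera is given as a unit wxyz quaternion (camera-to-world rotation) plus a camera centre in world coordinates; the function names are hypothetical.

```python
import math

def quat_conj(q):
    """Conjugate of a wxyz quaternion (inverse rotation for unit quaternions)."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def quat_mul(a, b):
    """Hamilton product of two wxyz quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def rotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q."""
    _, x, y, z = quat_mul(quat_mul(q, (0.0, *v)), quat_conj(q))
    return (x, y, z)

def relative_pose(q1, p1, q2, p2):
    """Pose of camera 2 relative to camera 1, independent of scene scale.

    Assumption: q1, q2 are camera-to-world rotations, p1, p2 are camera centres.
    Returns (relative rotation as wxyz, unit translation direction in camera 1's frame).
    """
    q_rel = quat_mul(quat_conj(q1), q2)
    delta = tuple(b - a for a, b in zip(p1, p2))
    local = rotate(quat_conj(q1), delta)  # express the offset in camera 1's axes
    norm = math.sqrt(sum(c * c for c in local))
    direction = tuple(c / norm for c in local) if norm > 0 else (0.0, 0.0, 0.0)
    return q_rel, direction
```

With this target, scaling the whole scene by 10 leaves both outputs unchanged; the metric scale would have to come from somewhere else, for example a known distance in the scene.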

@skinkie

skinkie commented Sep 7, 2019

Thanks @Baasje85 for pointing me here. @Dok11 have you considered using parametric constraints first? For example by angle, percentage, projection, scale. Once all your project's image files are resolved, you could resolve the parametric constraints (maybe even including ground control point registration) towards absolute positions. Then use the current pipeline to do the other tricks with respect to alignment.

@Dok11
Author

Dok11 commented Sep 10, 2019

@skinkie thanks for your participation! It seems you are right, but in my opinion any constraint makes the task less universal. I searched for a solution based on camera angles, but people there explained to me that it is impossible :) well, okay
https://math.stackexchange.com/questions/3350091/is-would-possible-to-describe-length-just-in-degrees
image

While searching for a solution I got a couple of ideas for making the neural network more robust. Maybe next week I will share my results and the details of the solution, if I succeed.

One more thing. For a more accurate distance from picture to picture, I think a pipeline with two neural networks is needed:
The first NN takes two images with known FOV and returns the rotation delta in degrees and the distance delta between the cameras in world units.
The second NN takes four images: the target, the one before it, and the first and second images that were given to the first NN, together with the delta data from that NN. It then returns its own result using all the data known to it.
If you have any thoughts about this, I'd be glad to hear them.

@skinkie

skinkie commented Sep 10, 2019

Bees track their position over time by the angle towards the moving sun, a distance metric based on the rocking of their wings and a local heuristic based on smell. I completely agree that a single constraint doesn't work, just as a single x or a single y doesn't. But this is not the case with parametric constraints: you can apply multiple (even conflicting) constraints based on angle and distance (offset), all relative to each other. Your approach using multiple steps is advisable for another reason: even with an unknown FOV you could infer a FOV in relative units, which is a new metric. I do wonder if it wouldn't be better to do so with the existing approaches as opposed to neural networks. As I wrote in another ticket, even information such as the time and order of the photos might give you significant clues regarding the search space.

@Dok11
Author

Dok11 commented Sep 16, 2019

Bees track their position over time by the angle towards the moving sun, a distance metric based on the rocking of their wings and a local heuristic based on smell.

And they (bees or other insects) can be fooled by a light bulb. What I mean is that this constraint is not universal enough.

even with an unknown FOV

I think that is a rare case. Usually we can extract the FOV from the photo metadata.
Or we can create a new neural network for it. It may be pretty simple.
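
As a rough illustration of the metadata route, the horizontal FOV follows from the focal length and the sensor width under a pinhole model. This is only a sketch: it assumes both values are already known in millimetres (the focal length typically from EXIF, the sensor width from a camera database), and the function name is hypothetical.

```python
import math

def horizontal_fov_deg(focal_length_mm, sensor_width_mm):
    """Horizontal field of view in degrees for an ideal pinhole camera."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))

# Example: an 18 mm lens on a 36 mm wide sensor covers 90 degrees horizontally.
fov = horizontal_fov_deg(18.0, 36.0)
```

Lens distortion is ignored here, so the value is only a starting point for calibration.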

I do wonder if it wouldn't be better to do so with the existing approaches opposed to neural networks. As I wrote in another ticket even information such as time and order of the photo might give you significant clues regarding the search space.

During training, a neural network can potentially learn to extract the right features from images and estimate camera positions more accurately than classic algorithms. You can look through forums or issue trackers to see how often 3D reconstruction fails for people because of noise or motion blur, photos with too large an angle between camera positions, or even images of mirrors or shiny surfaces.
Not long ago I did photogrammetry of a large indoor scene. The scene did not have enough light, so the images were pretty noisy, but any human could tell where each photo was taken relative to the others. Meshroom and other photogrammetry software based on classic algorithms could not. So for my scene I lost around 500-700 images out of 1300. It was frustrating.
But I'm sure that a neural network can do better. And I'll try to do it.
My first goal is to make a node for Meshroom that can replace this node group:
image

It may take many months. And it may be impossible for me =)
But currently I have pretty positive results.

@skinkie

skinkie commented Sep 16, 2019

And they (bees or other insects) can be fooled by a light bulb. What I mean is that this constraint is not universal enough.

Heat and polarized light (or in the case of bees: events of green) can confuse the orientation. But this goes for any parameterized situation: create ambiguity and it will try to find another optimum.

I do wonder if it wouldn't be better to do so with the existing approaches opposed to neural networks. As I wrote in another ticket even information such as time and order of the photo might give you significant clues regarding the search space.

During training, a neural network can potentially learn to extract the right features from images and estimate camera positions more accurately than classic algorithms.

I agree on this fully. But I actually mean using the classic approach for camera calibration.

Not long ago I did photogrammetry of a large indoor scene. The scene did not have enough light, so the images were pretty noisy, but any human could tell where each photo was taken relative to the others. Meshroom and other photogrammetry software based on classic algorithms could not. So for my scene I lost around 500-700 images out of 1300. It was frustrating.
But I'm sure that a neural network can do better. And I'll try to do it.
My first goal is to make a node for Meshroom that can replace this node group:
image

You want to replace the entire node group with one black box? Why not start with providing an alternative FeatureExtraction - FeatureMatching approach? Returning camera orientation is something that is by itself already valuable. And you might be able to use it as an ensemble.

It may take many months. And it may be impossible for me =)
But currently I have pretty positive results.

Good luck. I'll obviously support it. For me, I want to go in the completely opposite direction of supervised learning. Allow the user to assist meshroom in getting better results.

@Dok11
Author

Dok11 commented Sep 16, 2019

You want to replace the entire node group with one black box?

No. I plan to make several neural networks, including:

  1. NN to extract the camera FOV value (I hope it will not be needed)
    I have not worked on it at all. But most likely it is not too hard.

  2. NN to find the nearest images in space or estimate the capture sequence
    I have some interesting ideas and found good papers on arxiv.org for this purpose. Maybe I will try it soon. A starter dataset for this purpose is almost ready.

  3. NN to estimate camera positions (in fact it will be two neural networks)
    The most effective NN I have, and the one I have spent the most time on.

  4. NN to generate depth maps or a point cloud from any camera.
    The dataset is almost ready, but the NN is only in demo mode. It needs heavy training and a powerful GPU.

Why not start with providing an alternative FeatureExtraction - FeatureMatching approach? Returning camera orientation is something that is by itself already valuable. And you might be able to use it as an ensemble.

I thought camera position estimation does not overlap with the FeatureExtraction tasks.
I thought these nodes generate a low-resolution point cloud for the next nodes and that these nodes are tightly coupled.
But if my NN for estimating camera positions in the scene can replace these nodes, that would be perfect! Currently they take a lot of time in Meshroom, but a NN can do it very fast on a GPU (maybe around several seconds per 100 images).

@skinkie

skinkie commented Sep 16, 2019

  1. NN to find the nearest images in space or estimate the capture sequence
    I have some interesting ideas and found good papers on arxiv.org for this purpose. Maybe I will try it soon. A starter dataset for this purpose is almost ready.

@Baasje85 and I have a huge dataset available, created with multiple cameras, some of it even geotagged. If you are interested in using it, we can obviously provide it.

Why not start with providing an alternative FeatureExtraction - FeatureMatching approach? Returning camera orientation is something that is by itself already valuable. And you might be able to use it as an ensemble.

I thought camera position estimation does not overlap with the FeatureExtraction tasks.
I thought these nodes generate a low-resolution point cloud for the next nodes and that these nodes are tightly coupled.

This may be true for meshroom (and its depthmap), but not for other approaches like openMVG/openMVS in which a high density point cloud seems to be the thing used.

But if my NN for estimating camera positions in the scene can replace these nodes, that would be perfect! Currently they take a lot of time in Meshroom, but a NN can do it very fast on a GPU (maybe around several seconds per 100 images).

Would obviously be cool :-)

@Dok11
Author

Dok11 commented Sep 16, 2019

@Baasje85 and I have a huge dataset available, created with multiple cameras, some of it even geotagged. If you are interested in using it, we can obviously provide it.

Sounds good! Can you show several examples of it?
It could be helpful. But how is this dataset labeled? For example, two photos with the same coordinates may not overlap because they were taken in different directions (top/bottom or left/right).
One more point: training the neural network for this task can be done without a "teacher". The NN learns to extract key features from images and compares how many of them match.
As far as I know, Face ID and other image comparison approaches use the same technique.
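
The comparison step of such an approach can be sketched as follows. The embeddings are assumed to come from a trained network that is not shown here; only the similarity measure is illustrated, and cosine similarity is one common choice rather than the method used in this thread.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)

# Hypothetical embeddings of two photos, as a network might produce them.
emb_photo_1 = [0.2, 0.9, 0.1]
emb_photo_2 = [0.25, 0.85, 0.05]
similarity = cosine_similarity(emb_photo_1, emb_photo_2)
```

Pairs with a similarity close to 1.0 would be treated as showing similar content.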

image

image

This may be true for meshroom (and its depthmap), but not for other approaches like openMVG/openMVS in which a high density point cloud seems to be the thing used.

I have not come across this. Maybe doing it for Meshroom will be a more understandable task.
In fact, I don't understand very well how the photogrammetry pipeline works in every detail :)

@Dok11
Author

Dok11 commented Oct 7, 2019

@natowi @Baasje85 maybe you can tell me: would a neural network that can repair camera positions in the scene be useful for Meshroom? Or does SfM also produce a sparse point cloud while it runs?

I am thinking about the minimal valuable node to start integrating a neural network into Meshroom.

@skinkie

skinkie commented Oct 7, 2019

@Dok11 by repair, do you mean something analogous to LocalBundle adjustment?

@Dok11
Author

Dok11 commented Oct 7, 2019

I haven't seen this term before, but it seems you are right. As I wrote before, my current NN can determine camera positions in the scene from images (translation and rotation).
I don't know how much better (or worse) the NN works compared with classic SfM in its current state. So I want to test it on real tasks as soon as possible, while I am still motivated :D

In short, I want to know the minimal set of NN skills that would be helpful to Meshroom.

@skinkie

skinkie commented Oct 7, 2019

If you can return the intrinsics for the cameras, I am sure that would significantly reduce the computing effort to find them.

@natowi natowi added feature request feature request from the community do not close issue that should stay open (avoid automatically close because stale) and removed type:enhancement labels Oct 27, 2019
@Dok11
Author

Dok11 commented Nov 18, 2019

If you can return the intrinsics for the cameras, I am sure that would significantly reduce the computing effort to find them.

Well, I have just started by preparing the dataset =)
image

Coming soon!
No, it's not as fast as I would like :(

@Dok11
Author

Dok11 commented Dec 26, 2019

@skinkie @natowi Hello =)

I am at the final step of providing an MVP of a NN that can predict the overlap of two photos.
We give the NN two images and get a number telling how much of the surface in the second image is shared with the first.
image

As I understand it, this feature can save time in the task of finding common features across images. Is this true?
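
One way such an overlap score could save time is by pruning the list of image pairs before the expensive feature matching step. The sketch below assumes the scores are already available as a dictionary keyed by image pair; the function name and the threshold value are hypothetical and would need tuning on real data.

```python
def select_pairs(overlap_scores, threshold=0.2):
    """Keep only image pairs whose predicted overlap reaches the threshold.

    overlap_scores maps (image_a, image_b) to a score in [0, 1].
    Returns the kept pairs, highest overlap first.
    """
    kept = [(pair, score) for pair, score in overlap_scores.items() if score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in kept]

# Hypothetical scores for three photos.
scores = {
    ("img_001", "img_002"): 0.80,
    ("img_001", "img_003"): 0.05,
    ("img_002", "img_003"): 0.40,
}
pairs_to_match = select_pairs(scores)
```

Only the pairs returned here would then be passed on to feature matching.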

@skinkie

skinkie commented Jan 19, 2020

@Dok11 https://arxiv.org/abs/1506.06825 Seen that one?

@Dok11
Author

Dok11 commented Jan 19, 2020

@skinkie not this one, but every year several works on this theme appear on arXiv. For example, a newer article is https://arxiv.org/abs/2001.05036 (January 2020). I have seen it, but in general these works are about either creating only depth maps or generating new points of view.
It may be useful in the future. Currently I am focused on camera position estimation as the first step of the photogrammetry process. Depth map estimation depends heavily on hardware performance, so I think I need to improve my skills in optimizing networks on simpler tasks first.

And I was mistaken when I thought that including a neural network in the process would be easy. Not rocket science, of course, but still.

The neural network I wrote about above is not ready yet, but it will be useful for the neural network that will do camera pose estimation. Release day postponed xD
Currently I am writing an article about this, and I made a video visualizing the dataset generator, if you are interested =)
https://www.youtube.com/watch?v=6bec2NmpFOc

@natowi
Member

natowi commented Jan 26, 2020

"DeepV2D: Video to Depth with Differentiable Structure from Motion"
https://arxiv.org/abs/1812.04605
https://github.com/princeton-vl/DeepV2D

@nicolalandro

Hello everyone, as suggested at the top of this issue, I have started to implement DenseDepth as a Meshroom node (codebase).
The idea is to change the DenseDepth block. I have the code to obtain a dense map from a neural network, but I am stuck because I must write an .exr file. I found this, but how must I write this file?

(I'm new to this project, so I made a plugin class that calls the Python code via a command. I think it could be included in the standard Meshroom classes, but I don't know enough. The dependencies are torch and numpy for now. If someone has advice, please help me with this point as well.)

Thank you in advance!

@natowi
Member

natowi commented Jul 3, 2021

@nicolalandro that's great!

I found a few OpenExr examples that may be useful for you:
https://github.com/mlagunas/pytorch-nptransforms/blob/master/exr_data.py
https://github.com/tensorflow/graphics/blob/master/tensorflow_graphics/io/exr.py
PIL2exr https://stackoverflow.com/questions/65605761/write-pil-image-to-exr-using-openexr-in-python

There is aliceVision_utils_imageProcessing (called by https://github.com/alicevision/meshroom/blob/develop/meshroom/nodes/aliceVision/ImageProcessing.py) which can convert images to exr, but I don't know how useful this is in your case. Meshroom uses OIIO for image processing and even has its own OIIO plugin for Qt: https://github.com/alicevision/QtOIIO/.

@nicolalandro

Thank you for all material! It is very interesting!

My question is about the information that I should write in this file: must I write depth only, or RGB + depth?
As soon as I have time again I will try to read the file and copy it. If documentation exists, that would be better (also because it depends on the next block). If there is nothing, I will reverse engineer it and copy from the existing results.

@natowi
Member

natowi commented Jul 4, 2021

My question is about the information that I should write in this file, I must write depth only or RGB + Depth?

Now I see what you were asking for. I think it is depth only, as for the RGB information the undistorted images will be used. Meshroom uses PIZ (wavelet compression) for exr. "Depth maps are stored in 32 bits floating point EXR"
(The writeImage functions in AliceVision/src/aliceVision/mvsData/imageIO.hpp are being used to write the depthmaps, maybe this is helpful for you.* The DepthMap part is here.)

@nicolalandro

Thank you! As soon as I have a minute I will try to implement it and test it!

@nicolalandro

nicolalandro commented Jul 5, 2021

As I expected, when I create the exr file I am missing some important header information.
I get this error, and I think it may be caused by the absence of camera info in my header (more information here):

[Screenshot from 2021-07-05 08-06-57]

If someone wants to try running it (code), I suggest using small images like 720*980 to get a quick response (up to now the node does not use the GPU).

Does anyone have any ideas?

If not, I can take the data directly from the DeepMap node and change the Y value, which is the depth map, to get quick results and answer the question "Can this AI be included in the standard pipeline?" (but I want to replace that node). Doing this, I see that the node creates a depthMap but also a simMap. What is a simMap? I also tried to copy it, but I get the same error.

@natowi
Member

natowi commented Jul 6, 2021

@nicolalandro I just remembered there was the experimental masking node, that also uses the openexr library on the Meshroom side: https://github.com/alicevision/meshroom/pull/641/files L83

The discussion around this topic also contains some valuable information like #566 (comment) or #566 (comment)

Also #1361 (comment)

up to now the node does not use the GPU

That is especially interesting, so I'll be happy to check out your code when I have some time!

I hope this information is helpful for you. If you need more information, maybe @ALfuhrmann can give you some input on the openexr writer based on his node code I mentioned earlier or @fabiencastan can give you a hint.

The documentation of the EU project from which Meshroom originates has some valuable (but sometimes outdated!) information that is not documented anywhere else. The documents are publicly available, but I merged the relevant articles here (search for depth map / similarity map / exr)

@natowi
Member

natowi commented Jul 27, 2021

@nicolalandro any progress?
Maybe @remmel is willing to help you out if you are still stuck :)
#1361 (comment)

@nicolalandro

I'm working a lot right now, so I have no time to continue the work. There is DenseDepthDeepMap.py, which runs the main.py script that extracts the depth map and saves it to files, but it does not seem to work in the pipeline. I'm stuck here.

@remmel
Contributor

remmel commented Jul 27, 2021

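@nicolalandro, do you still have problems writing the exr? To write it, I simply use Python OpenCV:

import numpy as np
import cv2 as cv
depths = np.zeros((h, w), np.float32)  # h, w: image height and width; single-channel float32 depth
cv.imwrite(path, depths)  # path should end in .exr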

@nicolalandro

I think that I write the exr correctly, but the problem is that the depth map is not the only output of the standard node. So the standard flow is broken.

@remmel
Contributor

remmel commented Jul 28, 2021

@nicolalandro I'm doing something similar: creating my own depthMap.exr.
If the normal pipeline is used with your script, you could override the _depthMap.exr of DepthMapFilter with yours. In my case, keeping the _simMap.exr untouched did not cause any problem. But you have to make sure that you are using the same scale. I can check the scale for you if you send me the depthMap.exr created by your neural script, the one created by the normal pipeline, and the textured mesh created by the normal pipeline (I'll visualise the exr in 3D in https://remmel.github.io/image-processing-js/rgbd-viewer.html)
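
A quick numerical check of the scale can be sketched like this, assuming both depth maps have already been loaded and flattened into lists of equal length, and that non-positive values mark invalid pixels (an assumption, not something confirmed in this thread). The function name is hypothetical.

```python
from statistics import median

def estimate_scale(reference_depths, predicted_depths):
    """Median ratio reference/predicted over pixels that are valid in both maps."""
    ratios = [
        ref / pred
        for ref, pred in zip(reference_depths, predicted_depths)
        if ref > 0 and pred > 0  # assumption: values <= 0 mean "no depth"
    ]
    if not ratios:
        raise ValueError("no pixel is valid in both depth maps")
    return median(ratios)

# Toy example: the prediction is in different units than the pipeline output.
reference = [2.0, 4.0, -1.0, 6.0]
predicted = [500.0, 1000.0, 750.0, 1500.0]
scale = estimate_scale(reference, predicted)
rescaled = [p * scale for p in predicted]
```

The median is used instead of the mean so that a few badly predicted pixels do not dominate the estimate. A single global factor only makes sense if the two maps differ by a pure scale.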

@nicolalandro

I'll try removing the original and substituting mine. I will try both ways, but I do not keep the _simMap. I will try when I can, but this way the computation is worse; maybe the better idea is to change the whole workflow.

@remmel
Contributor

remmel commented Jul 28, 2021

I've quickly checked your code, and there is some stuff that doesn't seem to be mandatory, like adding the header or clipping the predictions.
Also, depthMap.exr stores the distance from the pinhole. But your prediction might use z distances instead (like Kinect or Android DEPTH16 depth maps, for example). In that case you have to convert them, see https://github.com/remmel/image-processing-js/blob/master/tools/depthMapsExr.py#L102
I also don't see any code for scaling your predictions to match the SfM coordinates. For example, in SfM 3 = 1 meter, but maybe in your prediction 1100 = 1 m.
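
The conversion from z distance to distance from the pinhole can be sketched for a single pixel as below. This assumes an ideal pinhole camera without distortion, with focal lengths and principal point given in pixels; it is an illustration of the geometry, not the code from the linked script.

```python
import math

def z_to_ray_distance(z, u, v, fx, fy, cx, cy):
    """Convert a z-depth at pixel (u, v) into the distance from the camera centre.

    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels.
    """
    x = (u - cx) / fx  # normalised image coordinates of the pixel
    y = (v - cy) / fy
    return z * math.sqrt(x * x + y * y + 1.0)

# At the principal point both measures agree; towards the border the ray is longer.
center = z_to_ray_distance(2.0, 50.0, 50.0, 100.0, 100.0, 50.0, 50.0)
border = z_to_ray_distance(2.0, 150.0, 50.0, 100.0, 100.0, 50.0, 50.0)
```

Applying this to every pixel gives a map in the same convention as a distance-from-pinhole depth map, still before any rescaling to SfM units.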

@natowi
Member

natowi commented Jun 3, 2024

https://github.com/alicevision/MeshroomResearch

Deep-learning-based depth map estimation:

VIZ-mvsnet

Optimization-based via NerfStudio (Upcoming):

Instant-ngp
3D Gaussian Splatting
