PoissonRecon on Windows fails on certain machines with "Failed to open file:" #1307

Closed
pierotofy opened this issue Jun 18, 2021 · 11 comments

Comments

@pierotofy
Member

[INFO]    running "D:\WebODM\resources\app\apps\ODM\SuperBuild\install\bin\dem2points" -inputFile "D:\WebODM\resources\app\apps\NodeODM\data\b169bf47-05b9-47bc-9134-f8e4979501f4\odm_meshing\tmp\mesh_dsm.tif" -outputFile "D:\WebODM\resources\app\apps\NodeODM\data\b169bf47-05b9-47bc-9134-f8e4979501f4\odm_meshing\tmp\dsm_points.ply" -skirtHeightThreshold 1.5 -skirtIncrements 0.2 -skirtHeightCap 100
[INFO]    running "D:\WebODM\resources\app\apps\ODM\SuperBuild\install\bin\PoissonRecon" --in "D:\WebODM\resources\app\apps\NodeODM\data\b169bf47-05b9-47bc-9134-f8e4979501f4\odm_meshing\tmp\dsm_points.ply" --out "D:\WebODM\resources\app\apps\NodeODM\data\b169bf47-05b9-47bc-9134-f8e4979501f4\odm_meshing\odm_25dmesh.dirty.ply" --depth 11 --pointWeight 4 --samplesPerNode 1.0 --threads 47 --bType 2 --linearFit
[ERROR] Failed to open file:
[INFO]    running "D:\WebODM\resources\app\apps\ODM\SuperBuild\install\bin\OpenMVS\ReconstructMesh" -i "D:\WebODM\resources\app\apps\NodeODM\data\b169bf47-05b9-47bc-9134-f8e4979501f4\odm_meshing\odm_25dmesh.dirty.ply" -o "D:\WebODM\resources\app\apps\NodeODM\data\b169bf47-05b9-47bc-9134-f8e4979501f4\odm_meshing\odm_25dmesh.ply" --remove-spikes 0 --remove-spurious 20 --smooth 0 --target-face-num 400000
===== Dumping Info for Geeks (developers need this to fix bugs) =====
Child returned 1
Traceback (most recent call last):
  File "D:\WebODM\resources\app\apps\ODM\stages\odm_app.py", line 89, in execute
    self.first_stage.run()
  File "D:\WebODM\resources\app\apps\ODM\opendm\types.py", line 340, in run
    self.next_stage.run(outputs)
  File "D:\WebODM\resources\app\apps\ODM\opendm\types.py", line 340, in run
    self.next_stage.run(outputs)
  File "D:\WebODM\resources\app\apps\ODM\opendm\types.py", line 340, in run
    self.next_stage.run(outputs)
  [Previous line repeated 2 more times]
  File "D:\WebODM\resources\app\apps\ODM\opendm\types.py", line 321, in run
    self.process(self.args, outputs)
  File "D:\WebODM\resources\app\apps\ODM\stages\odm_meshing.py", line 66, in process
    mesh.create_25dmesh(tree.filtered_point_cloud, tree.odm_25dmesh,
  File "D:\WebODM\resources\app\apps\ODM\opendm\mesh.py", line 42, in create_25dmesh
    mesh = screened_poisson_reconstruction(dsm_points, outMesh, depth=depth,
  File "D:\WebODM\resources\app\apps\ODM\opendm\mesh.py", line 184, in screened_poisson_reconstruction
    system.run('"{reconstructmesh}" -i "{infile}" '
  File "D:\WebODM\resources\app\apps\ODM\opendm\system.py", line 106, in run
    raise SubprocessException("Child returned {}".format(retcode), retcode)
opendm.system.SubprocessException: Child returned 1

Machines with many cores seem to be more affected than others (note --threads 47 above).
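In the log above, ReconstructMesh is launched right after PoissonRecon prints its error, and the traceback points at the ReconstructMesh call (mesh.py, line 184). That suggests the PoissonRecon failure was not detected at the point where it happened. A minimal sketch of a guard that would surface the problem earlier, assuming a hypothetical helper `require_output` (not part of ODM):

```python
import os


class MissingOutputError(RuntimeError):
    """Raised when a step finished but its expected output file is absent or empty."""


def require_output(path, step_name):
    # Hypothetical helper: treat a missing or zero-byte file as a failure
    # of the step that was supposed to write it.
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        raise MissingOutputError(
            "{} did not produce a usable output file: {}".format(step_name, path)
        )
    return path
```

Calling something like `require_output(dirty_mesh_path, "PoissonRecon")` before the next step would make the failing step explicit in the log instead of blaming the step that follows it.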

@pierotofy
Member Author

pierotofy commented Jun 18, 2021

Another user reported a memory error after replacing the binary with the official PoissonRecon.exe release:

[WARNING] Removing previous point cloud: A:\WebODM\resources\app\apps\NodeODM\data\ba3cfa00-668d-4a87-a4c7-b835f3852574\odm_filterpoints\point_cloud.ply
[INFO]    Finished odm_filterpoints stage
[INFO]    Running odm_meshing stage
[INFO]    Writing ODM Mesh file in: A:\WebODM\resources\app\apps\NodeODM\data\ba3cfa00-668d-4a87-a4c7-b835f3852574\odm_meshing\odm_mesh.ply
[INFO]    running "A:\WebODM\resources\app\apps\ODM\SuperBuild\install\bin\PoissonRecon" --in "A:\WebODM\resources\app\apps\NodeODM\data\ba3cfa00-668d-4a87-a4c7-b835f3852574\odm_filterpoints\point_cloud.ply" --out "A:\WebODM\resources\app\apps\NodeODM\data\ba3cfa00-668d-4a87-a4c7-b835f3852574\odm_meshing\odm_mesh.dirty.ply" --depth 11 --pointWeight 4.0 --samplesPerNode 1.0 --threads 111 --bType 2 --linearFit
===== Dumping Info for Geeks (developers need this to fix bugs) =====
Child returned 3221225477
Traceback (most recent call last):
  File "A:\WebODM\resources\app\apps\ODM\stages\odm_app.py", line 89, in execute
    self.first_stage.run()

Probably due to the number of cores (note --threads 111 above)?
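For reference, the return code 3221225477 is easier to read in hexadecimal: it is 0xC0000005, the Windows NTSTATUS value for an access violation, which is consistent with the "memory error" description. A small sketch for decoding such codes; the lookup table is deliberately tiny and the function name `describe_exit_code` is only illustrative:

```python
# A few well-known NTSTATUS values; this table is not exhaustive.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exit_code(retcode):
    # Normalize to an unsigned 32-bit value, since some tools report
    # the same code as a negative signed integer.
    code = retcode & 0xFFFFFFFF
    name = KNOWN_NTSTATUS.get(code, "unknown")
    return "0x{:08X} ({})".format(code, name)


print(describe_exit_code(3221225477))  # 0xC0000005 (STATUS_ACCESS_VIOLATION)
```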

@Saijin-Naib
Contributor

Piero, this may be a silly idea (and I can't reproduce it locally because my machine is old), but what if the issue is related to something like a number size/allocation overflow? Something where the thread count takes two digits to report (10 threads or more) instead of one digit (9 threads or fewer).

If that's correct, someone locking their max-concurrency to 9 should be 100% fine all day even on the newer CPUs.

@pierotofy
Member Author

pierotofy commented Jun 19, 2021

> someone locking their max-concurrency to 9 should be 100% fine all day even on the newer CPUs.

I've been asking a few people to try this (limit max-concurrency), but haven't heard consistent reports back on whether it works as a workaround. It might.
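A minimal sketch of what limiting the thread count amounts to, assuming a hypothetical helper `clamp_threads`; the default limit of 9 comes from the suggestion above and is not a confirmed safe value:

```python
def clamp_threads(requested, limit=9):
    # Hypothetical helper: cap the thread count passed to PoissonRecon.
    # The limit of 9 is the value proposed in this thread, not a verified fix.
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return max(1, min(int(requested), limit))


print(clamp_threads(47))  # 9
print(clamp_threads(4))   # 4
```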

@Saijin-Naib
Contributor

> > someone locking their max-concurrency to 9 should be 100% fine all day even on the newer CPUs.
>
> I've been asking a few people to try this (limit max-concurrency), but haven't heard consistent reports back on whether it works as a workaround. It might.

Would it make sense to have a "min-concurrency" flag for testing so I could see if this behavior can be reproduced on older "stable" CPUs just by pushing concurrency above 9?

@pierotofy
Member Author

For testing purposes, I would probably just patch this line:

'threads': threads,

to change the thread count.
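Instead of editing the value by hand for each test, the override could come from the environment. This is only a sketch: the variable name `ODM_TEST_THREADS` and the helper `threads_for_testing` are made up for illustration and do not exist in ODM.

```python
import os


def threads_for_testing(default_threads, env=os.environ):
    # ODM_TEST_THREADS is a hypothetical variable used only in this sketch.
    raw = env.get("ODM_TEST_THREADS")
    if raw is None:
        return default_threads
    value = int(raw)
    if value < 1:
        raise ValueError("ODM_TEST_THREADS must be a positive integer")
    return value
```

The patched line would then read something like `'threads': threads_for_testing(threads),` so that a tester can push the count above the number of logical cores without touching the code again.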

@Saijin-Naib
Contributor

It still seems to work just fine with threads set to 16 (the CPU has 4 physical and 8 logical cores).

I got nothing...

An AVX/SSE thing? A hardware Spectre mitigation issue?

@pierotofy
Member Author

This and OpenDroneMap/NodeODM#158 are both puzzling; I'm not really sure of the root cause. They both seem to be filesystem-related.

@pierotofy
Member Author

I've been able to reproduce this on my machine with:

C:\>C:\WebODM\resources\app\apps\ODM\SuperBuild\install\bin\PoissonRecon --in "C:\WebODM\resources\app\apps\NodeODM\data\54eaa914-863a-4000-8bf2-a7323a54402c\odm_filterpoints\point_cloud.ply" --out "C:\WebODM\resources\app\apps\NodeODM\data\54eaa914-863a-4000-8bf2-a7323a54402c\odm_meshing\odm_mesh.dirty.ply" --depth 11 --pointWeight 4.0 --samplesPerNode 1.0 --threads 160 --bType 2 --linearFit
[ERROR] Failed to open file:

To trigger it, I bumped the number of threads (to 160 here). This does look like a race condition of some sort.
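Since the failure is intermittent, a small harness that repeats the same command at several thread counts can help estimate how often it occurs. The sketch below reuses only the flags that appear in the command above; the function names, the number of runs, and the paths passed in are assumptions.

```python
import subprocess


def build_command(binary, in_ply, out_ply, threads):
    # Same flags as the command in this thread; only --threads varies.
    return [
        binary,
        "--in", in_ply,
        "--out", out_ply,
        "--depth", "11",
        "--pointWeight", "4.0",
        "--samplesPerNode", "1.0",
        "--threads", str(threads),
        "--bType", "2",
        "--linearFit",
    ]


def sweep(binary, in_ply, out_ply, thread_counts, runs_per_count=5):
    # Run the command several times per thread count and collect the
    # return codes, so intermittent failures show up as mixed results.
    results = {}
    for threads in thread_counts:
        codes = []
        for _ in range(runs_per_count):
            proc = subprocess.run(
                build_command(binary, in_ply, out_ply, threads),
                capture_output=True,
                text=True,
            )
            codes.append(proc.returncode)
        results[threads] = codes
    return results
```

A call such as `sweep(poissonrecon_path, in_ply, out_ply, [8, 16, 60, 160])` returns a dictionary mapping each thread count to its list of return codes.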

@pierotofy
Member Author

Even the official binaries crash once in a while, though with different errors:

C:\>c:\users\pt\downloads\PoissonRecon --in "C:\WebODM\resources\app\apps\NodeODM\data\54eaa914-863a-4000-8bf2-a7323a54402c\odm_filterpoints\point_cloud.ply" --out "C:\WebODM\resources\app\apps\NodeODM\data\54eaa914-863a-4000-8bf2-a7323a54402c\odm_meshing\odm_mesh.dirty.ply" --depth 11 --pointWeight 4.0 --samplesPerNode 1.0 --threads 60 --bType 2 --linearFit
[ERROR] C:\Research\PoissonRecon\PoissonRecon\Src\Allocator.h (Line 144)
        Allocator<struct RegularTreeNode<3,class FEMTreeNodeData,unsigned short> >::newElements
        elements bigger than block-size: 8 > 2

@pierotofy
Member Author

Workaround in place as part of #1308.
