Cuda 11.1 - Coordinate manager #330
The error says that you fed a 0 length feature matrix. You might want to put a break point |
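One way to act on this suggestion is to check the input shapes right before the sparse tensor is built. The following is only a minimal sketch: `check_sparse_inputs` is a hypothetical helper, not part of MinkowskiEngine, and it takes plain shape tuples so that it does not depend on any tensor library.

```python
def check_sparse_inputs(feats_shape, coords_shape):
    """Validate feature/coordinate shapes before building a sparse tensor.

    Hypothetical debugging helper, not a MinkowskiEngine API.
    Returns the number of points if the shapes look consistent.
    """
    if len(feats_shape) != 2 or len(coords_shape) != 2:
        raise ValueError(
            f"expected 2-D inputs, got {feats_shape} and {coords_shape}"
        )
    if feats_shape[0] == 0:
        # This is the condition the error message complains about.
        raise ValueError("feature matrix has 0 rows")
    if feats_shape[0] != coords_shape[0]:
        raise ValueError(
            f"row mismatch: {feats_shape[0]} features vs {coords_shape[0]} coordinates"
        )
    return feats_shape[0]
```

With tensors named `feats` and `coords` (assumed names), it could be called as `check_sparse_inputs(tuple(feats.shape), tuple(coords.shape))` just before the `ME.SparseTensor(feats, coords)` call.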
Exactly the same code runs on the same computer if I use Cuda 10.2 with the same ME version, so I assume the problem is the combination of Cuda 11.1 with ME. If I debug the code step by step, the error actually happens earlier, when I cast the values to the sparse tensor like:
the error message is:
The inputs are generated with
but the batch size is one and a single worker is used in the data loader. The dimensions of the inputs are [5296,3] and [5296,4] respectively. I have tried to create a minimal working example, but if I just cast random values to a tensor it works without an error. |
It would be great if you could prepare self-contained code for debugging. |
So the following example should show the problem. On my machine (the diagnostic is in the first post) it returns:
I think that there is something wrong when casting the features to the ME.SparseTensor, as I also cannot use, for example,
Just as an info, the same code with Cuda 10.2 returns
Thank you in advance for your help
|
Hi! Any chance this issue could be looked at? I am using an NVIDIA 3000 series GPU which only runs CUDA 11 and therefore I cannot use Minkowski Engine. |
Just FYI, the code snippet works on my machine: MinkowskiEngine==0.5.0, Cuda 11.2, GeForce RTX 3090. |
I ran the snippet on CUDA 11.2 (ME==0.5.1, RTX 3090) and am still getting the same error. ME==0.5.0 wouldn't compile. |
Is what I ran into in #338 perhaps the same issue as this? |
Hmm, I can't replicate the error on the latest master, with either CUDA 11.0 or CUDA 11.1. My environments are
python -c "import MinkowskiEngine; MinkowskiEngine.print_diagnostics()"
and
|
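When comparing environments across machines as in this thread, a short script that prints only the relevant version strings can complement the full diagnostics. This is a rough sketch, not a replacement for `print_diagnostics()`; `collect_versions` is a hypothetical helper, and it only relies on the standard `__version__` attribute and on `torch.version.cuda`.

```python
import importlib
import platform


def collect_versions(modules=("torch", "MinkowskiEngine")):
    """Return a dict of version strings; packages that fail to import map to None.

    Hypothetical helper for comparing environments, not part of any library.
    """
    info = {"python": platform.python_version()}
    for name in modules:
        try:
            mod = importlib.import_module(name)
        except Exception:
            # Covers missing packages as well as broken native extensions.
            info[name] = None
            continue
        info[name] = getattr(mod, "__version__", "unknown")
    try:
        import torch

        # CUDA version PyTorch was built with; None for CPU-only builds.
        info["torch_cuda"] = torch.version.cuda
    except Exception:
        info["torch_cuda"] = None
    return info
```

Printing `collect_versions()` on each machine gives a compact summary that is easy to paste next to the full diagnostic output.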
Hey Chris, @chrischoy, I can still reproduce this error on a 3090 GPU using the latest master, with the environment:
==========System==========
==========MinkowskiEngine==========
I first removed the previous conda environments and created new ones using conda.
|
Sorry, I misread the issue. I assumed the cudaIllegalMemoryAccess was the problem. Yes, I was able to reproduce this error. Let me get back to you ASAP. |
TLDR: This is an error in pytorch (v1.8.X + CUDA11.X) which affects many other custom C extension libraries.
On pytorch 1.8.1 + cuda 11.1
import MinkowskiEngine as ME
import torch
coordinates = torch.rand(8192,3) * 200
bcoords, bfeats = coordinates.cuda(), coordinates.cuda()
print(bcoords, bfeats) # without print, it works fine... print seems to be triggering something
ME.SparseTensor(bfeats, bcoords)
The full log for the above script with ME debug installation is
A related issue also happens in these libraries with pytorch 1.8.x + CUDA 11.X
This is a pytorch error which will probably be fixed in the next update. In the meantime, I'll update the readme and recommend
but not
|
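To see quickly whether an installation matches the combination reported above (pytorch 1.8.x built with CUDA 11.x), the version strings can be compared in a few lines. This is a sketch under the assumption that only that combination is affected, as stated in the comment above; `is_reported_combination` is a hypothetical helper, and its inputs would typically be `torch.__version__` and `torch.version.cuda`.

```python
def parse_major_minor(version):
    """Parse strings such as '1.8.1+cu111' or '11.1' into (major, minor)."""
    core = version.split("+")[0]  # drop local build tags like '+cu111'
    parts = core.split(".")
    return int(parts[0]), int(parts[1])


def is_reported_combination(torch_version, cuda_version):
    """True for pytorch 1.8.x built against CUDA 11.x, the combination named in this thread.

    Hypothetical check; it does not detect the bug itself, only the versions.
    """
    if cuda_version is None:  # CPU-only build
        return False
    torch_mm = parse_major_minor(torch_version)
    cuda_major, _ = parse_major_minor(cuda_version)
    return torch_mm == (1, 8) and cuda_major == 11
```

For example, `is_reported_combination("1.8.1+cu111", "11.1")` flags the setup used in the reproduction script, while a CUDA 10.2 build of the same pytorch version is not flagged.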
Are there any updates on this? I have an RTX 3090, which is only compatible with CUDA 11.1+. |
Does pytorch1.9+cuda11.1 fix this problem? Thx. |
I have tried running the code given in the previous posts, and it runs fine. So I guess this is fixed.
|
Great! I wasn't sure it was solved. So I'll close the ticket since I got the confirmation that it's been resolved. |
Hi Chris,
I have stumbled onto the following problem when using ME 0.5.1 or 0.5.2 with Cuda 11.1:
Note that the same code works perfectly fine with Cuda 10.2. I am sorry that I do not have a very compact working example, but the error occurs when running the code available in https://github.com/zgojcic/Rigid3DSceneFlow. For example when running the following evaluation:
If you actually want to run the code you also have to download the dataset, but it is very small (see the repo). If I can help you somehow or should provide more information, please let me know.
Best,
Zan
Diagnostic from one of the computers that I have used (I have observed the same error on three computers running either ME 0.5.1 or 0.5.2):
==========System==========
Linux-5.4.0-66-generic-x86_64-with-debian-buster-sid
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.5 LTS"
3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0]
==========Pytorch==========
1.8.0
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 455.32.00
CUDA Version 11.1
VBIOS Version 88.00.41.00.18
Image Version G001.0000.01.04
==========NVCC==========
sh: 1: nvcc: not found
==========CC==========
CC=g++-7
/usr/bin/g++-7
g++-7 (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
==========MinkowskiEngine==========
0.5.1
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 11010
CUDART version MinkowskiEngine is compiled: 11010
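The NVCC and CUDART values in the diagnostic use CUDA's integer version encoding, major * 1000 + minor * 10, so 11010 corresponds to CUDA 11.1. A small sketch for decoding it (the function name is hypothetical):

```python
def decode_cuda_version(encoded):
    """Decode a CUDA integer version (major * 1000 + minor * 10) into (major, minor)."""
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return major, minor


print(decode_cuda_version(11010))  # (11, 1)
```

This makes it easy to confirm that the CUDA version MinkowskiEngine was compiled with matches the one reported by the driver and by PyTorch.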