test_api_gpu fails every time with CUDA_EXCEPTION_15 #2

sdorminey · 2018-05-14T03:51:50Z

test_api_gpu dies for me, every time, with Invalid Managed Memory Access, while evaluating the Nand gate (before bootstrapping occurs.) It looks like this code is running on the Host Thread, but the underlying data (in the Unified Memory) is mapped to the GPU, causing an error.

Would love a workaround, since this project looks really neat! Let me know if you need more info.

System setup:

Ubuntu LTS 16.04
NVidia drivers, 390.48
CUDA Toolkit, v7.5
NVidia GeForce 940M (Compute Capability 5.0)

Output:
------ Key Generation ------
------ Test Encryption/Decryption ------
Number of tests: 96
PASS
------ Initilizating Data on GPU(s) ------
------ Test NAND Gate ------
Number of tests: 96
(crashes here)

Stack trace:
Thread [1] 14501 [core: 2] (Suspended : Signal : CUDA_EXCEPTION_15:Invalid Managed Memory Access)
cufhe::Nand() at cufhe_gates_gpu.cu:50 0x7ffff7b18223
main() at test_api_gpu.cu:116 0x4048c1

WeiDaiWD · 2018-05-14T05:06:58Z

Thank you very much for your report. I think that I have found the reason of this crash.

Since we are launching several NAND gates concurrently on a single device, while one NAND gate is running a kernel that accesses some unified memory, another NAND gate accesses some other unified memory from the host. This is not allowed on devices with compute capability < 6.x: Unified memory coherency and concurrency.

I am working on a work-around solution that allocates both host and device memory and transfers data when needed. Hopefully I will posted it tomorrow when the new fix passes on a Titan X (compute capability 5.2).

Only if that does not work on your device, then I need more info on your side. Thanks again.

WeiDaiWD · 2018-05-14T07:52:39Z

OK, this is not a perfect fix. Please try to compile/run the code in New Branch. This new fix does not use unified memory. I see no crash on a Titan X. Let me know if it still does not work for your system.

Ironically, I now see a new issue which is the reason why I didn't merge it to master. After the fix, less than 0.5% of gates gives wrong result. I do not have much a clue here. It could be the problem of using pinned memory. I will have to test with page-locked memory and see. If you have some idea about this, please shine some light here. I would very much appreciate that.

WeiDaiWD · 2018-05-14T21:07:01Z

I have temporarily created another branch for pre-Pascal GPUs. Performance is much slower since I have to disable concurrent launching of kernels for now. The results are correct and safe to play with. I am working on the perfect cure now.

sdorminey · 2018-05-15T03:11:45Z

Awesome! test_api_gpu now succeeds, and I get ~22ms per gate, testing with both hot-fix and older_than_6.0_no_concurrency. Thank you for the speedy workaround!

I'm going to play with the python bindings next - I'll let you know if I run into any issues.

WeiDaiWD closed this as completed Oct 19, 2018

msoos mentioned this issue Nov 30, 2020

Getting a SIGBUS on the cluster nicolasprevot/GpuShareSat#4

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test_api_gpu fails every time with CUDA_EXCEPTION_15 #2

test_api_gpu fails every time with CUDA_EXCEPTION_15 #2

sdorminey commented May 14, 2018

WeiDaiWD commented May 14, 2018 •

edited

Loading

WeiDaiWD commented May 14, 2018

WeiDaiWD commented May 14, 2018

sdorminey commented May 15, 2018

test_api_gpu fails every time with CUDA_EXCEPTION_15 #2

test_api_gpu fails every time with CUDA_EXCEPTION_15 #2

Comments

sdorminey commented May 14, 2018

WeiDaiWD commented May 14, 2018 • edited Loading

WeiDaiWD commented May 14, 2018

WeiDaiWD commented May 14, 2018

sdorminey commented May 15, 2018

WeiDaiWD commented May 14, 2018 •

edited

Loading