test_api_gpu fails every time with CUDA_EXCEPTION_15 #2
Comments
Thank you very much for your report. I think I have found the reason for this crash. Since we launch several NAND gates concurrently on a single device, while one NAND gate is running a kernel that accesses some unified memory, another NAND gate accesses other unified memory from the host. This is not allowed on devices with compute capability < 6.x: see Unified memory coherency and concurrency. I am working on a workaround that allocates both host and device memory and transfers data when needed. Hopefully I will post it tomorrow once the new fix passes on a Titan X (compute capability 5.2). Only if that does not work on your device will I need more info from your side. Thanks again.
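Roughly, the workaround I have in mind looks like the sketch below (hypothetical names, not the actual cuFHE code): keep a plain host buffer plus an explicit device buffer instead of a managed allocation, and transfer data only when needed, so no kernel is ever in flight while the host touches the data.

```cuda
// Sketch only: explicit host/device buffers instead of cudaMallocManaged.
// GateKernel and RunGate are hypothetical placeholders.
#include <cuda_runtime.h>
#include <cstdint>

__global__ void GateKernel(uint32_t* d) { d[threadIdx.x] ^= 1u; }  // placeholder gate

void RunGate(uint32_t* h_data, int n, cudaStream_t st) {
  uint32_t* d_data = nullptr;
  size_t bytes = n * sizeof(uint32_t);
  cudaMalloc((void**)&d_data, bytes);                                  // device-only buffer
  cudaMemcpyAsync(d_data, h_data, bytes, cudaMemcpyHostToDevice, st);  // explicit upload
  GateKernel<<<1, n, 0, st>>>(d_data);
  cudaMemcpyAsync(h_data, d_data, bytes, cudaMemcpyDeviceToHost, st);  // explicit download
  cudaStreamSynchronize(st);  // host reads h_data only after the copy has finished
  cudaFree(d_data);
}
```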
OK, this is not a perfect fix. Please try to compile/run the code in New Branch. This new fix does not use unified memory, and I see no crash on a Titan X. Let me know if it still does not work on your system. Ironically, I now see a new issue, which is why I didn't merge it to master: after the fix, fewer than 0.5% of gates give wrong results. I do not have much of a clue here. It could be a problem with using pinned memory; I will have to test with ordinary pageable memory and see. If you have any idea about this, please shed some light here. I would very much appreciate that.
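For reference, this is roughly the kind of pinned-memory pattern I am talking about (a sketch with hypothetical names, not the actual branch code). My suspicion is that, because copies from pinned buffers are truly asynchronous, reusing or reading a host buffer before its stream is synchronized could explain occasional wrong results.

```cuda
// Sketch only: pinned (page-locked) host buffer with async copies.
// GateKernel is a hypothetical placeholder.
#include <cuda_runtime.h>
#include <cstdint>

__global__ void GateKernel(uint32_t* d) { d[threadIdx.x] ^= 1u; }

int main() {
  const int n = 128;
  uint32_t* h_buf = nullptr;
  uint32_t* d_buf = nullptr;
  cudaMallocHost((void**)&h_buf, n * sizeof(uint32_t));  // pinned host memory
  cudaMalloc((void**)&d_buf, n * sizeof(uint32_t));

  cudaStream_t st;
  cudaStreamCreate(&st);

  for (int i = 0; i < n; ++i) h_buf[i] = i;
  cudaMemcpyAsync(d_buf, h_buf, n * sizeof(uint32_t), cudaMemcpyHostToDevice, st);
  GateKernel<<<1, n, 0, st>>>(d_buf);
  cudaMemcpyAsync(h_buf, d_buf, n * sizeof(uint32_t), cudaMemcpyDeviceToHost, st);

  // With pinned memory the async copies genuinely overlap with host code, so
  // touching h_buf before this sync can yield stale or partial results.
  cudaStreamSynchronize(st);

  cudaFree(d_buf);
  cudaFreeHost(h_buf);
  cudaStreamDestroy(st);
  return 0;
}
```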
I have temporarily created another branch for pre-Pascal GPUs. Performance is much slower since I have to disable concurrent launching of kernels for now. The results are correct and safe to play with. I am working on the perfect cure now. |
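For pre-Pascal devices, the fallback essentially serializes the gate launches, along the lines of this sketch (hypothetical names, not the actual branch code): each gate runs to completion before the host moves on, so no kernel is in flight when host code touches memory.

```cuda
// Sketch only: serialized gate launches for pre-Pascal GPUs.
// GateKernel, d_bufs, num_gates are hypothetical placeholders.
#include <cuda_runtime.h>
#include <cstdint>

__global__ void GateKernel(uint32_t* d) { d[threadIdx.x] ^= 1u; }

void RunGatesSerialized(uint32_t* d_bufs, int num_gates, int n) {
  for (int i = 0; i < num_gates; ++i) {
    GateKernel<<<1, n>>>(d_bufs + i * n);  // default stream: launches do not overlap
    cudaDeviceSynchronize();               // nothing in flight when the host resumes
  }
}
```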
Awesome! test_api_gpu now succeeds, and I get ~22ms per gate, testing with both hot-fix and older_than_6.0_no_concurrency. Thank you for the speedy workaround! I'm going to play with the python bindings next - I'll let you know if I run into any issues. |
test_api_gpu dies for me, every time, with Invalid Managed Memory Access while evaluating the Nand gate (before bootstrapping occurs). It looks like this code is running on the host thread, but the underlying data (in unified memory) is mapped to the GPU, causing an error.
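To illustrate what I think is happening, here is a minimal managed-memory pattern (hypothetical names, not the cuFHE code) that, as far as I understand, is disallowed on devices with compute capability below 6.0:

```cuda
// Sketch only: on pre-Pascal GPUs the host may not touch ANY managed
// allocation while a kernel is running, even one the kernel never uses.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void GateKernel(int* d) { d[threadIdx.x] += 1; }  // placeholder

int main() {
  int *buf_a = nullptr, *buf_b = nullptr;
  cudaMallocManaged((void**)&buf_a, 32 * sizeof(int));
  cudaMallocManaged((void**)&buf_b, 32 * sizeof(int));

  GateKernel<<<1, 32>>>(buf_a);  // kernel in flight, touching buf_a only

  buf_b[0] = 42;                 // host access to managed memory here is what
                                 // raises CUDA_EXCEPTION_15 on pre-Pascal GPUs

  cudaDeviceSynchronize();
  printf("%d %d\n", buf_a[0], buf_b[0]);
  cudaFree(buf_a);
  cudaFree(buf_b);
  return 0;
}
```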
Would love a workaround, since this project looks really neat! Let me know if you need more info.
System setup:
Output:
------ Key Generation ------
------ Test Encryption/Decryption ------
Number of tests: 96
PASS
------ Initilizating Data on GPU(s) ------
------ Test NAND Gate ------
Number of tests: 96
(crashes here)
Stack trace:
Thread [1] 14501 [core: 2] (Suspended : Signal : CUDA_EXCEPTION_15:Invalid Managed Memory Access)
cufhe::Nand() at cufhe_gates_gpu.cu:50 0x7ffff7b18223
main() at test_api_gpu.cu:116 0x4048c1