-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consolidate GPU Error Checking Function #350
Conversation
5d4941e
to
42d17ef
Compare
Consolodate all GPU error checking functions and macros into one overloaded function with one overload for CUDA/HIP errors and one for CUFFT/HIPFFT errors. That one doesn't use any macros and supports all the usual usage modes. It does utilize the `experimental::source_location` class. That class is supported on all compilers that we use or expect others to use but if it doesn't work for you then commenting out the relevant lines should be sufficient. Replaced all calls to `CHECK`, `CudaSafeCall`, `CudaCheckError`, and `gpErrchk` with `GPU_Error_Check`.
Some already had error checks on the next line but now they're all wrapped as is standard in the rest of the code
The CUDA_ERROR_CHECK macro that turns on error checking has been deprecated in favor of the new DISABLE_GPU_ERROR_CHECKING macro which disable error checking. Error checking is now on by default unless compiled with the DISABLE_GPU_ERROR_CHECKING macro.
42d17ef
to
cdbed5d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I've gone through -- it looks good to me.
I assume that you have confirmed that any of the implicit grid-synchronizations performed by the old error checking macros were entirely unnecessary, right? If so, we're good to merge.
Yep, everywhere that the implicit sync seemed like it might be relevant were on functions that already contain an implicit sync (like moving or allocating). Since we only use 1 GPU stream there's an implicit sync between all kernels. |
GPU Error Checking
I replaced the 3 macros and 7 functions for GPU error checking with a single overloaded function; one overload for CUDA/HIP checking and one for CUFFT/HIPFFT checking. The function supports wrapping a CUDA call or being called with no arguments to check the latest error.
I also added error checking to some
cudaMallocs
that were missing them or used them in a non-standard way.The other major change is the deprecation of the
CUDA_ERROR_CHECK
macro. Now error checking is on by default and can be disabled with the newDISABLE_GPU_ERROR_CHECKING
macro.This should resolve Issue #286 and possibly #296 as well, subject to discussion in that issue.