Use float64 add Atomics, Where Available #3172
Codecov Report
@@            Coverage Diff            @@
##           master   #3172     +/-    ##
=========================================
- Coverage   81.13%   81.1%   -0.03%
=========================================
  Files         384     386      +2
  Lines       75077   76398   +1321
  Branches     8434    8590    +156
=========================================
+ Hits        60915   61964   +1049
- Misses      12874   13114    +240
- Partials     1288    1320     +32
@@ -812,7 +820,7 @@ def _rebuild(cls, func_reduced, bind, targetoptions, config):
     def __reduce__(self):
         """
         Reduce the instance for serialization.
-        Compiled definitions are serialized in PTX form.
+        Compiled definitions are discarded.
Good catch! opened #3210
The code looks good. Pending buildfarm to report back.
The following tests are failing:
- test_polytyped (numba.cuda.tests.cudapy.test_inspect.TestInspect)
- test_debuginfo_in_asm (numba.cuda.tests.cudapy.test_debuginfo.TestCudaDebugInfo)
- test_environment_override (numba.cuda.tests.cudapy.test_debuginfo.TestCudaDebugInfo)
@sklam oops, fixed
CAS operations work on bit types.
Cards with compute capability >= 6.0 support atomic double addition. Use this (instead of numba's CAS-spinning) if available.
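To make the two code paths in these commit messages concrete, here is a minimal host-side sketch in plain Python. It is a simulation, not numba's implementation: `Cell`, `atomic_add_f64_cas`, and `atomic_add_f64` are hypothetical names, and the "native" branch is only a stand-in for the hardware float64 atomic add. It shows why the CAS fallback has to reinterpret the double as a 64-bit integer: the compare-and-swap only compares bit patterns.

```python
import struct


def f64_to_bits(x):
    """Reinterpret a float64 as an unsigned 64-bit integer (same bits)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_f64(b):
    """Reinterpret an unsigned 64-bit integer as a float64 (same bits)."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


class Cell:
    """Hypothetical 64-bit memory word that only offers an integer CAS."""

    def __init__(self, value):
        self.bits = f64_to_bits(value)

    def cas(self, expected, new):
        # Store `new` only if the word still holds `expected`;
        # always return the bits that were seen.
        old = self.bits
        if old == expected:
            self.bits = new
        return old


def atomic_add_f64_cas(cell, delta):
    """CAS-spinning add: retry until no other writer interfered."""
    old = cell.bits
    while True:
        new = f64_to_bits(bits_to_f64(old) + delta)
        seen = cell.cas(old, new)
        if seen == old:
            return bits_to_f64(old)  # value before the add
        old = seen  # someone else wrote first; retry with the fresh value


def atomic_add_f64(cell, delta, compute_capability):
    """Pick the native path when the (assumed) capability allows it."""
    if compute_capability >= (6, 0):
        # Stand-in for the hardware float64 atomic add.
        old = bits_to_f64(cell.bits)
        cell.bits = f64_to_bits(old + delta)
        return old
    return atomic_add_f64_cas(cell, delta)
```

Both branches return the value held before the addition, so callers see the same result whichever path is selected; only the mechanism differs.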
@seibert you were right about numba caching LLVM IR globally in an AutoJitCUDAKernel: this needs to be per compute capability, as numba poly-fills functionality where it is not available.
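The caching concern can be illustrated with a small sketch. This is not numba's actual cache: `PerCapabilityCache` and `fake_codegen` are hypothetical, and the strings stand in for generated IR. The point is only that the cache key must include the compute capability, because the generated code differs depending on whether a feature is native or poly-filled.

```python
class PerCapabilityCache:
    """Hypothetical cache of generated code, keyed by compute capability."""

    def __init__(self, codegen):
        self._codegen = codegen  # callable: compute capability -> artifact
        self._entries = {}

    def get(self, compute_capability):
        # A single global entry would hand code built for one device to
        # another; keying by capability keeps the variants apart.
        if compute_capability not in self._entries:
            self._entries[compute_capability] = self._codegen(compute_capability)
        return self._entries[compute_capability]


def fake_codegen(compute_capability):
    """Stand-in for code generation: native add on >= 6.0, CAS loop below."""
    if compute_capability >= (6, 0):
        return "native float64 atomic add"
    return "CAS-loop float64 atomic add"


cache = PerCapabilityCache(fake_codegen)
old_card = cache.get((5, 2))
new_card = cache.get((6, 1))
```

With a single shared entry, whichever device compiled first would decide the code used by every other device; here `old_card` and `new_card` hold different variants.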