
Use float64 add Atomics, Where Available #3172

Merged: 3 commits into numba:master on Sep 6, 2018

Conversation

@njwhite (Contributor) commented Jul 26, 2018

Cards with Compute Capability >= 6.0 support atomic double-precision addition. Use this (instead of numba's CAS-spinning) where available.
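For context, a minimal sketch of the user-facing behavior this affects, assuming a CUDA-capable device is present. On CC >= 6.0 hardware, `cuda.atomic.add` on a float64 array can lower to the native double-precision atomic add; on older cards numba falls back to a CAS spin loop:

```python
from numba import cuda
import numpy as np

@cuda.jit
def sum_kernel(result, values):
    i = cuda.grid(1)
    if i < values.size:
        # On CC >= 6.0 this can compile to a native atom.add.f64;
        # on older devices numba emulates it with a CAS spin loop.
        cuda.atomic.add(result, 0, values[i])

values = np.random.rand(1024)
result = np.zeros(1, dtype=np.float64)
sum_kernel[8, 128](result, values)
assert np.isclose(result[0], values.sum())
```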

@seibert you were right about numba caching LLVM IR globally in an AutoJitCUDAKernel - this needs to be per-compute-capability, as numba polyfills functionality where it isn't available.
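To illustrate the caching concern (a hedged sketch only; the class and attribute names here are hypothetical, not numba's actual internals): keying the compiled-definition cache on compute capability keeps polyfilled IR built for one device from being reused on a device that has the native instruction:

```python
class KernelCache:
    """Hypothetical sketch: cache compiled definitions per
    (compute capability, argument types), not argument types alone."""

    def __init__(self):
        self._defs = {}

    def get_or_compile(self, cc, argtypes, compile_fn):
        # cc is e.g. (6, 0); IR polyfilled for a (3, 5) device must
        # not be served to a (6, 0) device with the native atomic.
        key = (cc, argtypes)
        if key not in self._defs:
            self._defs[key] = compile_fn(cc, argtypes)
        return self._defs[key]
```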

@codecov-io commented Jul 27, 2018

Codecov Report

Merging #3172 into master will decrease coverage by 0.03%.
The diff coverage is n/a.

@@            Coverage Diff            @@
##           master   #3172      +/-   ##
=========================================
- Coverage   81.13%   81.1%   -0.03%     
=========================================
  Files         384     386       +2     
  Lines       75077   76398    +1321     
  Branches     8434    8590     +156     
=========================================
+ Hits        60915   61964    +1049     
- Misses      12874   13114     +240     
- Partials     1288    1320      +32

@stuartarchibald stuartarchibald added the CUDA CUDA related issue/PR label Jul 30, 2018
@seibert seibert requested a review from sklam July 31, 2018 16:36
Review comment on this hunk:

@@ -812,7 +820,7 @@ def _rebuild(cls, func_reduced, bind, targetoptions, config):
     def __reduce__(self):
         """
         Reduce the instance for serialization.
-        Compiled definitions are serialized in PTX form.
+        Compiled definitions are discarded.
A Member commented: Good catch! Opened #3210.

@sklam sklam added the Pending BuildFarm For PRs that have been reviewed but pending a push through our buildfarm label Aug 9, 2018
@sklam (Member) left a comment:

The code looks good. Pending buildfarm to report back.

@sklam (Member) left a comment:
The following tests are failing:

  • test_polytyped (numba.cuda.tests.cudapy.test_inspect.TestInspect)
  • test_debuginfo_in_asm (numba.cuda.tests.cudapy.test_debuginfo.TestCudaDebugInfo)
  • test_environment_override (numba.cuda.tests.cudapy.test_debuginfo.TestCudaDebugInfo)

@njwhite (Contributor, Author) commented Aug 10, 2018

@sklam Oops, fixed.

Commit: CAS operations work on bit types.
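To illustrate why the CAS fallback goes through bit types (a hedged, single-threaded sketch; the helper names are hypothetical, and the real loop is emitted as device code): atomic compare-and-swap compares raw 64-bit integer patterns, so the float64 value is reinterpreted as int64 bits before and after the swap:

```python
import numpy as np

def cas_i64(buf, idx, expected, new):
    """Hypothetical single-threaded stand-in for a 64-bit atomic CAS."""
    old = buf[idx]
    if old == expected:
        buf[idx] = new
    return old

def atomic_add_f64_via_cas(arr, idx, val):
    # View the float64 buffer as int64: CAS compares raw bit patterns,
    # which sidesteps float comparison quirks such as NaN != NaN.
    bits = arr.view(np.int64)
    while True:
        old_bits = bits[idx]
        old = old_bits.view(np.float64)
        new_bits = np.float64(old + val).view(np.int64)
        if cas_i64(bits, idx, old_bits, new_bits) == old_bits:
            return old

a = np.zeros(1, dtype=np.float64)
atomic_add_f64_via_cas(a, 0, 2.5)
assert a[0] == 2.5
```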
@stuartarchibald stuartarchibald added BuildFarm Passed For PRs that have been through the buildfarm and passed and removed Pending BuildFarm For PRs that have been reviewed but pending a push through our buildfarm labels Sep 3, 2018
@stuartarchibald (Contributor) commented:

@sklam I pushed 9575027, merged to master, through the smoke test; it passed.

@seibert seibert merged commit 62ec322 into numba:master Sep 6, 2018
Labels: 3 - Ready for Review; BuildFarm Passed (for PRs that have been through the buildfarm and passed); CUDA (CUDA related issue/PR)
Projects: None yet
5 participants