Add support of cuda kernels as pullback functions #1114
base: master
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #1114 +/- ##
==========================================
+ Coverage 94.36% 94.38% +0.01%
==========================================
Files 50 50
Lines 8365 8744 +379
==========================================
+ Hits 7894 8253 +359
- Misses 471 491 +20
... and 26 files with indirect coverage changes
clang-tidy made some suggestions
22399c4 to ed7194c
ed7194c to 96ad6a1
96ad6a1 to e5fa81d
test/CUDA/GradientKernels.cu
Outdated
//CHECK-NEXT: cudaMemcpyKind _r1 = static_cast<cudaMemcpyKind>(0U);
//CHECK-NEXT: clad::custom_derivatives::cudaMemcpy_pullback(out_host, out, 10 * sizeof(double), cudaMemcpyDeviceToHost, _d_out_host, _d_out, &_r0, &_r1);
//CHECK-NEXT: }
//CHECK-NEXT: kernel_call_pullback<<<1, 10>>>(out, in, _d_out, _d_in);
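To make the CHECK lines above concrete, here is a hedged sketch of the shape such a generated pullback kernel takes. The kernel names and parameter layout are modeled on the CHECK lines; the bodies are illustrative assumptions, not the PR's actual generated code.

```cuda
#include <cuda_runtime.h>

// Hypothetical forward kernel (assumed for illustration): copies in -> out.
__global__ void kernel_call(double *out, double *in) {
  out[threadIdx.x] = in[threadIdx.x];
}

// The pullback runs as a kernel too: it receives the original arguments
// plus one adjoint buffer per pointer parameter, matching
// kernel_call_pullback<<<1, 10>>>(out, in, _d_out, _d_in) above.
__global__ void kernel_call_pullback(double *out, double *in,
                                     double *_d_out, double *_d_in) {
  // d(out[i])/d(in[i]) = 1 for the assumed forward kernel, so the
  // adjoint of out accumulates into the adjoint of in.
  _d_in[threadIdx.x] += _d_out[threadIdx.x];
}
```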
Should we call cudaDeviceSynchronize before cudaFree calls?
These should be executed in order, so there is no need.
Edit: meaning the previous GPU ops will finish before the free is executed, as they are all executed in the default stream. Maybe it would be worth adding them in case the user uses another stream for the kernel, since they should then also include cudaDeviceSynchronize() in the original kernel. What do you think?
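The ordering argument above can be illustrated with a small sketch (not from the PR): work issued to the default stream is serialized, while a user-created stream needs an explicit synchronization on the host before freeing memory the kernel may still be using. The kernel and buffer names here are assumptions for illustration.

```cuda
#include <cuda_runtime.h>

__global__ void kernel(double *out) { out[threadIdx.x] = 1.0; }

int main() {
  double *d_out = nullptr;
  cudaMalloc(&d_out, 10 * sizeof(double));

  // Default stream: the free is ordered after the kernel, since all
  // default-stream operations on the device execute in issue order.
  kernel<<<1, 10>>>(d_out);
  cudaFree(d_out);

  // Non-default stream: synchronize before freeing, because the host-side
  // cudaFree call is not ordered with respect to work in stream s.
  cudaStream_t s;
  cudaStreamCreate(&s);
  cudaMalloc(&d_out, 10 * sizeof(double));
  kernel<<<1, 10, 0, s>>>(d_out);
  cudaStreamSynchronize(s);  // or cudaDeviceSynchronize()
  cudaFree(d_out);
  cudaStreamDestroy(s);
  return 0;
}
```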
meaning the previous GPU ops will finish before free is executed as they are all executed in the default stream.

What if there are no cudaFree calls? Should we call cudaDeviceSynchronize in that case?
Maybe it would be worth it to add them in case the user uses another stream for the kernel, as they should also include cudaDeviceSynchronize() in the original kernel. What do you think?
I believe we should not currently add additional cudaDeviceSynchronize calls on account of multiple streams. We can add support for multiple streams, with extra synchronization calls, if the need arises.
They are supported for derivative kernel execution, though. I can remove the cudaDeviceSynchronize calls from cudaMemcpy_pullback, but I think it doesn't hurt to keep them, as they will be added in the future either way.
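For context, here is a hedged sketch of what a cudaMemcpy_pullback with such a synchronization call could look like. The function name, namespace, and parameter order are modeled on the CHECK lines earlier in the thread; the body is an assumption for illustration only (it omits adjoint accumulation and zeroing that a real pullback would need) and is not the PR's implementation.

```cuda
#include <cuda_runtime.h>

namespace clad {
namespace custom_derivatives {

// Hypothetical pullback for cudaMemcpy (illustrative sketch).
void cudaMemcpy_pullback(void *dst, const void *src, size_t count,
                         cudaMemcpyKind kind, void *d_dst, void *d_src,
                         size_t *d_count, cudaMemcpyKind *d_kind) {
  // cudaMemcpy blocks the host until the copy (and prior device work it
  // depends on) completes; synchronizing here mimics that blocking
  // behavior before the adjoints are touched on the host side.
  cudaDeviceSynchronize();

  // In the reverse pass the adjoint flows in the opposite direction:
  // the adjoint of the destination propagates to the adjoint of the source.
  cudaMemcpyKind reverse =
      (kind == cudaMemcpyDeviceToHost)   ? cudaMemcpyHostToDevice
      : (kind == cudaMemcpyHostToDevice) ? cudaMemcpyDeviceToHost
                                         : kind;
  cudaMemcpy(d_src, d_dst, count, reverse);
}

} // namespace custom_derivatives
} // namespace clad
```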
clang-tidy review says "All clean, LGTM! 👍"
…y_pullback to mimic its blocking behavior
No description provided.