make compatible with CUDA kernels #114
Conversation
Codecov Report
```
@@            Coverage Diff             @@
##           master     #114      +/-   ##
==========================================
- Coverage   93.79%   91.26%    -2.53%
==========================================
  Files          10        9       -1
  Lines         403      378      -25
==========================================
- Hits          378      345      -33
- Misses         25       33       +8
==========================================
```
Continue to review full report at Codecov.
CC: @vchuravy and @lcw (as you were both involved in #87): does this look fine? The only thing I don't understand is whether there is a way to avoid a generated function. Also, how do I test that the CUDAnative kernel actually works on Travis? Is it OK to just test the Adapt support there?
The code I'm using for testing is:

```julia
using CUDAnative, CuArrays, StaticArrays, StructArrays

d = StructArray(a = rand(100), b = rand(100))
# To test the nested case:
# d = StructArray(a = StructArray(a = rand(100), b = rand(100)),
#                 b = StructArray(a = rand(100), b = rand(100)))
dd = replace_storage(CuArray, d)
de = similar(dd)
@show typeof(dd)

function kernel!(dest, src)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(dest)
        dest[i] = src[i]
    end
    return nothing
end

threads = 1024
blocks = cld(length(dd), threads)
@cuda threads=threads blocks=blocks kernel!(de, dd)
```
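The launch above sizes the grid with `cld(length(dd), threads)`, i.e. ceiling division, so the last block generally has surplus threads; that is why the kernel guards with `i <= length(dest)`. A minimal sketch of that pattern in plain C++ (simulating the grid on the CPU, not real CUDA; `launch_copy` is a hypothetical name):

```cpp
#include <cstddef>
#include <vector>

// Simulates the grid-sized launch: blocks = ceil(n / threads), so the last
// block may run past the end of the data and needs the bounds guard.
void launch_copy(std::vector<double>& dest, const std::vector<double>& src,
                 std::size_t threads) {
    std::size_t n = src.size();
    std::size_t blocks = (n + threads - 1) / threads;  // cld(n, threads)
    for (std::size_t block = 0; block < blocks; ++block) {
        for (std::size_t t = 0; t < threads; ++t) {
            std::size_t i = block * threads + t;  // global index (0-based)
            if (i < n)                            // bounds guard
                dest[i] = src[i];
        }
    }
}
```

Dropping the guard would read and write past the end of the arrays whenever `n` is not a multiple of `threads`.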
For CI, we do host a free GitLab runner so that people can test their GPU-enabled packages.
It forces the Julia compiler to inline your function (it is the equivalent to
Yeah, tuples + recursive functions are something the compiler understands rather well.
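The point about tuple recursion is that the "loop" over a tuple's elements is resolved at compile time and unrolled into straight-line code. A rough C++ analogue of the same idea, using a parameter pack instead of Julia's recursion on tuples (`for_each` is a hypothetical helper, not part of any library here):

```cpp
#include <tuple>
#include <utility>

// Apply f to each element of a tuple. The fold expression expands to one
// call per element at compile time -- the optimizer sees no loop at all,
// which is the property that makes this pattern GPU-friendly.
template <typename Tuple, typename F, std::size_t... I>
void for_each_impl(Tuple& t, F f, std::index_sequence<I...>) {
    (f(std::get<I>(t)), ...);  // one statically-dispatched call per field
}

template <typename... Ts, typename F>
void for_each(std::tuple<Ts...>& t, F f) {
    for_each_impl(t, f, std::index_sequence_for<Ts...>{});
}
```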
Thanks @piever for taking this on! The code you are using to test works for me, but the following code doesn't:

```julia
using CUDAnative, CuArrays, StaticArrays, StructArrays

c = [SHermitianCompact(@SVector(rand(3))) for i = 1:5]
d = StructArray(c, unwrap = t -> t <: Union{SHermitianCompact, SVector, Tuple})
dd = replace_storage(CuArray, d)
de = similar(dd)
@show typeof(dd)

function kernel!(dest, src)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(dest)
        dest[i] = src[i]
    end
    return nothing
end

threads = 1024
blocks = cld(length(dd), threads)
@cuda threads=threads blocks=blocks kernel!(de, dd)
```

Is there something I am doing wrong when creating the `StructArray`?
This uses a generated function to avoid a dynamic call for `getindex`, allowing it to be called in a CUDAnative kernel.
This allows `setindex!` on `StructArray`s to be used in `CUDAnative` kernels.
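The underlying idea is that a struct-of-arrays container stores one backing array per field; `getindex` gathers the fields into an element and `setindex!` scatters one back, and both compile down to plain loads and stores with no dynamic dispatch or allocation, which is what a GPU kernel requires. A minimal C++ sketch of that layout (the `Element`/`SoA` names are hypothetical, and the real package derives the field set from the element type automatically):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical two-field element type.
struct Element { double a; double b; };

// Struct-of-arrays storage: one contiguous array per field.
// get() gathers the fields into an Element (the getindex direction);
// set() scatters an Element back into the field arrays (setindex!).
// Both are statically-dispatched, allocation-free loads/stores.
struct SoA {
    std::vector<double> a;
    std::vector<double> b;

    Element get(std::size_t i) const { return {a[i], b[i]}; }
    void set(std::size_t i, Element e) { a[i] = e.a; b[i] = e.b; }
};
```

This is also why `dest[i] = src[i]` in the kernels above works field by field instead of moving one boxed struct around.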
I think something like
Builds on top of #87