make compatible with CUDA kernels #114

Merged
piever merged 10 commits into master from pv/gpu on Feb 14, 2020
Conversation

@piever (Collaborator) commented Jan 4, 2020

Builds on top of #87

@codecov-io commented Jan 4, 2020

Codecov Report

Merging #114 into master will decrease coverage by 2.52%.
The diff coverage is 94.44%.


@@            Coverage Diff             @@
##           master     #114      +/-   ##
==========================================
- Coverage   93.79%   91.26%   -2.53%     
==========================================
  Files          10        9       -1     
  Lines         403      378      -25     
==========================================
- Hits          378      345      -33     
- Misses         25       33       +8
Impacted Files         Coverage Δ
src/StructArrays.jl    100% <ø> (ø) ⬆️
src/utils.jl           92.98% <100%> (+0.91%) ⬆️
src/compatibility.jl   100% <100%> (ø)
src/structarray.jl     85.56% <66.66%> (-6.44%) ⬇️
src/interface.jl       83.33% <0%> (-16.67%) ⬇️
src/groupjoin.jl       86.66% <0%> (-4.25%) ⬇️
src/collect.jl         95.77% <0%> (-4.23%) ⬇️
src/sort.jl            92.59% <0%> (-1.39%) ⬇️
... and 2 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 00e42b1...f38a469.

@piever (Collaborator, Author) commented Jan 4, 2020

CC: @vchuravy and @lcw (as you were both involved in #87), does this look fine?

The only thing I don't understand is whether there is a way to avoid a generated get_ith function (the foreachfield change is an overall improvement regardless of CUDA support, so I'm already fine with it). It seems to me that the Base.@_inline_meta suggestion recommended in #87 (comment) still gives the "dynamic function" error. I also confess not being 100% sure of what Base.@_inline_meta actually does. EDIT: actually, manually recursing on the tuple works, see https://github.com/JuliaArrays/StructArrays.jl/pull/114/files#diff-dc0cc5c34e65521b90994ce594f4dc2fR137.
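
For illustration, the manual-recursion idea looks roughly like the sketch below (hypothetical names, not the PR's actual get_ith):

# Hypothetical sketch: build the i-th element tuple by recursing on the
# tuple of component arrays. Each call peels off one array with Base.tail,
# so the compiler can fully unroll the recursion and infer a concrete
# tuple type, avoiding the dynamic dispatch that breaks GPU kernels.
_getfields(arrays::Tuple{}, i) = ()
_getfields(arrays::Tuple, i) = (arrays[1][i], _getfields(Base.tail(arrays), i)...)

# e.g. with d = StructArray(a = rand(100), b = rand(100)):
# _getfields((d.a, d.b), 3) == (d.a[3], d.b[3])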

Also, how do I test that the CUDAnative kernel actually works on Travis? Is it OK to just test the Adapt support there?

@piever (Collaborator, Author) commented Jan 4, 2020

The code I'm using for testing is:

using CUDAnative, CuArrays, StaticArrays, StructArrays
d = StructArray(a = rand(100), b = rand(100))

# To test nested case
# d = StructArray(a = StructArray(a = rand(100), b = rand(100)),
#                 b = StructArray(a = rand(100), b = rand(100)))

dd = replace_storage(CuArray, d)
de = similar(dd)

@show typeof(dd)

function kernel!(dest, src)
    # global (1-based) thread index
    i = (blockIdx().x-1)*blockDim().x + threadIdx().x
    # guard: the last block may extend past the end of the arrays
    if i <= length(dest)
        dest[i] = src[i]
    end
    return nothing
end

threads = 1024
blocks = cld(length(dd),threads)

@cuda threads=threads blocks=blocks kernel!(de, dd)
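
A possible host-side sanity check after the kernel launch (not part of the original snippet, and assuming the flat, non-nested d above):

# Hypothetical follow-up: copy both components back to the host and verify
# that the kernel copied src into dest.
@assert Array(de.a) == Array(dd.a)
@assert Array(de.b) == Array(dd.b)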

@vchuravy commented Jan 6, 2020

For CI, we do host a free GitLab runner so that people can test their GPU-enabled packages:
https://github.com/JuliaGPU/gitlab-ci

@vchuravy commented Jan 6, 2020

> I also confess not being 100% sure of what Base.@_inline_meta actually does.

It forces the Julia compiler to inline your function (it is the equivalent of @inline for places where that macro doesn't work), even if the heuristic deems it not worth it. This can be rather important in ntuple: if you have a heterogeneous tuple and we don't unroll and inline the closure, you can get a very broad tuple type.
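
As a rough illustration (hypothetical code, not the PR's), the ntuple-based variant with the inline meta placed at the top of the closure body would look something like:

# Hypothetical sketch: ask the compiler to inline the anonymous function
# passed to ntuple, so the element accesses get unrolled at the call site.
get_ith_ntuple(arrays::Tuple, i) =
    ntuple(j -> (Base.@_inline_meta; arrays[j][i]), length(arrays))

As piever reports above, this variant still hit the "dynamic function" error inside the CUDA kernel, which is why the PR recurses on the tuple instead.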

> EDIT: actually manually recursing on the tuple works

Yeah, tuple + recursive functions is something the compiler understands rather well.

@lcw (Contributor) commented Jan 6, 2020

Thanks @piever for taking this on! The code you are using to test works for me, but the following code doesn't:

using CUDAnative, CuArrays, StaticArrays, StructArrays

c = [SHermitianCompact(@SVector(rand(3))) for i=1:5]
d = StructArray(c, unwrap = t -> t <: Union{SHermitianCompact,SVector,Tuple})

dd = replace_storage(CuArray, d)
de = similar(dd)

@show typeof(dd)

function kernel!(dest, src)
    i = (blockIdx().x-1)*blockDim().x + threadIdx().x
    if i <= length(dest)
        dest[i] = src[i]
    end
    return nothing
end

threads = 1024
blocks = cld(length(dd),threads)

@cuda threads=threads blocks=blocks kernel!(de, dd)

Is there something I am doing wrong when creating the StructArray? Or is this struct just more deeply nested, so that it causes foreachfield to become a dynamic function?

lcw and others added 8 commits on February 14, 2020 at 12:02
@piever (Collaborator, Author) commented Feb 14, 2020

> Is there something I am doing wrong when creating the StructArray? Or is this struct just more nested so that it is causing foreachfield to be a dynamic function?

I think something like SHermitianCompact is a bit problematic, as StructArrays normally assumes that getproperty on the struct reflects its fields. Maybe try and check https://github.com/JuliaArrays/StructArrays.jl#advanced-structures-with-non-standard-data-layout, but I think that should be figured out in a separate PR.
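
For reference, the kind of customization that README section describes looks roughly like the sketch below. The hook names (staticschema, component, createinstance) follow the current StructArrays documentation, but treat the exact signatures as assumptions; the Point type is made up for illustration:

using StructArrays

# Hypothetical example: a struct whose internal storage (a tuple) differs
# from the component layout we want StructArrays to use (fields x and y).
struct Point
    data::NTuple{2,Float64}
end

# Which components to store, and their types.
StructArrays.staticschema(::Type{Point}) = NamedTuple{(:x, :y), Tuple{Float64, Float64}}

# How to extract a component from one element.
StructArrays.component(p::Point, key::Symbol) = key === :x ? p.data[1] : p.data[2]

# How to rebuild an element from its components.
StructArrays.createinstance(::Type{Point}, x, y) = Point((x, y))

s = StructArray([Point((rand(), rand())) for _ in 1:5])
s.x  # the x components are stored as a plain Vector{Float64}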

@piever changed the title from "WIP: try and make compatible with CUDA kernels" to "make compatible with CUDA kernels" on Feb 14, 2020
@piever merged commit ace794f into master on Feb 14, 2020
@piever deleted the pv/gpu branch on February 14, 2020 at 12:33
@piever mentioned this pull request on Dec 15, 2020
@piever mentioned this pull request on Nov 30, 2023