make compatible with CUDA kernels #114

Merged
piever merged 10 commits into master from pv/gpu on Feb 14, 2020
Conversation

@piever (Collaborator) commented Jan 4, 2020

Builds on top of #87

@codecov-io commented Jan 4, 2020

Codecov Report

Merging #114 into master will decrease coverage by 2.52%.
The diff coverage is 94.44%.


@@            Coverage Diff             @@
##           master     #114      +/-   ##
==========================================
- Coverage   93.79%   91.26%   -2.53%     
==========================================
  Files          10        9       -1     
  Lines         403      378      -25     
==========================================
- Hits          378      345      -33     
- Misses         25       33       +8
Impacted Files         Coverage Δ
src/StructArrays.jl    100% <ø> (ø) ⬆️
src/utils.jl           92.98% <100%> (+0.91%) ⬆️
src/compatibility.jl   100% <100%> (ø)
src/structarray.jl     85.56% <66.66%> (-6.44%) ⬇️
src/interface.jl       83.33% <0%> (-16.67%) ⬇️
src/groupjoin.jl       86.66% <0%> (-4.25%) ⬇️
src/collect.jl         95.77% <0%> (-4.23%) ⬇️
src/sort.jl            92.59% <0%> (-1.39%) ⬇️
... and 2 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 00e42b1...f38a469.

@piever (Collaborator, Author) commented Jan 4, 2020

CC: @vchuravy and @lcw (as you were both involved in #87), does this look fine?

The only thing I don't understand is whether there is a way to avoid a generated get_ith function (the foreachfield change is an overall improvement regardless of CUDA support, so I'm already fine with it). It seems to me that the Base.@_inline_meta suggestion recommended in #87 (comment) still gives the "dynamic function" error. I also confess not being 100% sure of what Base.@_inline_meta actually does. EDIT: actually, manually recursing on the tuple works, see https://github.com/JuliaArrays/StructArrays.jl/pull/114/files#diff-dc0cc5c34e65521b90994ce594f4dc2fR137.
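
For illustration, the manual-recursion idea looks roughly like the sketch below (hypothetical names, not the PR's actual get_ith):

# Hypothetical sketch: build the i-th element tuple by recursing on the
# tuple of component arrays. Each call peels off one array with Base.tail,
# so the compiler can fully unroll the recursion and infer a concrete
# tuple type, avoiding the dynamic dispatch that breaks GPU kernels.
_getfields(arrays::Tuple{}, i) = ()
_getfields(arrays::Tuple, i) = (arrays[1][i], _getfields(Base.tail(arrays), i)...)

# e.g. with d = StructArray(a = rand(100), b = rand(100)):
# _getfields((d.a, d.b), 3) == (d.a[3], d.b[3])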

Also, how do I test that the CUDAnative kernel actually works on Travis? Is it OK to just test the Adapt support there?

@piever (Collaborator, Author) commented Jan 4, 2020

The code I'm using for testing is:

using CUDAnative, CuArrays, StaticArrays, StructArrays
d = StructArray(a = rand(100), b = rand(100))

# To test nested case
# d = StructArray(a = StructArray(a = rand(100), b = rand(100)),
#                 b = StructArray(a = rand(100), b = rand(100)))

dd = replace_storage(CuArray, d)
de = similar(dd)

@show typeof(dd)

function kernel!(dest, src)
    # global (1-based) thread index
    i = (blockIdx().x-1)*blockDim().x + threadIdx().x
    # guard: the last block may extend past the end of the arrays
    if i <= length(dest)
        dest[i] = src[i]
    end
    return nothing
end

threads = 1024
blocks = cld(length(dd),threads)

@cuda threads=threads blocks=blocks kernel!(de, dd)
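
A possible host-side sanity check after the kernel launch (not part of the original snippet, and assuming the flat, non-nested d above):

# Hypothetical follow-up: copy both components back to the host and verify
# that the kernel copied src into dest.
@assert Array(de.a) == Array(dd.a)
@assert Array(de.b) == Array(dd.b)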

@vchuravy commented Jan 6, 2020

For CI, we do host a free GitLab runner so that people can test their GPU-enabled packages:
https://github.com/JuliaGPU/gitlab-ci

@vchuravy commented Jan 6, 2020

> I also confess not being 100% sure of what Base.@_inline_meta actually does.

It forces the Julia compiler to inline your function (it is the equivalent of @inline for places where that macro doesn't work), even if the heuristic deems it not worth it. This can be rather important in ntuple: if you have a heterogeneous tuple and we don't unroll and inline the closure, you can get a very broad tuple type.
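
As a rough illustration (hypothetical code, not the PR's), the ntuple-based variant with the inline meta placed at the top of the closure body would look something like:

# Hypothetical sketch: ask the compiler to inline the anonymous function
# passed to ntuple, so the element accesses get unrolled at the call site.
get_ith_ntuple(arrays::Tuple, i) =
    ntuple(j -> (Base.@_inline_meta; arrays[j][i]), length(arrays))

As piever reports above, this variant still hit the "dynamic function" error inside the CUDA kernel, which is why the PR recurses on the tuple instead.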

> EDIT: actually manually recursing on the tuple works

Yeah, tuple + recursive functions is something the compiler understands rather well.

@lcw (Contributor) commented Jan 6, 2020

Thanks @piever for taking this on! The code you are using to test works for me, but the following code doesn't:

using CUDAnative, CuArrays, StaticArrays, StructArrays

c = [SHermitianCompact(@SVector(rand(3))) for i=1:5]
d = StructArray(c, unwrap = t -> t <: Union{SHermitianCompact,SVector,Tuple})

dd = replace_storage(CuArray, d)
de = similar(dd)

@show typeof(dd)

function kernel!(dest, src)
    i = (blockIdx().x-1)*blockDim().x + threadIdx().x
    if i <= length(dest)
        dest[i] = src[i]
    end
    return nothing
end

threads = 1024
blocks = cld(length(dd),threads)

@cuda threads=threads blocks=blocks kernel!(de, dd)

Is there something I am doing wrong when creating the StructArray? Or is this struct just more deeply nested, so that it causes foreachfield to become a dynamic function?

lcw and others added 8 commits on February 14, 2020 at 12:02
@piever (Collaborator, Author) commented Feb 14, 2020

> Is there something I am doing wrong when creating the StructArray? Or is this struct just more nested so that it is causing foreachfield to be a dynamic function?

I think something like SHermitianCompact is a bit problematic, as StructArrays normally assumes that getproperty on the struct reflects its fields. Maybe try and check https://github.com/JuliaArrays/StructArrays.jl#advanced-structures-with-non-standard-data-layout, but I think that should be figured out in a separate PR.
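
For reference, the kind of customization that README section describes looks roughly like the sketch below. The hook names (staticschema, component, createinstance) follow the current StructArrays documentation, but treat the exact signatures as assumptions; the Point type is made up for illustration:

using StructArrays

# Hypothetical example: a struct whose internal storage (a tuple) differs
# from the component layout we want StructArrays to use (fields x and y).
struct Point
    data::NTuple{2,Float64}
end

# Which components to store, and their types.
StructArrays.staticschema(::Type{Point}) = NamedTuple{(:x, :y), Tuple{Float64, Float64}}

# How to extract a component from one element.
StructArrays.component(p::Point, key::Symbol) = key === :x ? p.data[1] : p.data[2]

# How to rebuild an element from its components.
StructArrays.createinstance(::Type{Point}, x, y) = Point((x, y))

s = StructArray([Point((rand(), rand())) for _ in 1:5])
s.x  # the x components are stored as a plain Vector{Float64}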

@piever changed the title from "WIP: try and make compatible with CUDA kernels" to "make compatible with CUDA kernels" on Feb 14, 2020
@piever merged commit ace794f into master on Feb 14, 2020
@piever deleted the pv/gpu branch on February 14, 2020 at 12:33
@piever mentioned this pull request on Dec 15, 2020
@piever mentioned this pull request on Nov 30, 2023