Enzyme fails on GPU kernel #307

Closed · pxl-th opened this issue on Jun 13, 2022 · 6 comments

pxl-th (Collaborator) commented on Jun 13, 2022

Enzyme works fine on the CPU and can differentiate through the `spherical_harmonics!` kernel, but on `CUDADevice` it fails with the error below.
It looks like something analogous to `transform_gpu!`, which inserts `return nothing`, is missing.

Error:

ERROR: LoadError: GPU compilation of kernel #df#1(KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(512,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}}, Duplicated{CuDeviceMatrix{Float32, 1}}, Duplicated{CuDeviceMatrix{Float32, 1}}) failed
KernelError: kernel returns a value of type `Tuple{}`

Make sure your kernel function ends in `return`, `return nothing` or `nothing`.
If the returned value is of type `Union{}`, your Julia code probably throws an exception.
Inspect the code with `@device_code_warntype` for more details.
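
For context, a minimal sketch of the failure mode (illustrative only; `grad_call`, `df_bad`, and `df_good` are hypothetical names, not the KernelGradients source). `Enzyme.autodiff` returns a tuple of gradients, which is empty when every argument is `Duplicated`, so a wrapper whose last expression is that call gets return type `Tuple{}`:

# Stand-in for the empty gradient tuple that autodiff returns (hypothetical).
grad_call(args...) = ()

# The last expression is `()`, so the return type is `Tuple{}` --
# exactly what the GPU compilation error above rejects.
df_bad(args...) = grad_call(args...)

# An explicit `return nothing` gives the kernel an acceptable return type.
function df_good(args...)
    grad_call(args...)
    return nothing
end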

MWE:

using InteractiveUtils
using CUDA
using CUDAKernels
using Enzyme
using KernelAbstractions
using KernelGradients

# Device-dispatched allocation helpers (type piracy on Base; fine for an MWE).
Base.rand(::CPU, T, shape) = rand(T, shape)
Base.rand(::CUDADevice, T, shape) = CUDA.rand(T, shape)
Base.zeros(::CPU, T, shape) = zeros(T, shape)
Base.zeros(::CUDADevice, T, shape) = CUDA.zeros(T, shape)
Base.ones(::CPU, T, shape) = ones(T, shape)
Base.ones(::CUDADevice, T, shape) = CUDA.ones(T, shape)

# Workgroup size to launch on each device.
linear_threads(::CPU) = Threads.nthreads()
linear_threads(::CUDADevice) = 128

# Encode each direction with the first four real spherical harmonics (degrees 0 and 1).
@kernel function spherical_harmonics!(encodings, @Const(directions))
    i = @index(Global)
    x = directions[1, i]
    y = directions[2, i]
    z = directions[3, i]

    encodings[1, i] = 0.28209479177387814f0
    encodings[2, i] = -0.48860251190291987f0 * y
    encodings[3, i] = 0.48860251190291987f0 * z
    encodings[4, i] = -0.48860251190291987f0 * x
end

function ∇spherical_harmonics!(∂encodings, ∂directions, encodings, directions, device)
    # KernelGradients extends Enzyme.autodiff to turn a KA kernel into a gradient kernel.
    ∇k! = Enzyme.autodiff(spherical_harmonics!(device, linear_threads(device)))
    # @device_code dir="./" ∇k!(Duplicated(encodings, ∂encodings), Duplicated(directions, ∂directions); ndrange=1)
    n = size(encodings, 2)
    wait(∇k!(Duplicated(encodings, ∂encodings), Duplicated(directions, ∂directions); ndrange=n))
    nothing
end

function main()
    device = CUDADevice()
    n = 1
    x = rand(device, Float32, (3, n))
    y = zeros(device, Float32, (4, n))
    ∂L∂x = zeros(device, Float32, (3, n))
    ∂L∂y = ones(device, Float32, (4, n))  # seed the reverse pass with ∂L/∂y = 1

    # Forward pass, then reverse pass.
    wait(spherical_harmonics!(device, linear_threads(device))(y, x; ndrange=n))
    ∇spherical_harmonics!(∂L∂y, ∂L∂x, y, x, device)
end
main()

I'm on Julia 1.8.0-rc1.

Output of `] st`:

  [052768ef] CUDA v3.10.1
  [72cfdca4] CUDAKernels v0.4.2 `https://github.com/JuliaGPU/KernelAbstractions.jl.git:lib/CUDAKernels#master`
  [7da242da] Enzyme v0.10.0
  [63c18a36] KernelAbstractions v0.8.2 `https://github.com/JuliaGPU/KernelAbstractions.jl.git#master`
  [e5faadeb] KernelGradients v0.1.1 `https://github.com/JuliaGPU/KernelAbstractions.jl.git:lib/KernelGradients#master`
pxl-th changed the title from "Enzyme on GPU kernel" to "Enzyme fails on GPU kernel" on Jun 13, 2022
vchuravy (Member) commented

That seems to be an ABI issue...

vchuravy (Member) commented

Could you add a `return nothing` here, in the `df` closure in KernelGradients.jl, after the deferred call:

Enzyme.autodiff_deferred(f::Fun, Enzyme.Const, ctx, args...)
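
In other words, something like this (a sketch of the suggested change, assuming the `df` closure captures the kernel function `f`; not the verbatim KernelGradients source):

function df(ctx, args...)
    Enzyme.autodiff_deferred(f, Enzyme.Const, ctx, args...)
    return nothing  # discard autodiff's result so the GPU kernel returns nothing
end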

pxl-th (Collaborator, Author) commented on Jun 15, 2022

@vchuravy, I added `return nothing`, but now I get a different error on CUDA (the CPU still works fine):

ERROR: LoadError: InvalidIRError: compiling kernel #df#1(KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(512,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}}, Duplicated{CuDeviceMatrix{Float32, 1}}, Duplicated{CuDeviceMatrix{Float32, 1}}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to jl_f_getfield)
Stacktrace:
 [1] getindex
   @ ./tuple.jl:29
 [2] iterate
   @ ./tuple.jl:68
 [3] same_or_one
   @ ~/.julia/packages/Enzyme/7MHm8/src/Enzyme.jl:206
 [4] autodiff_deferred
   @ ~/.julia/packages/Enzyme/7MHm8/src/Enzyme.jl:429
 [5] df
   @ ~/.julia/dev/KernelAbstractions/lib/KernelGradients/src/KernelGradients.jl:9
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{KernelGradients.var"#df#1"{typeof(gpu_spherical_harmonics!), typeof(gpu_spherical_harmonics!)}, Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(512,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}}, Duplicated{CuDeviceMatrix{Float32, 1}}, Duplicated{CuDeviceMatrix{Float32, 1}}}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/wK8OU/src/validation.jl:139
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/wK8OU/src/driver.jl:414 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/jgSVI/src/TimerOutput.jl:252 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/wK8OU/src/driver.jl:412 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/wK8OU/src/utils.jl:64
  [6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ CUDA ~/.julia/packages/CUDA/tTK8Y/src/compiler/execution.jl:354
  [7] #224
    @ ~/.julia/packages/CUDA/tTK8Y/src/compiler/execution.jl:347 [inlined]
  [8] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{KernelGradients.var"#df#1"{typeof(gpu_spherical_harmonics!), typeof(gpu_spherical_harmonics!)}, Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(512,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}}, Duplicated{CuDeviceMatrix{Float32, 1}}, Duplicated{CuDeviceMatrix{Float32, 1}}}}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/wK8OU/src/driver.jl:74
  [9] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/tTK8Y/src/compiler/execution.jl:346
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/wK8OU/src/cache.jl:90
 [11] cufunction(f::KernelGradients.var"#df#1"{typeof(gpu_spherical_harmonics!), typeof(gpu_spherical_harmonics!)}, tt::Type{Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(512,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}}, Duplicated{CuDeviceMatrix{Float32, 1}}, Duplicated{CuDeviceMatrix{Float32, 1}}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/tTK8Y/src/compiler/execution.jl:299
 [12] cufunction(f::KernelGradients.var"#df#1"{typeof(gpu_spherical_harmonics!), typeof(gpu_spherical_harmonics!)}, tt::Type{Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(512,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}}, Duplicated{CuDeviceMatrix{Float32, 1}}, Duplicated{CuDeviceMatrix{Float32, 1}}}})
    @ CUDA ~/.julia/packages/CUDA/tTK8Y/src/compiler/execution.jl:292
 [13] macro expansion
    @ ~/.julia/packages/CUDA/tTK8Y/src/compiler/execution.jl:102 [inlined]
 [14] (::KernelAbstractions.Kernel{CUDADevice, KernelAbstractions.NDIteration.StaticSize{(512,)}, KernelAbstractions.NDIteration.DynamicSize, KernelGradients.var"#df#1"{typeof(gpu_spherical_harmonics!), typeof(gpu_spherical_harmonics!)}})(::Duplicated{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, ::Vararg{Duplicated{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}; ndrange::Int64, dependencies::CUDAKernels.CudaEvent, workgroupsize::Nothing, progress::Function)
    @ CUDAKernels ~/.julia/dev/KernelAbstractions/lib/CUDAKernels/src/CUDAKernels.jl:273
 [15] main()
    @ Main ~/code/a.jl:40
 [16] top-level scope
    @ ~/code/a.jl:42
in expression starting at /home/pxl-th/code/a.jl:42

pxl-th (Collaborator, Author) commented on Jun 16, 2022

I see, it is the same issue as in EnzymeAD/Enzyme.jl#358.

pxl-th (Collaborator, Author) commented on Jun 16, 2022

Here's the `@device_code` output if needed: device_code.zip

pxl-th (Collaborator, Author) commented on Jun 17, 2022

EnzymeAD/Enzyme.jl#361 & #309 fixed it.

pxl-th closed this as completed on Jun 17, 2022