Enzyme fails on GPU kernel #307

Closed · pxl-th opened this issue on Jun 13, 2022 · 6 comments

pxl-th (Collaborator) commented on Jun 13, 2022

Enzyme works fine on the CPU and can differentiate through the `spherical_harmonics!` kernel, but on `CUDADevice` it fails with the error below.
It looks like something analogous to `transform_gpu!`, which inserts `return nothing`, is missing.

Error:

ERROR: LoadError: GPU compilation of kernel #df#1(KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(512,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}}, Duplicated{CuDeviceMatrix{Float32, 1}}, Duplicated{CuDeviceMatrix{Float32, 1}}) failed
KernelError: kernel returns a value of type `Tuple{}`

Make sure your kernel function ends in `return`, `return nothing` or `nothing`.
If the returned value is of type `Union{}`, your Julia code probably throws an exception.
Inspect the code with `@device_code_warntype` for more details.
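
For context, a minimal sketch of the failure mode (illustrative only; `grad_call`, `df_bad`, and `df_good` are hypothetical names, not the KernelGradients source). `Enzyme.autodiff` returns a tuple of gradients, which is empty when every argument is `Duplicated`, so a wrapper whose last expression is that call gets return type `Tuple{}`:

# Stand-in for the empty gradient tuple that autodiff returns (hypothetical).
grad_call(args...) = ()

# The last expression is `()`, so the return type is `Tuple{}` --
# exactly what the GPU compilation error above rejects.
df_bad(args...) = grad_call(args...)

# An explicit `return nothing` gives the kernel an acceptable return type.
function df_good(args...)
    grad_call(args...)
    return nothing
end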

MWE:

using InteractiveUtils
using CUDA
using CUDAKernels
using Enzyme
using KernelAbstractions
using KernelGradients

# Device-dispatched allocation helpers (type piracy on Base; fine for an MWE).
Base.rand(::CPU, T, shape) = rand(T, shape)
Base.rand(::CUDADevice, T, shape) = CUDA.rand(T, shape)
Base.zeros(::CPU, T, shape) = zeros(T, shape)
Base.zeros(::CUDADevice, T, shape) = CUDA.zeros(T, shape)
Base.ones(::CPU, T, shape) = ones(T, shape)
Base.ones(::CUDADevice, T, shape) = CUDA.ones(T, shape)

# Workgroup size to launch on each device.
linear_threads(::CPU) = Threads.nthreads()
linear_threads(::CUDADevice) = 128

# Encode each direction with the first four real spherical harmonics (degrees 0 and 1).
@kernel function spherical_harmonics!(encodings, @Const(directions))
    i = @index(Global)
    x = directions[1, i]
    y = directions[2, i]
    z = directions[3, i]

    encodings[1, i] = 0.28209479177387814f0
    encodings[2, i] = -0.48860251190291987f0 * y
    encodings[3, i] = 0.48860251190291987f0 * z
    encodings[4, i] = -0.48860251190291987f0 * x
end

function ∇spherical_harmonics!(∂encodings, ∂directions, encodings, directions, device)
    # KernelGradients extends Enzyme.autodiff to turn a KA kernel into a gradient kernel.
    ∇k! = Enzyme.autodiff(spherical_harmonics!(device, linear_threads(device)))
    # @device_code dir="./" ∇k!(Duplicated(encodings, ∂encodings), Duplicated(directions, ∂directions); ndrange=1)
    n = size(encodings, 2)
    wait(∇k!(Duplicated(encodings, ∂encodings), Duplicated(directions, ∂directions); ndrange=n))
    nothing
end

function main()
    device = CUDADevice()
    n = 1
    x = rand(device, Float32, (3, n))
    y = zeros(device, Float32, (4, n))
    ∂L∂x = zeros(device, Float32, (3, n))
    ∂L∂y = ones(device, Float32, (4, n))  # seed the reverse pass with ∂L/∂y = 1

    # Forward pass, then reverse pass.
    wait(spherical_harmonics!(device, linear_threads(device))(y, x; ndrange=n))
    ∇spherical_harmonics!(∂L∂y, ∂L∂x, y, x, device)
end
main()

I'm on Julia 1.8.0-rc1.

Output of `] st`:

  [052768ef] CUDA v3.10.1
  [72cfdca4] CUDAKernels v0.4.2 `https://github.com/JuliaGPU/KernelAbstractions.jl.git:lib/CUDAKernels#master`
  [7da242da] Enzyme v0.10.0
  [63c18a36] KernelAbstractions v0.8.2 `https://github.com/JuliaGPU/KernelAbstractions.jl.git#master`
  [e5faadeb] KernelGradients v0.1.1 `https://github.com/JuliaGPU/KernelAbstractions.jl.git:lib/KernelGradients#master`
pxl-th changed the title from "Enzyme on GPU kernel" to "Enzyme fails on GPU kernel" on Jun 13, 2022
vchuravy (Member) commented

That seems to be an ABI issue...

vchuravy (Member) commented

Could you add a `return nothing` here, in the `df` closure in KernelGradients.jl, after the deferred call:

Enzyme.autodiff_deferred(f::Fun, Enzyme.Const, ctx, args...)
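
In other words, something like this (a sketch of the suggested change, assuming the `df` closure captures the kernel function `f`; not the verbatim KernelGradients source):

function df(ctx, args...)
    Enzyme.autodiff_deferred(f, Enzyme.Const, ctx, args...)
    return nothing  # discard autodiff's result so the GPU kernel returns nothing
end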

pxl-th (Collaborator, Author) commented on Jun 15, 2022

@vchuravy, I added `return nothing`, but now I get a different error on CUDA (the CPU still works fine):

ERROR: LoadError: InvalidIRError: compiling kernel #df#1(KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(512,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}}, Duplicated{CuDeviceMatrix{Float32, 1}}, Duplicated{CuDeviceMatrix{Float32, 1}}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to jl_f_getfield)
Stacktrace:
 [1] getindex
   @ ./tuple.jl:29
 [2] iterate
   @ ./tuple.jl:68
 [3] same_or_one
   @ ~/.julia/packages/Enzyme/7MHm8/src/Enzyme.jl:206
 [4] autodiff_deferred
   @ ~/.julia/packages/Enzyme/7MHm8/src/Enzyme.jl:429
 [5] df
   @ ~/.julia/dev/KernelAbstractions/lib/KernelGradients/src/KernelGradients.jl:9
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{KernelGradients.var"#df#1"{typeof(gpu_spherical_harmonics!), typeof(gpu_spherical_harmonics!)}, Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(512,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}}, Duplicated{CuDeviceMatrix{Float32, 1}}, Duplicated{CuDeviceMatrix{Float32, 1}}}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/wK8OU/src/validation.jl:139
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/wK8OU/src/driver.jl:414 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/jgSVI/src/TimerOutput.jl:252 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/wK8OU/src/driver.jl:412 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/wK8OU/src/utils.jl:64
  [6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ CUDA ~/.julia/packages/CUDA/tTK8Y/src/compiler/execution.jl:354
  [7] #224
    @ ~/.julia/packages/CUDA/tTK8Y/src/compiler/execution.jl:347 [inlined]
  [8] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{KernelGradients.var"#df#1"{typeof(gpu_spherical_harmonics!), typeof(gpu_spherical_harmonics!)}, Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(512,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}}, Duplicated{CuDeviceMatrix{Float32, 1}}, Duplicated{CuDeviceMatrix{Float32, 1}}}}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/wK8OU/src/driver.jl:74
  [9] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/tTK8Y/src/compiler/execution.jl:346
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/wK8OU/src/cache.jl:90
 [11] cufunction(f::KernelGradients.var"#df#1"{typeof(gpu_spherical_harmonics!), typeof(gpu_spherical_harmonics!)}, tt::Type{Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(512,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}}, Duplicated{CuDeviceMatrix{Float32, 1}}, Duplicated{CuDeviceMatrix{Float32, 1}}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/tTK8Y/src/compiler/execution.jl:299
 [12] cufunction(f::KernelGradients.var"#df#1"{typeof(gpu_spherical_harmonics!), typeof(gpu_spherical_harmonics!)}, tt::Type{Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(512,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}}, Duplicated{CuDeviceMatrix{Float32, 1}}, Duplicated{CuDeviceMatrix{Float32, 1}}}})
    @ CUDA ~/.julia/packages/CUDA/tTK8Y/src/compiler/execution.jl:292
 [13] macro expansion
    @ ~/.julia/packages/CUDA/tTK8Y/src/compiler/execution.jl:102 [inlined]
 [14] (::KernelAbstractions.Kernel{CUDADevice, KernelAbstractions.NDIteration.StaticSize{(512,)}, KernelAbstractions.NDIteration.DynamicSize, KernelGradients.var"#df#1"{typeof(gpu_spherical_harmonics!), typeof(gpu_spherical_harmonics!)}})(::Duplicated{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, ::Vararg{Duplicated{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}; ndrange::Int64, dependencies::CUDAKernels.CudaEvent, workgroupsize::Nothing, progress::Function)
    @ CUDAKernels ~/.julia/dev/KernelAbstractions/lib/CUDAKernels/src/CUDAKernels.jl:273
 [15] main()
    @ Main ~/code/a.jl:40
 [16] top-level scope
    @ ~/code/a.jl:42
in expression starting at /home/pxl-th/code/a.jl:42

pxl-th (Collaborator, Author) commented on Jun 16, 2022

I see, it is the same issue as in EnzymeAD/Enzyme.jl#358.

pxl-th (Collaborator, Author) commented on Jun 16, 2022

Here's the `@device_code` output if needed: device_code.zip

pxl-th (Collaborator, Author) commented on Jun 17, 2022

EnzymeAD/Enzyme.jl#361 & #309 fixed it.

pxl-th closed this as completed on Jun 17, 2022