No augmented forward pass found for cuOccupancyMaxPotentialBlockSize
#1061
Comments
I think this is due to the EnzymeRules for KernelAbstractions not supporting reverse mode yet |
Oh, I see. I saw tests in KernelAbstractions for reverse mode and thought that it worked. |
the KA custom rule is implemented for any backend in forward mode, and the CPU backend in reverse |
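(For reference, a minimal sketch of the forward-mode path that the KA custom rule does cover, i.e. differentiating host code that launches a KA kernel; the kernel name, the CPU backend, and the activity annotations here are assumptions for illustration:)
using KernelAbstractions
using Enzyme
import KernelAbstractions as KA
@kernel function square!(x)
    i = @index(Global)
    x[i] *= x[i]
end
# Host-side function that launches the KA kernel; the forward-mode rule
# lets Enzyme differentiate through this launch on any backend.
function host!(x)
    backend = KA.get_backend(x)
    square!(backend)(x; ndrange=length(x))
    KA.synchronize(backend)
    return nothing
end
x = KA.ones(CPU(), Float32, 16)
dx = KA.ones(CPU(), Float32, 16)
Enzyme.autodiff(Forward, host!, Duplicated(x, dx))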
I don't actually remember what was needed for reverse GPU support |
We needed to precompute the GPU-relevant/interpreted tape size from outside the kernel.
So we need a variant of thunk tape computation that allows for a different device.
|
Actually, is this also the case if I want to differentiate just the kernel (no host code involved)? |
nope that would be fine |
I see there are tests for reverse mode for CUDA.jl: Line 14 in 7d99eec
But when I try the same with KA, it errors:
ERROR: return type is Union{}, giving up.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] autodiff_deferred
@ Main ~/.julia/packages/Enzyme/0SYwj/src/Enzyme.jl:456 [inlined]
[3] autodiff_deferred
@ Main ~/.julia/packages/Enzyme/0SYwj/src/Enzyme.jl:442 [inlined]
[4] main2()
@ Main ~/code/t.jl:110
[5] top-level scope
@ REPL[3]:1
[6] top-level scope
@ ~/.julia/packages/CUDA/35NC6/src/initialization.jl:190
Code:
using CUDA
using KernelAbstractions
using Enzyme
import KernelAbstractions as KA
@kernel function ker(x)
i = @index(Global)
x[i] *= x[i]
end
function main()
kab = CUDABackend()
x = KA.ones(kab, Float32, 16)
dx = KA.ones(kab, Float32, 16)
Enzyme.autodiff_deferred(Reverse, ker(kab), Duplicated(x, dx))
return
end
main()
I'm probably doing things incorrectly, but I haven't found an example with KA using just a single kernel... :/ |
Actually, the test for CUDA.jl also gives this error:
function mul_kernel(A)
i = threadIdx().x
if i <= length(A)
A[i] *= A[i]
end
return nothing
end
function main()
A = CUDA.ones(64,)
dA = CUDA.ones(64,)
autodiff_deferred(Reverse, mul_kernel, Const, Duplicated(A, dA))
return
end
I'm using CUDA 4.4.1, Enzyme 0.11.7 and Julia 1.10-beta2 |
So I got confused, but with CUDA.jl it works if you wrap the autodiff call like this:
function mul_kernel(A)
i = threadIdx().x
A[i] *= A[i]
return nothing
end
function grad(A, dA)
autodiff_deferred(Reverse, mul_kernel, Duplicated(A, dA))
return nothing
end
And launch it as a kernel, it works. But with KernelAbstractions I cannot figure out how to do this. Is there a way to AD just the kernel? |
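(For context, a minimal sketch of how this device-side wrapper is typically launched with CUDA.jl; the thread count in the launch below is an assumption for illustration, not taken from the original comment:)
using CUDA
using Enzyme
function mul_kernel(A)
    i = threadIdx().x
    A[i] *= A[i]
    return nothing
end
# The reverse-mode autodiff call lives entirely inside the device code.
function grad(A, dA)
    Enzyme.autodiff_deferred(Reverse, mul_kernel, Const, Duplicated(A, dA))
    return nothing
end
A = CUDA.ones(64)
dA = CUDA.ones(64)
# Launch the gradient wrapper as a GPU kernel (assumed launch configuration).
@cuda threads=length(A) grad(A, dA)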
@wsmoses, sorry for spamming, but are there any examples with KA not involving host code (just the kernel)? |
You should be able to use autodiff_deferred inside the kernel itself (like your grad case). The KA example you showed relies on the nicer custom-rules support, but that's only enabled for forward mode in KA.jl right now. For reverse mode, you'll have to set it up manually like your mul_kernel above, where the autodiff call sits entirely inside the device code. |
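(A minimal sketch of that device-side setup adapted to KernelAbstractions, assuming the same squaring kernel as above; the kernel names and the choice of backend are illustrative, not from the thread:)
using KernelAbstractions
using Enzyme
import KernelAbstractions as KA
@inline function square!(x, i)
    x[i] *= x[i]
    return nothing
end
# The reverse-mode autodiff call is placed inside the device code,
# mirroring the grad/mul_kernel setup for CUDA.jl above.
@kernel function grad_square!(x, dx)
    i = @index(Global)
    Enzyme.autodiff_deferred(Reverse, square!, Duplicated(x, dx), Const(i))
end
backend = CPU()  # any KA backend should follow the same pattern
x = KA.ones(backend, Float32, 16)
dx = KA.ones(backend, Float32, 16)
grad_square!(backend)(x, dx; ndrange=length(x))
KA.synchronize(backend)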
Oh, I see! Now it works! A note somewhere in the docs might be useful (unless I missed one). |
It works for the mul_kernel, however it fails with more complex kernels, for example ones using the sin function.
Error:
ERROR: InvalidIRError: compiling MethodInstance for gpu_gker(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, ::AMDGPU.Device.ROCDeviceVector{Float32, 1}, ::AMDGPU.Device.ROCDeviceVector{Float32, 1}) resulted in invalid LLVM IR
Reason: unsupported call through a literal pointer (call to )
Stacktrace:
[1] #sin
@ ~/.julia/dev/AMDGPU/src/device/gcn/math.jl:32
[2] ker
@ ~/code/ZipNerf.jl/t.jl:7
[3] ker
@ ~/code/ZipNerf.jl/t.jl:0
[4] diffejulia_ker_5228_inner_1wrap
@ ~/code/ZipNerf.jl/t.jl:0
[5] macro expansion
@ ~/.julia/packages/Enzyme/VS5jo/src/compiler.jl:9774
[6] enzyme_call
@ ~/.julia/packages/Enzyme/VS5jo/src/compiler.jl:9452
[7] CombinedAdjointThunk
@ ~/.julia/packages/Enzyme/VS5jo/src/compiler.jl:9415
[8] autodiff_deferred
@ ~/.julia/packages/Enzyme/VS5jo/src/Enzyme.jl:372
[9] autodiff_deferred
@ ~/.julia/packages/Enzyme/VS5jo/src/Enzyme.jl:459
[10] autodiff_deferred
@ ~/.julia/packages/Enzyme/VS5jo/src/Enzyme.jl:442
[11] macro expansion
@ ~/code/ZipNerf.jl/t.jl:18
[12] gpu_gker
@ ~/.julia/packages/KernelAbstractions/cWlFz/src/macros.jl:90
[13] gpu_gker
@ ./none:0
Reason: unsupported call through a literal pointer (call to )
Stacktrace:
[1] #sin
@ ~/.julia/dev/AMDGPU/src/device/gcn/math.jl:32
[2] ker
@ ~/code/ZipNerf.jl/t.jl:7
[3] ker
@ ~/code/ZipNerf.jl/t.jl:0
[4] diffejulia_ker_5228_inner_1wrap
@ ~/code/ZipNerf.jl/t.jl:0
...
Code:
using AMDGPU
using KernelAbstractions
using Enzyme
import KernelAbstractions as KA
@inline function ker(x, i)
x[i] *= sin(x[i])
return
end
@kernel function fker(x)
i = @index(Global)
ker(x, i)
end
@kernel function gker(x, dx)
i = @index(Global)
Enzyme.autodiff_deferred(Reverse, ker, Duplicated(x, dx), i)
end
function main()
kab = ROCBackend()
x = KA.ones(kab, Float32, 16)
dx = KA.ones(kab, Float32, 16)
fker(kab)(x; ndrange=length(x))
@show x
gker(kab)(x, dx; ndrange=length(x))
@show dx
return
end |
Yeah that's the same as #683
|
Just curious if the fix is coming relatively soon or is it more involved? |
It's unfortunately more involved. @aviatesk do you have cycles to help us with the nested abstract interpreter issues? |
Hi!
I'm trying to use a fused kernel compute_α_fused to compute alpha-compositing weights and use Enzyme to generate the gradient kernel in Reverse mode instead of compute_α. But the compilation fails. Is this an issue with CUDA.jl?
Error:
Code: