The `chunk` function is not differentiable on GPU #170
Comments
I found that this might be caused by Zygote.jl 0.6.67, because the problem goes away when I downgrade Zygote.jl to 0.6.66.
This looks like a bug in the CR rrule being used here, after Zygote deleted its rule. Any chance you can isolate it further, e.g. to a single `getindex` call?
I'm not sure what you expect. Can you elaborate on this? Then I'll test it soon.
Sorry, what I mean is isolating which call is actually failing. This is the relevant bit of the stacktrace:

```
[4] materialize!
  @ ~/.julia/packages/GPUArrays/dAUOE/src/host/broadcast.jl:46 [inlined]
[5] materialize!
  @ ./broadcast.jl:881 [inlined]
[6] ∇getindex!(dx::Vector{Union{ChainRulesCore.ZeroTangent, CuMatrix{Float32, CUDA.Mem.DeviceBuffer}, DenseCuMatrix{Float32, CUDA.Mem.DeviceBuffer}}}, dy::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, inds::Int64)
  @ ChainRules ~/.julia/packages/ChainRules/Tvwnx/src/rulesets/Base/indexing.jl:147
[7] ∇getindex(x::Vector{SubArray{Float32, 2, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}}, dy::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, inds::Int64)
  @ ChainRules ~/.julia/packages/ChainRules/Tvwnx/src/rulesets/Base/indexing.jl:89
```

Here `∇getindex` is being called on a `Vector` of `SubArray`s of a `CuArray`, and the in-place broadcast inside `∇getindex!` (frames [4]–[5]) is where it fails.
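One probe that could isolate that frame (my own sketch, not code from the thread) is to differentiate a lone `getindex` on a `Vector` of `CuArray` views, mirroring the argument types in frame [7]:

```julia
# Hypothetical isolation probe (not from the thread): a single getindex on a
# Vector of CuArray views, matching the signature in frame [7] above.
using CUDA, Zygote

y = cu(rand(Float32, 8, 3))
v = [view(y, 1:4, :), view(y, 5:8, :)]     # Vector{SubArray{...CuArray...}}
Zygote.gradient(vs -> sum(vs[1]), v)       # differentiates getindex(vs, 1)
```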
Thanks. I've tested two other cases that do more direct indexing without calling `chunk`:

```julia
function (model::Model)(x)
    y = model.layers.dense(x)
    # ERROR: LoadError: MethodError: no method matching parent(...
    #a, b = Flux.chunk(y, size = [4, 4], dims = 1)
    # this works
    #a, b = y[1:4,:], y[5:8,:]
    # this works
    a, b = view(y, 1:4, :), view(y, 5:8, :)
    sum(a + b)
end
```
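The `Model` type itself isn't shown in the thread; a minimal sketch of scaffolding that would let the snippet above run (my assumption: a plain struct holding a named tuple with a `dense` layer) could look like:

```julia
# Hypothetical scaffolding (not from the thread): a minimal Model whose
# `layers.dense` field matches what the forward pass above expects.
using Flux, CUDA

struct Model{L}
    layers::L
end
Flux.@functor Model   # let `gpu` recurse into the wrapper

# Forward pass repeated from the comment above, using the variant reported to work:
function (model::Model)(x)
    y = model.layers.dense(x)
    a, b = view(y, 1:4, :), view(y, 5:8, :)
    sum(a + b)
end

model = Model((dense = Dense(8 => 8),)) |> gpu
x = cu(randn(Float32, 8, 4))
gradient(x -> model(x), x)   # swap in the commented-out variants to compare
```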
I discovered that the following pattern doesn't work. I guess the lowering implicitly inserts some `getindex` calls:

```julia
a, b = [y[1:4,:], y[5:8,:]]
```
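A small sketch of that guess (my illustration, not from the thread): destructuring a `Vector` pulls each element out by position, and differentiating that per-element fetch appears to be what hits the `∇getindex` rule seen in the trace above.

```julia
# Illustration only: `a, b = v` on a Vector behaves like fetching v[1] and v[2],
# so differentiating this pattern exercises the rule for getindex on a Vector of arrays.
v = [collect(1.0:4.0), collect(5.0:8.0)]
a, b = v
@assert a == v[1] && b == v[2]
```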
So, I reduced the code to the following. As in the case above, the error disappears if I use Zygote.jl 0.6.66 instead of 0.6.67.

```julia
using CUDA, Zygote

function f(x)
    a, b = [x[1:4], x[5:8]]
    sum(a + b)
end

x = cu(randn(8))
@show f(x)
@show Zygote.gradient(f, x)
```
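For comparison, and consistent with the view-based variant reported to work earlier in the thread, writing the same computation without the intermediate `Vector` can serve as a workaround while the rule is fixed (my sketch, not from the thread):

```julia
# Workaround sketch (assumption, based on the "view works" report above):
# skip the intermediate Vector and slice the array directly.
using CUDA, Zygote

g(x) = sum(view(x, 1:4) + view(x, 5:8))
x = cu(randn(Float32, 8))
@show g(x)
@show Zygote.gradient(g, x)
```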
Thanks, that's helpful!
Thanks for the MWE, will follow up on the Zygote issue.
The example in the OP works fine on the latest versions of the packages.
I found that operations involving the `chunk` function are not differentiable on GPU. When I try to run this, I see the following error:
Full error message: log.txt
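The snippet the OP refers to is not included in this copy of the issue; a hedged reconstruction of the kind of call being described (package usage, names, and sizes are my own guesses) might look like:

```julia
# Hypothetical reconstruction (the original snippet is not shown above):
# differentiating a sum over GPU chunks of the kind the issue describes.
using CUDA, Flux, Zygote

x = cu(rand(Float32, 8, 3))
loss(x) = sum(sum.(Flux.chunk(x; size = 4, dims = 1)))
Zygote.gradient(loss, x)
```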
My environment is:
Manifest.toml and Project.toml are as follows (the file name extensions were changed for uploading):
Manifest.txt
Project.txt