-
-
Notifications
You must be signed in to change notification settings - Fork 611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Problem with RNN and CUDA. #2352
Comments
It looks like the issue comes from latest Zygote v0.6.67, which merged this issue: FluxML/Zygote.jl#1328 |
Changes made on our side should not be directly causing memory access errors on the CUDA one since we operate at a much higher level, so I'm inclined to say the problem lies elsewhere and was exposed by Zygote switching to the ChainRules rule. Can you get a stacktrace which includes Zygote and ChainRules(Core)? Having just CUDA.jl in there is not helpful. |
Sorry I'm afraid I missunderstand your comment regarding having CUDA.jl not being useful, as there's no issue raised when the above issue is run on the cpu. Working on GPU - Zygote v0.6.67 using Flux
using CUDA
using Flux.Losses: mse
dev = gpu # cpu is working fine
#######################
# no indexing
#######################
m = RNN(2 => 1) |> dev
x = rand(Float32, 2, 3) |> dev;
y = rand(Float32, 1, 3) |> dev
m.state
Flux.reset!(m)
m(x)
loss(m, x, y) = mse(m(x), y)
Flux.reset!(m)
gs = gradient(m) do model
loss(model, x, y)
end Failing on GPU - Zygote v0.6.67 (working on Zygote v0.6.66): m = RNN(2 => 1) |> dev
x = [rand(Float32, 2, 3) for i in 1:4] |> dev;
y = rand(Float32, 1, 3) |> dev
m.state
Flux.reset!(m)
m(x[1])
function loss_1(m, x, y)
p = [m(xi) for xi in x]
mse(p[1], y)
end
loss_1(m, x, y)
Flux.reset!(m)
gs = gradient(m) do model
loss_1(model, x, y)
end And the above return following error message: julia> gs = gradient(m) do model
loss_1(model, x, y)
end
ERROR: MethodError: no method matching parent(::Type{SubArray{Union{ChainRulesCore.ZeroTangent, CuMatrix{Float32, CUDA.Mem.DeviceBuffer}, DenseCuMatrix{Float32, CUDA.Mem.DeviceBuffer}}, 0, Vector{Union{ChainRulesCore.ZeroTangent, CuMatrix{Float32, CUDA.Mem.DeviceBuffer}, DenseCuMatrix{Float32, CUDA.Mem.DeviceBuffer}}}, Tuple{Int64}, true}})
Closest candidates are:
parent(::Union{LinearAlgebra.Adjoint{T, S}, LinearAlgebra.Transpose{T, S}} where {T, S}) at C:\Users\jerem\AppData\Local\Programs\Julia-1.8.5\share\julia\stdlib\v1.8\LinearAlgebra\src\adjtrans.jl:218
parent(::Union{LinearAlgebra.Hermitian{T, S}, LinearAlgebra.Symmetric{T, S}} where {T, S}) at C:\Users\jerem\AppData\Local\Programs\Julia-1.8.5\share\julia\stdlib\v1.8\LinearAlgebra\src\symmetric.jl:275
parent(::Union{NNlib.BatchedAdjoint{T, S}, NNlib.BatchedTranspose{T, S}} where {T, S}) at C:\Users\jerem\.julia\packages\NNlib\Fg3DQ\src\batched\batchedadjtrans.jl:73
...
Stacktrace:
[1] backend(#unused#::Type{SubArray{Union{ChainRulesCore.ZeroTangent, CuMatrix{Float32, CUDA.Mem.DeviceBuffer}, DenseCuMatrix{Float32, CUDA.Mem.DeviceBuffer}}, 0, Vector{Union{ChainRulesCore.ZeroTangent, CuMatrix{Float32, CUDA.Mem.DeviceBuffer}, DenseCuMatrix{Float32, CUDA.Mem.DeviceBuffer}}}, Tuple{Int64}, true}})
@ GPUArraysCore C:\Users\jerem\.julia\packages\GPUArraysCore\uOYfN\src\GPUArraysCore.jl:151
[2] backend(x::SubArray{Union{ChainRulesCore.ZeroTangent, CuMatrix{Float32, CUDA.Mem.DeviceBuffer}, DenseCuMatrix{Float32, CUDA.Mem.DeviceBuffer}}, 0, Vector{Union{ChainRulesCore.ZeroTangent, CuMatrix{Float32, CUDA.Mem.DeviceBuffer}, DenseCuMatrix{Float32, CUDA.Mem.DeviceBuffer}}}, Tuple{Int64}, true})
@ GPUArraysCore C:\Users\jerem\.julia\packages\GPUArraysCore\uOYfN\src\GPUArraysCore.jl:149
[3] _copyto!
@ C:\Users\jerem\.julia\packages\GPUArrays\5XhED\src\host\broadcast.jl:65 [inlined]
[4] materialize!
@ C:\Users\jerem\.julia\packages\GPUArrays\5XhED\src\host\broadcast.jl:41 [inlined]
[5] materialize!
@ .\broadcast.jl:868 [inlined]
[6] ∇getindex!(dx::Vector{Union{ChainRulesCore.ZeroTangent, CuMatrix{Float32, CUDA.Mem.DeviceBuffer}, DenseCuMatrix{Float32, CUDA.Mem.DeviceBuffer}}}, dy::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, inds::Int64)
@ ChainRules C:\Users\jerem\.julia\packages\ChainRules\DSuXy\src\rulesets\Base\indexing.jl:147
[7] ∇getindex(x::Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, dy::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, inds::Int64)
@ ChainRules C:\Users\jerem\.julia\packages\ChainRules\DSuXy\src\rulesets\Base\indexing.jl:89
[8] (::ChainRules.var"#1583#1585"{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Int64}})()
@ ChainRules C:\Users\jerem\.julia\packages\ChainRules\DSuXy\src\rulesets\Base\indexing.jl:69
[9] unthunk
@ C:\Users\jerem\.julia\packages\ChainRulesCore\7MWx2\src\tangent_types\thunks.jl:204 [inlined]
[10] unthunk(x::ChainRulesCore.InplaceableThunk{ChainRulesCore.Thunk{ChainRules.var"#1583#1585"{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Int64}}}, ChainRules.var"#1582#1584"{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Int64}}})
@ ChainRulesCore C:\Users\jerem\.julia\packages\ChainRulesCore\7MWx2\src\tangent_types\thunks.jl:237
[11] wrap_chainrules_output
@ C:\Users\jerem\.julia\packages\Zygote\YYT6v\src\compiler\chainrules.jl:110 [inlined]
[12] map
@ .\tuple.jl:223 [inlined]
[13] wrap_chainrules_output
@ C:\Users\jerem\.julia\packages\Zygote\YYT6v\src\compiler\chainrules.jl:111 [inlined]
[14] ZBack
@ C:\Users\jerem\.julia\packages\Zygote\YYT6v\src\compiler\chainrules.jl:211 [inlined]
[15] Pullback
@ C:\Users\jerem\.julia\packages\Zygote\YYT6v\src\tools\builtins.jl:12 [inlined]
[16] (::Zygote.Pullback{Tuple{typeof(Zygote.literal_getindex), Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Val{1}}, Tuple{Zygote.ZBack{ChainRules.var"#getindex_pullback#1581"{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Int64}, Tuple{ChainRulesCore.NoTangent}}}}})(Δ::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
@ Zygote C:\Users\jerem\.julia\packages\Zygote\YYT6v\src\compiler\interface2.jl:0
[17] Pullback
@ c:\Users\jerem\OneDrive\github\ADTests.jl\FluxRNNTest\rnn-gpu.jl:94 [inlined]
[18] (::Zygote.Pullback{Tuple{typeof(loss_1), Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.Pullback{Tuple{Type{Base.Generator}, var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Tuple{Zygote.Pullback{Tuple{Type{Base.Generator{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}}, var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Tuple{Zygote.Pullback{Tuple{typeof(convert), Type{var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Tuple{}}, Zygote.var"#2193#back#313"{Zygote.Jnew{Base.Generator{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Nothing, false}}, Zygote.Pullback{Tuple{typeof(convert), Type{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Any}}}}}, Zygote.var"#2193#back#313"{Zygote.Jnew{var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Nothing, false}}, Zygote.Pullback{Tuple{typeof(mse), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.Pullback{Tuple{Flux.Losses.var"##mse#14", typeof(Statistics.mean), typeof(mse), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.ZBack{ChainRules.var"#mean_pullback#1821"{Int64, ChainRules.var"#sum_pullback#1633"{Colon, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ChainRulesCore.ProjectTo{AbstractArray, NamedTuple{(:element, :axes), Tuple{ChainRulesCore.ProjectTo{Float32, NamedTuple{(), Tuple{}}}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}}}}}, Zygote.var"#3960#back#1287"{Zygote.var"#1283#1286"{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.var"#3752#back#1189"{Zygote.var"#1185#1188"{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.ZBack{Flux.Losses.var"#_check_sizes_pullback#12"}, Zygote.Pullback{Tuple{typeof(Base.Broadcast.materialize), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{}}}}}}, Zygote.var"#collect_pullback#704"{Zygote.var"#map_back#666"{var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, 1, Tuple{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Tuple{Tuple{Base.OneTo{Int64}}}, Vector{Tuple{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Zygote.Pullback{Tuple{var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:m, Zygote.Context{false}, var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Zygote.Pullback{Tuple{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:cell, Zygote.Context{false}, Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Zygote.var"#2015#back#213"{Zygote.var"#back#211"{2, 2, Zygote.Context{false}, Int64}}, Zygote.Pullback{Tuple{typeof(setproperty!), Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Symbol, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.ZBack{ChainRules.var"#typeof_pullback#45"}, Zygote.Pullback{Tuple{typeof(convert), Type{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{}}, Zygote.ZBack{ChainRules.var"#fieldtype_pullback#421"}, Zygote.var"#2182#back#311"{Zygote.var"#309#310"{Symbol, Base.RefValue{Any}}}}}, Zygote.var"#back#246"{Zygote.var"#2015#back#213"{Zygote.var"#back#211"{2, 2, Zygote.Context{false}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Zygote.var"#back#245"{Zygote.var"#2015#back#213"{Zygote.var"#back#211"{2, 1, Zygote.Context{false}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Zygote.Pullback{Tuple{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.ZBack{Flux.var"#174#175"}, Zygote.var"#3736#back#1181"{Zygote.var"#1175#1179"{Tuple{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, Zygote.ZBack{ChainRules.var"#times_pullback#1466"{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:b, Zygote.Context{false}, Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, Zygote.ZBack{ChainRules.var"#size_pullback#919"}, Zygote.var"#1999#back#204"{typeof(identity)}, Zygote.var"#3736#back#1181"{Zygote.var"#1175#1179"{Tuple{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Zygote.ZBack{NNlib.var"#broadcasted_tanh_fast_pullback#145"{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.Pullback{Tuple{typeof(Flux.reshape_cell_output), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.var"#2047#back#232"{Zygote.var"#226#230"{2, UnitRange{Int64}}}, Zygote.var"#2155#back#293"{Zygote.var"#291#292"{Tuple{Tuple{Nothing, Nothing}, Tuple{Nothing}}, Zygote.var"#2746#back#609"{Zygote.var"#603#607"{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Colon, Int64}}}}}, Zygote.ZBack{ChainRules.var"#size_pullback#917"}, Zygote.Pullback{Tuple{typeof(lastindex), Tuple{Int64, Int64}}, Tuple{Zygote.ZBack{ChainRules.var"#length_pullback#747"}}}, Zygote.var"#1999#back#204"{typeof(identity)}, Zygote.ZBack{ChainRules.var"#:_pullback#276"{Tuple{Int64, Int64}}}}}, Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:σ, Zygote.Context{false}, Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, typeof(tanh)}}, Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:Wi, Zygote.Context{false}, Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:Wh, Zygote.Context{false}, Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.ZBack{ChainRules.var"#times_pullback#1466"{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.Pullback{Tuple{Type{Pair}, Int64, Int64}, Tuple{Zygote.Pullback{Tuple{typeof(Core.convert), Type{Int64}, Int64}, Tuple{}}, Zygote.var"#2193#back#313"{Zygote.Jnew{Pair{Int64, Int64}, Nothing, false}}, Zygote.ZBack{ChainRules.var"#fieldtype_pullback#421"}, Zygote.Pullback{Tuple{typeof(Core.convert), Type{Int64}, Int64}, Tuple{}}, Zygote.ZBack{ChainRules.var"#fieldtype_pullback#421"}}}, Zygote.Pullback{Tuple{typeof(Base.Broadcast.materialize), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{}}, Zygote.ZBack{Flux.var"#_size_check_pullback#201"{Tuple{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Pair{Int64, Int64}}}}, Zygote.Pullback{Tuple{typeof(NNlib.fast_act), typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.var"#1972#back#194"{Zygote.var"#190#193"{Zygote.Context{false}, GlobalRef, typeof(tanh_fast)}}}}}}, Zygote.var"#2015#back#213"{Zygote.var"#back#211"{2, 1, Zygote.Context{false}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.var"#2015#back#213"{Zygote.var"#back#211"{2, 1, Zygote.Context{false}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:state, Zygote.Context{false}, Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}}}}}}}, Nothing}, Zygote.Pullback{Tuple{typeof(Zygote.literal_getindex), Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Val{1}}, Tuple{Zygote.ZBack{ChainRules.var"#getindex_pullback#1581"{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Int64}, Tuple{ChainRulesCore.NoTangent}}}}}}})(Δ::Float32)
@ Zygote C:\Users\jerem\.julia\packages\Zygote\YYT6v\src\compiler\interface2.jl:0
[19] Pullback
@ c:\Users\jerem\OneDrive\github\ADTests.jl\FluxRNNTest\rnn-gpu.jl:101 [inlined]
[20] (::Zygote.Pullback{Tuple{var"#115#116", Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Tuple{Zygote.var"#1972#back#194"{Zygote.var"#190#193"{Zygote.Context{false}, GlobalRef, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Zygote.Pullback{Tuple{typeof(loss_1), Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.Pullback{Tuple{Type{Base.Generator}, var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Tuple{Zygote.Pullback{Tuple{Type{Base.Generator{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}}, var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Tuple{Zygote.Pullback{Tuple{typeof(convert), Type{var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Tuple{}}, Zygote.var"#2193#back#313"{Zygote.Jnew{Base.Generator{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Nothing, false}}, Zygote.Pullback{Tuple{typeof(convert), Type{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Any}}}}}, Zygote.var"#2193#back#313"{Zygote.Jnew{var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Nothing, false}}, Zygote.Pullback{Tuple{typeof(mse), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.Pullback{Tuple{Flux.Losses.var"##mse#14", typeof(Statistics.mean), typeof(mse), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.ZBack{ChainRules.var"#mean_pullback#1821"{Int64, ChainRules.var"#sum_pullback#1633"{Colon, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ChainRulesCore.ProjectTo{AbstractArray, NamedTuple{(:element, :axes), Tuple{ChainRulesCore.ProjectTo{Float32, NamedTuple{(), Tuple{}}}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}}}}}, Zygote.var"#3960#back#1287"{Zygote.var"#1283#1286"{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.var"#3752#back#1189"{Zygote.var"#1185#1188"{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.ZBack{Flux.Losses.var"#_check_sizes_pullback#12"}, Zygote.Pullback{Tuple{typeof(Base.Broadcast.materialize), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{}}}}}}, Zygote.var"#collect_pullback#704"{Zygote.var"#map_back#666"{var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, 1, Tuple{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Tuple{Tuple{Base.OneTo{Int64}}}, Vector{Tuple{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Zygote.Pullback{Tuple{var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:m, Zygote.Context{false}, var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Zygote.Pullback{Tuple{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:cell, Zygote.Context{false}, Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Zygote.var"#2015#back#213"{Zygote.var"#back#211"{2, 2, Zygote.Context{false}, Int64}}, Zygote.Pullback{Tuple{typeof(setproperty!), Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Symbol, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.ZBack{ChainRules.var"#typeof_pullback#45"}, Zygote.Pullback{Tuple{typeof(convert), Type{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{}}, Zygote.ZBack{ChainRules.var"#fieldtype_pullback#421"}, Zygote.var"#2182#back#311"{Zygote.var"#309#310"{Symbol, Base.RefValue{Any}}}}}, Zygote.var"#back#246"{Zygote.var"#2015#back#213"{Zygote.var"#back#211"{2, 2, Zygote.Context{false}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Zygote.var"#back#245"{Zygote.var"#2015#back#213"{Zygote.var"#back#211"{2, 1, Zygote.Context{false}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Zygote.Pullback{Tuple{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.ZBack{Flux.var"#174#175"}, Zygote.var"#3736#back#1181"{Zygote.var"#1175#1179"{Tuple{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, Zygote.ZBack{ChainRules.var"#times_pullback#1466"{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:b, Zygote.Context{false}, Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, Zygote.ZBack{ChainRules.var"#size_pullback#919"}, Zygote.var"#1999#back#204"{typeof(identity)}, Zygote.var"#3736#back#1181"{Zygote.var"#1175#1179"{Tuple{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Zygote.ZBack{NNlib.var"#broadcasted_tanh_fast_pullback#145"{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.Pullback{Tuple{typeof(Flux.reshape_cell_output), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.var"#2047#back#232"{Zygote.var"#226#230"{2, UnitRange{Int64}}}, Zygote.var"#2155#back#293"{Zygote.var"#291#292"{Tuple{Tuple{Nothing, Nothing}, Tuple{Nothing}}, Zygote.var"#2746#back#609"{Zygote.var"#603#607"{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Colon, Int64}}}}}, Zygote.ZBack{ChainRules.var"#size_pullback#917"}, Zygote.Pullback{Tuple{typeof(lastindex), Tuple{Int64, Int64}}, Tuple{Zygote.ZBack{ChainRules.var"#length_pullback#747"}}}, Zygote.var"#1999#back#204"{typeof(identity)}, Zygote.ZBack{ChainRules.var"#:_pullback#276"{Tuple{Int64, Int64}}}}}, Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:σ, Zygote.Context{false}, Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, typeof(tanh)}}, Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:Wi, Zygote.Context{false}, Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:Wh, Zygote.Context{false}, Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.ZBack{ChainRules.var"#times_pullback#1466"{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.Pullback{Tuple{Type{Pair}, Int64, Int64}, Tuple{Zygote.Pullback{Tuple{typeof(Core.convert), Type{Int64}, Int64}, Tuple{}}, Zygote.var"#2193#back#313"{Zygote.Jnew{Pair{Int64, Int64}, Nothing, false}}, Zygote.ZBack{ChainRules.var"#fieldtype_pullback#421"}, Zygote.Pullback{Tuple{typeof(Core.convert), Type{Int64}, Int64}, Tuple{}}, Zygote.ZBack{ChainRules.var"#fieldtype_pullback#421"}}}, Zygote.Pullback{Tuple{typeof(Base.Broadcast.materialize), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{}}, Zygote.ZBack{Flux.var"#_size_check_pullback#201"{Tuple{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Pair{Int64, Int64}}}}, Zygote.Pullback{Tuple{typeof(NNlib.fast_act), typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.var"#1972#back#194"{Zygote.var"#190#193"{Zygote.Context{false}, GlobalRef, typeof(tanh_fast)}}}}}}, Zygote.var"#2015#back#213"{Zygote.var"#back#211"{2, 1, Zygote.Context{false}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.var"#2015#back#213"{Zygote.var"#back#211"{2, 1, Zygote.Context{false}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:state, Zygote.Context{false}, Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}}}}}}}, Nothing}, Zygote.Pullback{Tuple{typeof(Zygote.literal_getindex), Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Val{1}}, Tuple{Zygote.ZBack{ChainRules.var"#getindex_pullback#1581"{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Int64}, Tuple{ChainRulesCore.NoTangent}}}}}}}, Zygote.var"#1972#back#194"{Zygote.var"#190#193"{Zygote.Context{false}, GlobalRef, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}})(Δ::Float32)
@ Zygote C:\Users\jerem\.julia\packages\Zygote\YYT6v\src\compiler\interface2.jl:0
[21] (::Zygote.var"#75#76"{Zygote.Pullback{Tuple{var"#115#116", Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Tuple{Zygote.var"#1972#back#194"{Zygote.var"#190#193"{Zygote.Context{false}, GlobalRef, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Zygote.Pullback{Tuple{typeof(loss_1), Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.Pullback{Tuple{Type{Base.Generator}, var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Tuple{Zygote.Pullback{Tuple{Type{Base.Generator{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}}, var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Tuple{Zygote.Pullback{Tuple{typeof(convert), Type{var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Tuple{}}, Zygote.var"#2193#back#313"{Zygote.Jnew{Base.Generator{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Nothing, false}}, Zygote.Pullback{Tuple{typeof(convert), Type{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Any}}}}}, Zygote.var"#2193#back#313"{Zygote.Jnew{var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Nothing, false}}, Zygote.Pullback{Tuple{typeof(mse), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.Pullback{Tuple{Flux.Losses.var"##mse#14", typeof(Statistics.mean), typeof(mse), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.ZBack{ChainRules.var"#mean_pullback#1821"{Int64, ChainRules.var"#sum_pullback#1633"{Colon, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ChainRulesCore.ProjectTo{AbstractArray, NamedTuple{(:element, :axes), Tuple{ChainRulesCore.ProjectTo{Float32, NamedTuple{(), Tuple{}}}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}}}}}, Zygote.var"#3960#back#1287"{Zygote.var"#1283#1286"{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.var"#3752#back#1189"{Zygote.var"#1185#1188"{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.ZBack{Flux.Losses.var"#_check_sizes_pullback#12"}, Zygote.Pullback{Tuple{typeof(Base.Broadcast.materialize), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{}}}}}}, Zygote.var"#collect_pullback#704"{Zygote.var"#map_back#666"{var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, 1, Tuple{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Tuple{Tuple{Base.OneTo{Int64}}}, Vector{Tuple{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Zygote.Pullback{Tuple{var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:m, Zygote.Context{false}, var"#113#114"{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Zygote.Pullback{Tuple{Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:cell, Zygote.Context{false}, Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Zygote.var"#2015#back#213"{Zygote.var"#back#211"{2, 2, Zygote.Context{false}, Int64}}, Zygote.Pullback{Tuple{typeof(setproperty!), Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Symbol, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.ZBack{ChainRules.var"#typeof_pullback#45"}, Zygote.Pullback{Tuple{typeof(convert), Type{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{}}, Zygote.ZBack{ChainRules.var"#fieldtype_pullback#421"}, Zygote.var"#2182#back#311"{Zygote.var"#309#310"{Symbol, Base.RefValue{Any}}}}}, Zygote.var"#back#246"{Zygote.var"#2015#back#213"{Zygote.var"#back#211"{2, 2, Zygote.Context{false}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Zygote.var"#back#245"{Zygote.var"#2015#back#213"{Zygote.var"#back#211"{2, 1, Zygote.Context{false}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Zygote.Pullback{Tuple{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.ZBack{Flux.var"#174#175"}, Zygote.var"#3736#back#1181"{Zygote.var"#1175#1179"{Tuple{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, Zygote.ZBack{ChainRules.var"#times_pullback#1466"{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:b, Zygote.Context{false}, Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, Zygote.ZBack{ChainRules.var"#size_pullback#919"}, Zygote.var"#1999#back#204"{typeof(identity)}, Zygote.var"#3736#back#1181"{Zygote.var"#1175#1179"{Tuple{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, Zygote.ZBack{NNlib.var"#broadcasted_tanh_fast_pullback#145"{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.Pullback{Tuple{typeof(Flux.reshape_cell_output), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.var"#2047#back#232"{Zygote.var"#226#230"{2, UnitRange{Int64}}}, Zygote.var"#2155#back#293"{Zygote.var"#291#292"{Tuple{Tuple{Nothing, Nothing}, Tuple{Nothing}}, Zygote.var"#2746#back#609"{Zygote.var"#603#607"{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Colon, Int64}}}}}, Zygote.ZBack{ChainRules.var"#size_pullback#917"}, Zygote.Pullback{Tuple{typeof(lastindex), Tuple{Int64, Int64}}, Tuple{Zygote.ZBack{ChainRules.var"#length_pullback#747"}}}, Zygote.var"#1999#back#204"{typeof(identity)}, Zygote.ZBack{ChainRules.var"#:_pullback#276"{Tuple{Int64, Int64}}}}}, Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:σ, Zygote.Context{false}, Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, typeof(tanh)}}, Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:Wi, Zygote.Context{false}, Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:Wh, Zygote.Context{false}, Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.ZBack{ChainRules.var"#times_pullback#1466"{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.Pullback{Tuple{Type{Pair}, Int64, Int64}, Tuple{Zygote.Pullback{Tuple{typeof(Core.convert), Type{Int64}, Int64}, Tuple{}}, Zygote.var"#2193#back#313"{Zygote.Jnew{Pair{Int64, Int64}, Nothing, false}}, Zygote.ZBack{ChainRules.var"#fieldtype_pullback#421"}, Zygote.Pullback{Tuple{typeof(Core.convert), Type{Int64}, Int64}, Tuple{}}, Zygote.ZBack{ChainRules.var"#fieldtype_pullback#421"}}}, Zygote.Pullback{Tuple{typeof(Base.Broadcast.materialize), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{}}, Zygote.ZBack{Flux.var"#_size_check_pullback#201"{Tuple{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Pair{Int64, Int64}}}}, Zygote.Pullback{Tuple{typeof(NNlib.fast_act), typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Zygote.var"#1972#back#194"{Zygote.var"#190#193"{Zygote.Context{false}, GlobalRef, typeof(tanh_fast)}}}}}}, Zygote.var"#2015#back#213"{Zygote.var"#back#211"{2, 1, Zygote.Context{false}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.var"#2015#back#213"{Zygote.var"#back#211"{2, 1, Zygote.Context{false}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}, Zygote.var"#2166#back#303"{Zygote.var"#back#302"{:state, Zygote.Context{false}, Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}}}}}}}, Nothing}, Zygote.Pullback{Tuple{typeof(Zygote.literal_getindex), Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Val{1}}, Tuple{Zygote.ZBack{ChainRules.var"#getindex_pullback#1581"{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Tuple{Int64}, Tuple{ChainRulesCore.NoTangent}}}}}}}, Zygote.var"#1972#back#194"{Zygote.var"#190#193"{Zygote.Context{false}, GlobalRef, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}}})(Δ::Float32)
@ Zygote C:\Users\jerem\.julia\packages\Zygote\YYT6v\src\compiler\interface.jl:45
[22] gradient(f::Function, args::Flux.Recur{Flux.RNNCell{typeof(tanh), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}})
@ Zygote C:\Users\jerem\.julia\packages\Zygote\YYT6v\src\compiler\interface.jl:97
[23] top-level scope
@ c:\Users\jerem\OneDrive\github\ADTests.jl\FluxRNNTest\rnn-gpu.jl:100 |
I don't know why exactly this is happening, but I had to push a temporary patch for this. See https://github.com/LuxDL/Lux.jl/pull/442/files |
@jeremiedb I meant that the stacktrace @kvantitative posted only had stack frames from CUDA.jl and not Zygote or ChainRules. It wasn't at all clear how higher-level libraries would've been responsible for a low-level illegal memory access error because of that. The issue you found is FluxML/Zygote.jl#1470 (comment). If you have a better understanding of what's going on with GPU broadcast styles there or know someone who does, additional guidance would be much appreciated. |
I have attached the complete stack trace as a file. |
I commented on the issue @ToucheSir raised here FluxML/Zygote.jl#1470 (comment) but it does not bear directly on the RNN. ...
dev = m-> fmap(jl, m; exclude=Flux.Optimisers.maywrite) # move weights/inputs to "gpu"
x = tuple((rand(Float32, 2, 3) for i in 1:4)...) |> dev; # Tuple{JLArray}
... So I think the RNN is fine, the issue is the problematic use of I used a temp env with
|
the example in the OP works now |
I want to run a RNN (https://fluxml.ai/Flux.jl/stable/models/recurrence/) on the GPU, using the explicit (https://fluxml.ai/Flux.jl/stable/training/training/#Implicit-or-Explicit?) gradients.
This seems to work fine one CPU, but fails on the GPU with the message pointing to the last statement in the loss function, called from gradient:
Since I am dealing with a RNN, the history must be available for the gradient (https://en.wikipedia.org/wiki/Backpropagation_through_time). The recurrence documentation for Flux specifies that the input should be structured as a vector (over time steps) of vectors (over features).
Code below is simplified to expose the problem, the training loop is stripped away.
Versions (in clean environment):
The text was updated successfully, but these errors were encountered: