Improve type stability of cached walks #82

chengchingwen · 2024-05-09T01:29:25Z

This PR adds a special cache type that lets the compiler use the signature of the un-cached walk to generate a corresponding type assertion on lookups from the untyped cache (IdDict{Any, Any}). This improves the type stability of fmap and friends. It also loosens the constraint on the cache type, so functionality outside fmap remains the same.

CarloLucibello · 2024-05-13T05:05:40Z

This adds some complexity to the code and some fragility as well, since it seems it could break with newer Julia versions.
Can you post some benchmarks showing performance improvements?

chengchingwen · 2024-05-13T05:22:32Z

Not a benchmark, but without this PR:

julia> @code_warntype gpu(Chain(Dense(3, 5), Dense(5, 2)))
MethodInstance for Flux.gpu(::Chain{Tuple{Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}, Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}})
  from gpu(x) @ Flux ~/.julia/packages/Flux/Wz6D4/src/functor.jl:248
Arguments
  #self#::Core.Const(Flux.gpu)
  x::Chain{Tuple{Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}, Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}
Body::Chain{T} where T<:Tuple{Any, Any}
1 ─ %1 = Flux.FluxCUDAAdaptor()::Core.Const(Flux.FluxCUDAAdaptor(nothing))
│   %2 = Flux.gpu(%1, x)::Chain{T} where T<:Tuple{Any, Any}
└──      return %2

vs. with this PR:

julia> @code_warntype gpu(Chain(Dense(3, 5), Dense(5, 2)))
MethodInstance for Flux.gpu(::Chain{Tuple{Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}, Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}})
  from gpu(x) @ Flux ~/.julia/packages/Flux/Wz6D4/src/functor.jl:248
Arguments
  #self#::Core.Const(Flux.gpu)
  x::Chain{Tuple{Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}, Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}
Body::Union{Chain{Tuple{Dense{typeof(identity), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, Chain{Tuple{Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}, Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}}
1 ─ %1 = Flux.FluxCUDAAdaptor()::Core.Const(Flux.FluxCUDAAdaptor(nothing))
│   %2 = Flux.gpu(%1, x)::Union{Chain{Tuple{Dense{typeof(identity), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Dense{typeof(identity), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, Chain{Tuple{Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}, Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}}
└──      return %2
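
The same effect can be checked without Flux or a GPU. The following is a minimal sketch, not code from this PR: the function names are hypothetical, and it only illustrates that a lookup in an IdDict{Any, Any} infers as Any unless a type assertion is attached to it.

```julia
# Hypothetical helpers for illustration only; they are not part of Functors.jl.

# Plain lookup: the value type of the dictionary is Any, so the result is too.
plain_lookup(d::IdDict{Any, Any}, x) = d[x]

# Same lookup with a type assertion: inference can now rely on the asserted type.
asserted_lookup(d::IdDict{Any, Any}, x::Vector{Float64}) = d[x]::Vector{Float64}

# Ask the compiler what it infers for each method, given these argument types.
argtypes = (IdDict{Any, Any}, Vector{Float64})
plain_type = only(Base.return_types(plain_lookup, argtypes))
asserted_type = only(Base.return_types(asserted_lookup, argtypes))

println(plain_type)
println(asserted_type)
```

Here plain_type is Any, while asserted_type is Vector{Float64}. The assertion is checked at run time, so a cached value of a different type throws a TypeError rather than silently propagating.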

CarloLucibello · 2024-05-15T06:25:33Z

@darsnack @ToucheSir what do you think? I'm unfamiliar with expression manipulations.

darsnack · 2024-05-15T11:40:24Z

I am also concerned about fragility. The implementation itself is sensible, but as written it seems likely to need frequent updates as internals change. The core idea is to use the return type of the walk to force the type when accessing the cache, right? That seems like a very straightforward generated function to write, with the call to return_type being the only brittle bit. Or is the rest of the current implementation necessary for performance reasons? Accessing the IdDict is the main reason Functors is type unstable, so fixing it is nice.

Pulling back, is there a use-case where we lack a function barrier between the call to gpu and the hot code path?

chengchingwen · 2024-05-15T11:59:20Z

> The core idea is to use the return type of the walk to force the type when accessing the cache, right? That seems like a very straight-forward generated function to write with the call to return_type being the only brittle bit. Or is the rest of the current implementation necessary for performance reasons?

Yes, essentially the whole generated function just generates return cache.cache[x]::(return_type(cache.walk, typeof(args))). It also seems doable without a generated function, but with the generated function we can get the precise world age (though I'm not familiar enough with the world-age mechanism to know whether the precise world age is required in this use case).
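
A minimal sketch of that idea, written without a generated function, might look as follows. It is not the PR's implementation: TypedCache, lookup, and double are hypothetical names, the walk here takes a single argument, and Core.Compiler.return_type is a compiler-internal function whose behaviour may change between Julia versions. The sketch also makes no attempt to address the world-age question.

```julia
# Hypothetical wrapper for illustration; not the cache type added by this PR.
struct TypedCache{W}
    walk::W                  # the un-cached walk (any callable)
    cache::IdDict{Any, Any}  # untyped storage keyed by object identity
end

# Look up `x` and assert that the stored value has the type that inference
# predicts for `walk(x)`. `Core.Compiler.return_type` is internal API.
function lookup(c::TypedCache, x)
    T = Core.Compiler.return_type(c.walk, Tuple{typeof(x)})
    return c.cache[x]::T
end

# A stand-in "walk" with an easily inferred return type.
double(v::Vector{Float64}) = 2 .* v

x = [1.0, 2.0]
c = TypedCache(double, IdDict{Any, Any}())
c.cache[x] = double(x)   # populate the cache as a cached walk would
y = lookup(c, x)
println(y)
```

The intent is that callers of lookup no longer see a bare Any coming out of the IdDict, at the cost of depending on what inference reports for the walk.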

> Pulling back, is there a use-case where we lack a function barrier between the call to gpu and the hot code path?

If you need to handle data movement during the forward/backward pass.
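
For readers unfamiliar with the term, a function barrier in the sense used above can be sketched as follows. This is a generic illustration with hypothetical names (fetch_weights, hot_sum, total), not code from Functors.jl or Flux: the untyped value is passed to a separate function, so the hot loop is compiled for the concrete runtime type even though the caller only sees Any.

```julia
# Hypothetical example of a function barrier; names are illustrative only.

# The lookup is inferred as Any because the dictionary's value type is Any.
fetch_weights(store::IdDict{Any, Any}) = store[:w]

# The hot loop. Julia compiles a specialization for the concrete type of `w`,
# so the loop body is type stable regardless of how `w` was obtained.
function hot_sum(w)
    s = zero(eltype(w))
    for v in w
        s += v
    end
    return s
end

# The barrier: one dynamic dispatch on the call to `hot_sum`,
# after which everything inside it runs on concrete types.
function total(store)
    w = fetch_weights(store)
    return hot_sum(w)
end

store = IdDict{Any, Any}(:w => Float32[1, 2, 3, 4])
println(total(store))
```

When data movement happens inside the forward or backward pass itself, there may be no such separate call between the transfer and the hot code, which is the situation the answer above refers to.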

CarloLucibello · 2024-11-04T08:20:01Z

Given the concerns expressed in LuxDL/Lux.jl#1017, I think we should do this.

chengchingwen · 2024-11-04T08:27:09Z

@CarloLucibello Since Julia v1.10 is the new LTS, do you think we could drop v1.6 support so that we can remove that @static if VERSION >= v"1.10.0-DEV.609" branch which makes the code look fragile?

CarloLucibello · 2024-11-04T09:15:36Z

Yes, we should do that.

chengchingwen added 3 commits May 9, 2024 09:18

improve type stability of cached walks

e672c6e

fix doctest format

8b91724

handle old julia version

992b228

chengchingwen requested a review from CarloLucibello May 9, 2024 02:36

CarloLucibello closed this Nov 4, 2024

CarloLucibello reopened this Nov 4, 2024

CarloLucibello approved these changes Nov 4, 2024

CarloLucibello merged commit 2945731 into FluxML:master Nov 4, 2024
11 of 12 checks passed

This was referenced Nov 4, 2024

prepare for v0.5 release #91

Merged

sending to devices tuples, named tuples and arrays does not keep track of identical objects LuxDL/Lux.jl#1017

Closed

mcabbott mentioned this pull request Nov 4, 2024

Version 0.5 gives errors on strings, on Julia 1.12 #92

Closed

chengchingwen deleted the stablecache branch November 5, 2024 05:02
