Previously we were using a CuArray type that could represent a view, reshape, reinterpret, etc. For the sake of simplicity, I switched to a simpler CuArray type while reusing Base.SubArray, Base.ReshapedArray, etc. That requires the use of type unions to, e.g., represent all dense or strided CuArrays:

CUDA.jl/src/array.jl, lines 146 to 164 in 75f7d30
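For reference, the pattern looks roughly like the sketch below. It is modeled on how Base defines StridedArray rather than on the literal contents of src/array.jl lines 146 to 164 (the real definitions are more complete, e.g. also handling ReinterpretArray), and it is wrapped in a throwaway module so it cannot clash with the aliases CUDA.jl itself defines:

```julia
# Sketch only: a Base.StridedArray-style union for CuArray. CUDA.jl defines the
# real aliases in src/array.jl; this module just illustrates the pattern.
module StridedCuSketch

using CUDA  # provides CuArray

# views indexed only by integers and ranges preserve a strided memory layout
const StridedSubCuArray{T,N} =
    SubArray{T,N,<:CuArray,<:Tuple{Vararg{Union{Int,AbstractRange{Int}}}}}

# reshapes of a plain CuArray also remain strided
const StridedReshapedCuArray{T,N} = Base.ReshapedArray{T,N,<:CuArray}

# "any strided CuArray": the plain array or one of the strided wrappers
const StridedCuArray{T,N} = Union{CuArray{T,N},
                                  StridedSubCuArray{T,N},
                                  StridedReshapedCuArray{T,N}}
const StridedCuVector{T} = StridedCuArray{T,1}
const StridedCuMatrix{T} = StridedCuArray{T,2}

end # module
```

The appeal is that a single StridedCuMatrix annotation covers plain arrays, range-indexed views, and reshapes, which is exactly the set of inputs a stride-based API can consume.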
These definitions are almost identical to how Base defines StridedArray. However, using them significantly regresses load time. For example, #450 adds them to a bunch of LinearAlgebra.mul! methods, which badly affects the time of using CUDA: +25%, https://speed.juliagpu.org/timeline/#/?exe=4&ben=latency/import&env=1&revs=50&base=3+96&equid=off&quarts=on&extr=on

In a similar vein, Adapt.jl defines a union that captures all array instances that can be used on the GPU (i.e. not necessarily dense or strided, but e.g. an Adjoint or PermutedDimsArray): https://github.com/JuliaGPU/Adapt.jl/blob/11d96a531cb70359e88ed2ad0d0a13a85727a204/src/wrappers.jl#L73-L92
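Abridged, that union looks something like the following sketch. The real WrappedArray at the linked commit covers many more LinearAlgebra wrapper types and distinguishes the storage type from the destination array type; the single Src parameter and the member selection here are simplifications for illustration.

```julia
# Abridged sketch of an Adapt.jl-style wrapper union; `Src` stands for the
# storage type the wrappers sit on top of (e.g. CuArray).
using LinearAlgebra

const WrappedArraySketch{T,N,Src} = Union{
    SubArray{T,N,<:Src},
    Base.ReshapedArray{T,N,<:Src},
    Base.ReinterpretArray{T,N,<:Any,<:Src},
    PermutedDimsArray{T,N,<:Any,<:Any,<:Src},
    LinearAlgebra.Adjoint{T,<:Src},
    LinearAlgebra.Transpose{T,<:Src},
    LinearAlgebra.UpperTriangular{T,<:Src},
    # ...the real union continues with the other LinearAlgebra wrappers
    LinearAlgebra.Diagonal{T,<:Src}
}

# CUDA.jl's AnyCuArray is then, roughly, "a CuArray or any wrapper around one":
#   const AnyCuArray{T,N} = Union{CuArray{T,N}, Adapt.WrappedArray{T,N,...}}
```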
Using these unions makes load time go crazy, e.g. with mul!(::CuArray, ::AnyCuArray...) (where AnyCuArray uses the Adapt.WrappedArray union) it goes from 5 to 25s.
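For what it's worth, that number is easy to reproduce locally by timing the import in a fresh process; this is just a quick sanity check, not the harness behind the speed.juliagpu.org benchmark (it assumes CUDA.jl is installed in the active project):

```julia
# Time `using CUDA` in a pristine child process so that packages already loaded
# in the current session don't skew the measurement.
run(`$(Base.julia_cmd()) --startup-file=no --project -e "@time using CUDA"`)
```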
I can understand how the large union from Adapt.jl is needlessly taxing on inference, and I guess we may need something like an AbstractWrappedArray here (JuliaLang/julia#31563). However, with StridedCuArray I had not expected these regressions, as Base uses similar patterns. Am I doing anything especially bad here? I'd like to start using StridedCuArray much more, in order to cover APIs that take stride inputs (of which there are quite a few).
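For context, the kind of mechanism JuliaLang/julia#31563 asks for could look roughly like the sketch below. Nothing like this exists in Base today; the type name, parameters, and the commented-out signature are purely illustrative of the idea of replacing the big union with a single abstract supertype.

```julia
# Purely hypothetical: an abstract supertype that wrapper types would share,
# parameterized on the wrapped storage, so that dispatch can say "anything
# ultimately backed by a CuArray" without enumerating every wrapper.
abstract type AbstractWrappedArray{T,N,Storage} <: AbstractArray{T,N} end

# A mul! method could then be written against a single subtype constraint
# instead of a union with dozens of members, e.g. (illustrative signature only):
#
#   LinearAlgebra.mul!(C::CuMatrix, A::AbstractWrappedArray{<:Any,2,<:CuArray},
#                      B::AbstractWrappedArray{<:Any,2,<:CuArray})
```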