Performance regressions in broadcasting of "array wrapper" on 0.7 #26928

Open
marius311 opened this issue Apr 28, 2018 · 1 comment
Labels
broadcast (Applying a function over a collection), performance (Must go faster)

Comments

@marius311
Contributor

I have a very basic "array wrapper" type (reduced from something more realistic),

struct ArrayWrapper{T,N} <: AbstractArray{T,N}
    dat::Array{T,N}
end
Base.size(A::ArrayWrapper) = size(A.dat)
Base.getindex(A::ArrayWrapper, ix...) = getindex(A.dat,ix...)

On 0.6.2, broadcasting over it is exactly as fast as broadcasting over a plain Array,

using BenchmarkTools
f = ArrayWrapper(rand(512,512))
g = rand(512,512)
julia> @benchmark $f + $f
BenchmarkTools.Trial: 
  memory estimate:  2.00 MiB
  allocs estimate:  2
  --------------
  minimum time:     241.272 μs (0.00% GC)
  median time:      269.241 μs (0.00% GC)
  mean time:        302.636 μs (5.95% GC)
  maximum time:     2.782 ms (35.97% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark $g + $g
BenchmarkTools.Trial: 
  memory estimate:  2.00 MiB
  allocs estimate:  2
  --------------
  minimum time:     243.411 μs (0.00% GC)
  median time:      267.683 μs (0.00% GC)
  mean time:        300.503 μs (6.51% GC)
  maximum time:     2.196 ms (30.22% GC)
  --------------
  samples:          10000
  evals/sample:     1

whereas with current master (509d6a1) I find,

julia> @benchmark $f + $f
BenchmarkTools.Trial: 
  memory estimate:  2.00 MiB
  allocs estimate:  2
  --------------
  minimum time:     383.274 μs (0.00% GC)
  median time:      427.439 μs (0.00% GC)
  mean time:        466.831 μs (4.51% GC)
  maximum time:     35.951 ms (98.51% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark $g + $g
BenchmarkTools.Trial: 
  memory estimate:  2.00 MiB
  allocs estimate:  2
  --------------
  minimum time:     256.351 μs (0.00% GC)
  median time:      285.444 μs (0.00% GC)
  mean time:        398.107 μs (9.17% GC)
  maximum time:     37.207 ms (97.67% GC)
  --------------
  samples:          10000
  evals/sample:     1

I know #26891 was just merged and mentions some known performance regressions, although I'm not sure whether this is one of them. I see some or most of this slowdown even before that commit.

In case it helps:

julia> versioninfo()
Julia Version 0.7.0-DEV.4959
Commit 509d6a1a88* (2018-04-27 21:12 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-5.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 1
@mbauman
Member

mbauman commented Apr 29, 2018

My guess is that LLVM was previously able to hoist these bounds checks out of the loop, and that's no longer the case. The quick fix to restore the performance is to write your own @boundscheck blocks or to use Base.@propagate_inbounds (the former is slightly better practice, since the bounds error messages will be more relevant to the user).

julia> @benchmark $f + $f
BenchmarkTools.Trial:
  memory estimate:  2.00 MiB
  allocs estimate:  2
  --------------
  minimum time:     454.936 μs (0.00% GC)
  median time:      461.461 μs (0.00% GC)
  mean time:        578.215 μs (6.28% GC)
  maximum time:     44.109 ms (96.60% GC)
  --------------
  samples:          8623
  evals/sample:     1

julia> @benchmark $g + $g
BenchmarkTools.Trial:
  memory estimate:  2.00 MiB
  allocs estimate:  3
  --------------
  minimum time:     253.383 μs (0.00% GC)
  median time:      276.536 μs (0.00% GC)
  mean time:        489.600 μs (11.22% GC)
  maximum time:     43.423 ms (97.26% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> Base.@propagate_inbounds Base.getindex(A::ArrayWrapper, ix...) = getindex(A.dat,ix...)

julia> @benchmark $f + $f
BenchmarkTools.Trial:
  memory estimate:  2.00 MiB
  allocs estimate:  2
  --------------
  minimum time:     250.917 μs (0.00% GC)
  median time:      278.478 μs (0.00% GC)
  mean time:        492.156 μs (11.41% GC)
  maximum time:     43.017 ms (97.31% GC)
  --------------
  samples:          10000
  evals/sample:     1
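
For reference, the @boundscheck alternative mentioned above could look roughly like the sketch below. It was not benchmarked in this thread, so whether it recovers the same timings as Base.@propagate_inbounds is an assumption. The Vararg{Int,N} signature is also a choice made here for illustration (the original definition used ix...); it relies on Base's fallback to convert linear indices to Cartesian ones.

```julia
struct ArrayWrapper{T,N} <: AbstractArray{T,N}
    dat::Array{T,N}
end
Base.size(A::ArrayWrapper) = size(A.dat)

# Check bounds against the wrapper itself, so that a BoundsError reports the
# ArrayWrapper rather than the inner Array, then index the data unchecked.
# @inline is needed so that a caller's @inbounds can elide the @boundscheck block.
@inline function Base.getindex(A::ArrayWrapper{T,N}, ix::Vararg{Int,N}) where {T,N}
    @boundscheck checkbounds(A, ix...)
    @inbounds return A.dat[ix...]
end

# Usage: scalar indexing and broadcasting work as before.
f = ArrayWrapper(rand(4, 4))
f[2, 3]
f .+ f
```

With this definition an out-of-bounds access such as f[5, 1] throws a BoundsError that refers to the wrapper, which is the advantage described above.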

mbauman added the broadcast and performance labels Apr 29, 2018