Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

copyto! does not vectorize for views with UnitRange indices #53430

Open
jishnub opened this issue Feb 22, 2024 · 5 comments · Fixed by #54332
Open

copyto! does not vectorize for views with UnitRange indices #53430

jishnub opened this issue Feb 22, 2024 · 5 comments · Fixed by #54332
Labels
arrays [a, r, r, a, y, s] performance Must go faster regression Regression in behavior compared to a previous version regression 1.11 Regression in the 1.11 release

Comments

@jishnub
Copy link
Contributor

jishnub commented Feb 22, 2024

julia> a = rand(100,100); b = similar(a); av = view(a, axes(a)...); bv = view(b, axes(b)...); bv2 = view(b, UnitRange.(axes(b))...);

julia> @btime copyto!($bv, $av); # fast, indices are Base.OneTos
  2.333 μs (0 allocations: 0 bytes)

julia> @btime copyto!($bv2, $av); # slow, indices are UnitRanges
  24.093 μs (0 allocations: 0 bytes)

This performance difference appears to arise from a lack of vectorization in indexing. In the first case, the output of

@code_llvm Base.copyto_unaliased!(IndexCartesian(), bv, IndexCartesian(), av);

contains

vector.body:                                      ; preds = %vector.body, %vector.ph
    %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
; └└
;  @ abstractarray.jl:1105 within `copyto_unaliased!`
; ┌ @ abstractarray.jl:1335 within `getindex`
; │┌ @ abstractarray.jl:1381 within `_getindex`
; ││┌ @ subarray.jl:317 within `getindex` @ array.jl:924 @ essentials.jl:892
     %26 = add i64 %index, %20
     %27 = getelementptr inbounds double, ptr %12, i64 %26
     %wide.load = load <4 x double>, ptr %27, align 8
     %28 = getelementptr inbounds double, ptr %27, i64 4
     %wide.load1001 = load <4 x double>, ptr %28, align 8
     %29 = getelementptr inbounds double, ptr %27, i64 8
     %wide.load1002 = load <4 x double>, ptr %29, align 8
     %30 = getelementptr inbounds double, ptr %27, i64 12
     %wide.load1003 = load <4 x double>, ptr %30, align 8
; └└└
; ┌ @ abstractarray.jl:1436 within `setindex!`
; │┌ @ abstractarray.jl:1466 within `_setindex!`
; ││┌ @ subarray.jl:369 within `setindex!` @ array.jl:988
     %31 = add i64 %index, %21
     %32 = getelementptr inbounds double, ptr %13, i64 %31
     store <4 x double> %wide.load, ptr %32, align 8
     %33 = getelementptr inbounds double, ptr %32, i64 4
     store <4 x double> %wide.load1001, ptr %33, align 8
     %34 = getelementptr inbounds double, ptr %32, i64 8
     store <4 x double> %wide.load1002, ptr %34, align 8
     %35 = getelementptr inbounds double, ptr %32, i64 12
     store <4 x double> %wide.load1003, ptr %35, align 8
     %index.next = add nuw i64 %index, 16
     %36 = icmp eq i64 %index.next, %n.vec
     br i1 %36, label %middle.block, label %vector.body

whereas

@code_llvm Base.copyto_unaliased!(IndexCartesian(), bv2, IndexCartesian(), av);

contains

;  @ abstractarray.jl:1105 within `copyto_unaliased!`
; ┌ @ abstractarray.jl:1335 within `getindex`
; │┌ @ abstractarray.jl:1381 within `_getindex`
; ││┌ @ subarray.jl:317 within `getindex` @ array.jl:924
; │││┌ @ abstractarray.jl:1370 within `_to_linear_index`
; ││││┌ @ abstractarray.jl:3080 within `_sub2ind` @ abstractarray.jl:3096
; │││││┌ @ abstractarray.jl:3112 within `_sub2ind_recurse` @ abstractarray.jl:3112
; ││││││┌ @ abstractarray.jl:3119 within `offsetin`
; │││││││┌ @ int.jl:86 within `-`
          %19 = add i64 %value_phi71, -1
; ││││││└└
; ││││││┌ @ int.jl:88 within `*`
         %20 = mul i64 %"src::SubArray.parent190.size209.sroa.0.0.copyload", %19
; ││││││└
; ││││││┌ @ int.jl:87 within `+`
         %21 = add i64 %value_phi70, -1
; │││└└└└
; │││ @ subarray.jl:317 within `getindex` @ array.jl:924 @ essentials.jl:892
     %22 = add i64 %21, %20
     %23 = getelementptr inbounds double, ptr %16, i64 %22
     %24 = load double, ptr %23, align 8
; └└└
; ┌ @ abstractarray.jl:1436 within `setindex!`
; │┌ @ abstractarray.jl:1466 within `_setindex!`
; ││┌ @ subarray.jl:369 within `setindex!`
; │││┌ @ subarray.jl:295 within `reindex` @ subarray.jl:295
; ││││┌ @ views.jl:150 within `maybeview`
; │││││┌ @ array.jl:3067 within `getindex`
; ││││││┌ @ range.jl:938 within `_getindex`
; │││││││┌ @ int.jl:87 within `+`
          %25 = add i64 %value_phi71, -2
; │││└└└└└
; │││ @ subarray.jl:369 within `setindex!` @ array.jl:988
; │││┌ @ abstractarray.jl:1370 within `_to_linear_index`
; ││││┌ @ abstractarray.jl:3080 within `_sub2ind` @ abstractarray.jl:3096
; │││││┌ @ abstractarray.jl:3112 within `_sub2ind_recurse` @ abstractarray.jl:3112
; ││││││┌ @ abstractarray.jl:3119 within `offsetin`
; │││││││┌ @ int.jl:86 within `-`
          %26 = add i64 %25, %"dest::SubArray.indices_ptr[2]_ptr.start_ptr.unbox"
; ││││││└└
; ││││││┌ @ int.jl:88 within `*`
         %27 = mul i64 %"dest::SubArray.parent.size298.sroa.0.0.copyload", %26
; │││└└└└
; │││ @ subarray.jl:369 within `setindex!`
; │││┌ @ subarray.jl:295 within `reindex`
; ││││┌ @ views.jl:150 within `maybeview`
; │││││┌ @ array.jl:3067 within `getindex`
; ││││││┌ @ range.jl:938 within `_getindex`
; │││││││┌ @ int.jl:87 within `+`
          %28 = add i64 %value_phi70, -2
; │││└└└└└
; │││ @ subarray.jl:369 within `setindex!` @ array.jl:988
; │││┌ @ abstractarray.jl:1370 within `_to_linear_index`
; ││││┌ @ abstractarray.jl:3080 within `_sub2ind` @ abstractarray.jl:3096
; │││││┌ @ abstractarray.jl:3112 within `_sub2ind_recurse` @ abstractarray.jl:3112
; ││││││┌ @ int.jl:87 within `+`
         %29 = add i64 %28, %"dest::SubArray.indices_ptr[1]_ptr.start_ptr.unbox"
; │││└└└└
     %30 = add i64 %29, %27
     %31 = getelementptr inbounds double, ptr %17, i64 %30
     store double %24, ptr %31, align 8
; └└└

In particular, if I add a @simd declaration, this appears to improve performance considerably:

julia> function mycopyto!(dest::AbstractArray, src::AbstractArray)
           iterdest = eachindex(dest)
           itersrc = eachindex(src)
           if iterdest == itersrc
               @inbounds @simd for I in iterdest
                   if isassigned(src, I)
                       dest[I] = src[I]
                   else
                       Base._unsetindex!(dest, I)
                   end
               end
           else
               @invoke copyto!(dest::AbstractArray, src::AbstractArray)
           end
           return dest
       end
mycopyto! (generic function with 1 method)

julia> @btime mycopyto!($bv2, $av);
  2.276 μs (0 allocations: 0 bytes)

Versioninfo:

julia> versioninfo()
Julia Version 1.12.0-DEV.50
Commit 8425b0ea03* (2024-02-21 20:33 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, tigerlake)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  LD_LIBRARY_PATH = :/usr/lib/x86_64-linux-gnu/gtk-3.0/modules
  JULIA_EDITOR = subl
@jishnub jishnub added performance Must go faster arrays [a, r, r, a, y, s] labels Feb 22, 2024
@jishnub
Copy link
Contributor Author

jishnub commented Feb 22, 2024

This also appears to be a regression from v1.10.1, where both the operations are equally performant:

julia> a = rand(100,100); b = similar(a); av = view(a, axes(a)...); bv = view(b, axes(b)...); bv2 = view(b, UnitRange.(axes(b))...);

julia> @btime copyto!($bv, $av);
  9.646 μs (0 allocations: 0 bytes)

julia> @btime copyto!($bv2, $av);
  9.560 μs (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 1.10.1
Commit 7790d6f0641 (2024-02-13 20:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  LD_LIBRARY_PATH = :/usr/lib/x86_64-linux-gnu/gtk-3.0/modules
  JULIA_EDITOR = subl

@jishnub jishnub added the regression Regression in behavior compared to a previous version label Feb 22, 2024
@jishnub
Copy link
Contributor Author

jishnub commented Feb 25, 2024

Bisected the regression to #51760

commit f0a28e9a45a34f1c524b3cf02cbbac1351f1da81 (HEAD)
Author: Jameson Nash <jameson@juliacomputing.com>
Date:   Tue Oct 24 16:07:08 2023 -0500

    add unsetindex support to more copyto methods (#51760)

On df39cee,

julia> a = rand(100,100); b = similar(a); av = view(a, axes(a)...); bv2 = view(b, UnitRange.(axes(b))...);

julia> @btime copyto!($bv2, $av);
  2.428 μs (0 allocations: 0 bytes)

whereas on f0a28e9:

julia> @btime copyto!($bv2, $av);
  20.690 μs (0 allocations: 0 bytes)

@jishnub jishnub changed the title setindex! does not vectorize for views with UnitRange indices copyto! does not vectorize for views with UnitRange indices Feb 26, 2024
@giordano giordano added this to the 1.11 milestone Mar 5, 2024
KristofferC added a commit that referenced this issue May 13, 2024
This reverts commit f0a28e9.

This introduced in general a try catch inside the inner loop for
`copyto!` and it also has performance regression in other cases
#53430.

Since this was added without any tests and "is not-quite-public API" it
seems easiest to just revert it.
This was added for Memory-to-Array and vice versa but dedicated methods
could be added for that if it is desirable

Fixes #53430,
#52070
@KristofferC
Copy link
Member

KristofferC commented May 14, 2024

Even with the revert of #51760 there still seem to be some slowdown:

1.10:

julia> @btime copyto!($bv, $av); # fast, indices are Base.OneTos
  3.695 μs (0 allocations: 0 bytes)

julia> @btime copyto!($bv2, $av); # slow, indices are UnitRanges
  3.522 μs (0 allocations: 0 bytes)

1.11:

julia> @btime copyto!($bv, $av); # fast, indices are Base.OneTos
  1.158 μs (0 allocations: 0 bytes)

julia> @btime copyto!($bv2, $av); # slow, indices are UnitRanges
  7.991 μs (0 allocations: 0 bytes)

But I am not sure it is bad enough to require a milestone...

@KristofferC KristofferC reopened this May 14, 2024
@KristofferC KristofferC removed this from the 1.11 milestone May 14, 2024
@jishnub
Copy link
Contributor Author

jishnub commented May 20, 2024

Oddly, for me, the performance regression persists on a recently nightly:

julia> @btime copyto!($bv, $av); # fast, indices are Base.OneTos
  2.375 μs (0 allocations: 0 bytes)

julia> @btime copyto!($bv2, $av); # slow, indices are UnitRanges
  23.745 μs (0 allocations: 0 bytes)

julia> VERSION
v"1.12.0-DEV.560"

I also see a similar issue on the backports-release-1.11 branch. I'm unsure why the issue still persists, but perhaps we should add the milestone back.

julia> @btime copyto!($bv, $av);
  2.597 μs (0 allocations: 0 bytes)

julia> @btime copyto!($bv2, $av);
  24.499 μs (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 1.11.0-beta2.2
Commit 862f863e0f* (2024-05-29 10:49 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, tigerlake)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  LD_LIBRARY_PATH = :/usr/lib/x86_64-linux-gnu/gtk-3.0/modules
  JULIA_EDITOR = subl

@KristofferC Did the performance improve for you after reverting the PR?

Edit: I guess I had missed the previous comment.

@jishnub jishnub added this to the 1.11 milestone May 30, 2024
@KristofferC KristofferC added the regression 1.11 Regression in the 1.11 release label Jun 3, 2024
@KristofferC KristofferC removed this from the 1.11 milestone Jun 3, 2024
@jishnub
Copy link
Contributor Author

jishnub commented Sep 29, 2024

Profiling just the relevant branch shows

julia> Profile.print()
Overhead ╎ [+additional indent] Count File:Line  Function
=========================================================
    ╎4762 @Base/client.jl:568  _start()
    ╎ 4762 @Base/client.jl:593  repl_main
    ╎  4762 @Base/client.jl:511  run_main_repl(interactive::Bool, quiet::Bool, banner::Symbol, history_file::Bool)
    ╎   4762 @Base/essentials.jl:1045  invokelatest
    ╎    4762 @Base/essentials.jl:1048  #invokelatest#14762 @Base/client.jl:490  run_std_repl(REPL::Module, quiet::Bool, banner::Symbol, history_file::Bool)
    ╎    ╎ 4762 @REPL/src/REPL.jl:606  run_repl(repl::REPL.AbstractREPL, consumer::Any)
    ╎    ╎  4762 @REPL/src/REPL.jl:620  run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool, backend::Any)
    ╎    ╎   4762 @REPL/src/REPL.jl:454  start_repl_backend
    ╎    ╎    4762 @REPL/src/REPL.jl:457  start_repl_backend(backend::REPL.REPLBackend, consumer::Any; get_module::Function)
    ╎    ╎     4762 @REPL/src/REPL.jl:472  repl_backend_loop(backend::REPL.REPLBackend, get_module::Function)
    ╎    ╎    ╎ 4762 @REPL/src/REPL.jl:360  eval_user_input(ast::Any, backend::REPL.REPLBackend, mod::Module)
    ╎    ╎    ╎  4762 @REPL/src/REPL.jl:335  toplevel_eval_with_hooks
    ╎    ╎    ╎   4762 @REPL/src/REPL.jl:342  toplevel_eval_with_hooks(mod::Module, ast::Any, toplevel_file::Any, toplevel_line::Any)
    ╎    ╎    ╎    4762 @REPL/src/REPL.jl:342  toplevel_eval_with_hooks(mod::Module, ast::Any, toplevel_file::Any, toplevel_line::Any)
    ╎    ╎    ╎     4762 @REPL/src/REPL.jl:342  toplevel_eval_with_hooks(mod::Module, ast::Any, toplevel_file::Any, toplevel_line::Any)
    ╎    ╎    ╎    ╎ 4762 @REPL/src/REPL.jl:338  toplevel_eval_with_hooks(mod::Module, ast::Any, toplevel_file::Any, toplevel_line::Any)
    ╎    ╎    ╎    ╎  4762 @REPL/src/REPL.jl:331  __repl_entry_eval_expanded_with_loc(mod::Module, ast::Any, toplevel_file::Ref{Ptr{UInt8}}, toplevel_line::Ref{Int32})
    ╎    ╎    ╎    ╎   4762 @BenchmarkTools/src/execution.jl:136  kwcall(::@NamedTuple{warmup::Bool}, ::typeof(run), b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters)
    ╎    ╎    ╎    ╎    4762 @BenchmarkTools/src/execution.jl:144  run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; progressid::Nothing, nleaves::Float64, ndone::Float64, kwar
    ╎    ╎    ╎    ╎     4762 @BenchmarkTools/src/execution.jl:47  run_result
    ╎    ╎    ╎    ╎    ╎ 4762 @BenchmarkTools/src/execution.jl:48  #run_result#31
    ╎    ╎    ╎    ╎    ╎  4762 @Base/essentials.jl:1045  invokelatest
    ╎    ╎    ╎    ╎    ╎   4762 @Base/essentials.jl:1050  #invokelatest#1
    ╎    ╎    ╎    ╎    ╎    4762 @BenchmarkTools/src/execution.jl:109  kwcall(::@NamedTuple{warmup::Bool}, ::typeof(BenchmarkTools._run), b::BenchmarkTools.Benchmark, p::BenchmarkTools.Par
    ╎    ╎    ╎    ╎    ╎     26   @BenchmarkTools/src/execution.jl:119  _run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; verbose::Bool, pad::String, warmup::Bool, kwargs::Ba
    ╎    ╎    ╎    ╎    ╎    ╎ 26   @BenchmarkTools/src/execution.jl:570  var"##sample#280"(::Tuple{SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, 
    ╎    ╎    ╎    ╎    ╎    ╎  26   @BenchmarkTools/src/execution.jl:561  var"##core#279"(bv2#277::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, …
   2╎    ╎    ╎    ╎    ╎    ╎   2    REPL[1]:0  copyto_current!(dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, src::SubArray{Float64, 2, Mat
    ╎    ╎    ╎    ╎    ╎    ╎   15   REPL[1]:7  copyto_current!(dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, src::SubArray{Float64, 2, Mat
    ╎    ╎    ╎    ╎    ╎    ╎    5    @Base/abstractarray.jl:1341  getindex
    ╎    ╎    ╎    ╎    ╎    ╎     5    @Base/abstractarray.jl:1387  _getindex
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 5    @Base/subarray.jl:316  getindex
    ╎    ╎    ╎    ╎    ╎    ╎    ╎  5    @Base/array.jl:928  getindex
    ╎    ╎    ╎    ╎    ╎    ╎    ╎   4    @Base/abstractarray.jl:1376  _to_linear_index
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    4    @Base/abstractarray.jl:3077  _sub2ind
    ╎    ╎    ╎    ╎    ╎    ╎    ╎     4    @Base/abstractarray.jl:3093  _sub2ind
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 4    @Base/abstractarray.jl:3109  _sub2ind_recurse
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  4    @Base/abstractarray.jl:3109  _sub2ind_recurse
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   2    @Base/abstractarray.jl:3116  offsetin
   2╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    2    @Base/int.jl:86  -
   2╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   2    @Base/int.jl:87  +
   1╎    ╎    ╎    ╎    ╎    ╎    ╎   1    @Base/essentials.jl:910  getindex
    ╎    ╎    ╎    ╎    ╎    ╎    10   @Base/abstractarray.jl:1442  setindex!
    ╎    ╎    ╎    ╎    ╎    ╎     10   @Base/abstractarray.jl:1472  _setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 10   @Base/subarray.jl:386  setindex!
   4╎    ╎    ╎    ╎    ╎    ╎    ╎  7    @Base/array.jl:992  setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎   3    @Base/abstractarray.jl:1376  _to_linear_index
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    3    @Base/abstractarray.jl:3077  _sub2ind
    ╎    ╎    ╎    ╎    ╎    ╎    ╎     3    @Base/abstractarray.jl:3093  _sub2ind
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 3    @Base/abstractarray.jl:3109  _sub2ind_recurse
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  3    @Base/abstractarray.jl:3109  _sub2ind_recurse
   1╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   2    @Base/int.jl:88  *
   1╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   1    @Base/int.jl:87  +
    ╎    ╎    ╎    ╎    ╎    ╎    ╎  3    @Base/subarray.jl:294  reindex
    ╎    ╎    ╎    ╎    ╎    ╎    ╎   3    @Base/views.jl:150  maybeview
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    3    @Base/array.jl:3082  getindex
    ╎    ╎    ╎    ╎    ╎    ╎    ╎     3    @Base/range.jl:945  _getindex
   2╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 2    @Base/int.jl:87  +
   1╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1    @Base/int.jl:86  -
   3╎    ╎    ╎    ╎    ╎    ╎   9    REPL[1]:8  copyto_current!(dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, src::SubArray{Float64, 2, Mat
    ╎    ╎    ╎    ╎    ╎    ╎    6    @Base/multidimensional.jl:423  iterate
   4╎    ╎    ╎    ╎    ╎    ╎     6    @Base/multidimensional.jl:448  __inc
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 2    @Base/operators.jl:321  !=
   2╎    ╎    ╎    ╎    ╎    ╎    ╎  2    @Base/promotion.jl:639  ==
   2╎    ╎    ╎    ╎    ╎     4736 @BenchmarkTools/src/execution.jl:125  _run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; verbose::Bool, pad::String, warmup::Bool, kwargs::Ba
    ╎    ╎    ╎    ╎    ╎    ╎ 4734 @BenchmarkTools/src/execution.jl:570  var"##sample#280"(::Tuple{SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, 
    ╎    ╎    ╎    ╎    ╎    ╎  4734 @BenchmarkTools/src/execution.jl:561  var"##core#279"(bv2#277::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, …
  96╎    ╎    ╎    ╎    ╎    ╎   96   REPL[1]:0  copyto_current!(dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, src::SubArray{Float64, 2, Mat
    ╎    ╎    ╎    ╎    ╎    ╎   2730 REPL[1]:7  copyto_current!(dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, src::SubArray{Float64, 2, Mat
    ╎    ╎    ╎    ╎    ╎    ╎    622  @Base/abstractarray.jl:1341  getindex
    ╎    ╎    ╎    ╎    ╎    ╎     622  @Base/abstractarray.jl:1387  _getindex
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 622  @Base/subarray.jl:316  getindex
    ╎    ╎    ╎    ╎    ╎    ╎    ╎  622  @Base/array.jl:928  getindex
    ╎    ╎    ╎    ╎    ╎    ╎    ╎   418  @Base/abstractarray.jl:1376  _to_linear_index
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    418  @Base/abstractarray.jl:3077  _sub2ind
    ╎    ╎    ╎    ╎    ╎    ╎    ╎     418  @Base/abstractarray.jl:3093  _sub2ind
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 418  @Base/abstractarray.jl:3109  _sub2ind_recurse
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  418  @Base/abstractarray.jl:3109  _sub2ind_recurse
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   155  @Base/abstractarray.jl:3116  offsetin
 155╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    155  @Base/int.jl:86  -
 110╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   110  @Base/int.jl:88  *
 153╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   153  @Base/int.jl:87  +
 204╎    ╎    ╎    ╎    ╎    ╎    ╎   204  @Base/essentials.jl:910  getindex
    ╎    ╎    ╎    ╎    ╎    ╎    2108 @Base/abstractarray.jl:1442  setindex!
    ╎    ╎    ╎    ╎    ╎    ╎     2108 @Base/abstractarray.jl:1472  _setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 2108 @Base/subarray.jl:386  setindex!
 914╎    ╎    ╎    ╎    ╎    ╎    ╎  1859 @Base/array.jl:992  setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎   945  @Base/abstractarray.jl:1376  _to_linear_index
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    945  @Base/abstractarray.jl:3077  _sub2ind
    ╎    ╎    ╎    ╎    ╎    ╎    ╎     945  @Base/abstractarray.jl:3093  _sub2ind
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 945  @Base/abstractarray.jl:3109  _sub2ind_recurse
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  945  @Base/abstractarray.jl:3109  _sub2ind_recurse
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   469  @Base/abstractarray.jl:3116  offsetin
 469╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    469  @Base/int.jl:86  -
 204╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   204  @Base/int.jl:88  *
 272╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   272  @Base/int.jl:87  +
    ╎    ╎    ╎    ╎    ╎    ╎    ╎  249  @Base/subarray.jl:294  reindex
    ╎    ╎    ╎    ╎    ╎    ╎    ╎   249  @Base/views.jl:150  maybeview
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    249  @Base/array.jl:3082  getindex
    ╎    ╎    ╎    ╎    ╎    ╎    ╎     249  @Base/range.jl:945  _getindex
 150╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 150  @Base/int.jl:87  +
  99╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 99   @Base/int.jl:86  -
 290╎    ╎    ╎    ╎    ╎    ╎   1908 REPL[1]:8  copyto_current!(dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, src::SubArray{Float64, 2, Mat
    ╎    ╎    ╎    ╎    ╎    ╎    1618 @Base/multidimensional.jl:423  iterate
    ╎    ╎    ╎    ╎    ╎    ╎     95   @Base/multidimensional.jl:447  __inc
  95╎    ╎    ╎    ╎    ╎    ╎    ╎ 95   @Base/int.jl:87  +
 499╎    ╎    ╎    ╎    ╎    ╎     1523 @Base/multidimensional.jl:448  __inc
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1024 @Base/operators.jl:321  !=
1024╎    ╎    ╎    ╎    ╎    ╎    ╎  1024 @Base/promotion.jl:639  ==
Total snapshots: 4762. Utilization: 100% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.

The == comparison between Ints (@Base/promotion.jl:639) takes a long time, and this particular comparison doesn't show up in the profile in v1.10. This seems similar to the profile seen in #53158, and I wonder if this may be traced back to 9aa7980 as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] performance Must go faster regression Regression in behavior compared to a previous version regression 1.11 Regression in the 1.11 release
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants