
`..` becomes slow and allocates for 4 or more dimensions #20

Closed · ranocha opened this issue Sep 16, 2020 · 4 comments · Fixed by #23

@ranocha (Member) commented Sep 16, 2020

julia> using BenchmarkTools, EllipsisNotation

julia> u = rand(1, 2, 3);

julia> @benchmark view($u, $1, $..) # that's okay
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.889 ns (0.00% GC)
  median time:      1.907 ns (0.00% GC)
  mean time:        1.929 ns (0.00% GC)
  maximum time:     162.367 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> u = rand(1, 2, 3, 4);

julia> @benchmark view($u, $1, $..) # that's relatively slow and allocates...
BenchmarkTools.Trial: 
  memory estimate:  832 bytes
  allocs estimate:  36
  --------------
  minimum time:     1.623 μs (0.00% GC)
  median time:      1.721 μs (0.00% GC)
  mean time:        1.756 μs (1.44% GC)
  maximum time:     257.162 μs (98.41% GC)
  --------------
  samples:          10000
  evals/sample:     10

julia> @benchmark view($u, $1, $:, $:, $:) # that's fast again
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.305 ns (0.00% GC)
  median time:      2.321 ns (0.00% GC)
  mean time:        2.329 ns (0.00% GC)
  maximum time:     17.121 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

A lot of time seems to be spent in

julia> @benchmark to_indices($u, $(1, ..))
BenchmarkTools.Trial: 
  memory estimate:  192 bytes
  allocs estimate:  8
  --------------
  minimum time:     290.716 ns (0.00% GC)
  median time:      294.605 ns (0.00% GC)
  mean time:        306.962 ns (2.65% GC)
  maximum time:     8.650 μs (95.44% GC)
  --------------
  samples:          10000
  evals/sample:     271

Is there some inlining/tuple magic to make `..` fast for arrays with four or more dimensions?

@ChrisRackauckas (Member) commented:

Odd. Yeah, maybe something needs to make use of `ntuple` or something? @mbauman knows this array stuff well so I always ping him here.
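To illustrate the `ntuple` idea: passing the count as a `Val` gives the compiler a compile-time tuple length, so the tuple of colons can be built without any runtime splatting or allocation. This is only a sketch with a made-up helper name (`ncolons` is not part of the package):

```julia
# Hypothetical helper (not EllipsisNotation.jl source): build an NTuple of
# colons whose length is known at compile time. With Val(N), the ntuple
# call is unrolled by the compiler, so no allocation or dynamic splat occurs.
ncolons(::Val{N}) where {N} = ntuple(_ -> Colon(), Val(N))
```

For example, `ncolons(Val(4))` yields `(:, :, :, :)` as a fully inferred `NTuple{4, Colon}`.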

@sloede commented Sep 20, 2020

Just to clarify: the slowness does not depend on the number of array dimensions but on the number of ellipsis dimensions, i.e., it becomes slow if more than 3 dimensions are "slurped" by the `..` operator:

julia> u = rand(1, 2, 3, 4, 5, 6, 7);

julia> @benchmark view($u, $1, $:, $.., $:, $:) # this is fast, as `..` only slurps 3 dimensions (3-5)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     4.020 ns (0.00% GC)
  median time:      4.866 ns (0.00% GC)
  mean time:        4.997 ns (0.00% GC)
  maximum time:     37.419 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark view($u, $1, $:, $.., $:) # that's again slow with `..` replacing 4 dimensions (3-6)
BenchmarkTools.Trial: 
  memory estimate:  1.53 KiB
  allocs estimate:  66
  --------------
  minimum time:     3.955 μs (0.00% GC)
  median time:      5.232 μs (0.00% GC)
  mean time:        5.570 μs (2.66% GC)
  maximum time:     827.886 μs (98.18% GC)
  --------------
  samples:          10000
  evals/sample:     7
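The slurped-dimension count in the examples above follows directly from the shapes involved; a toy illustration (the `nslurped` function is invented here, not package API):

```julia
# Toy illustration (hypothetical function, not package API): `..` slurps
# whatever dimensions the explicitly given indices don't cover.
nslurped(ndims_u::Integer, nexplicit::Integer) = ndims_u - nexplicit

# For a 7-d array indexed as (1, :, .., :, :), there are 4 explicit
# indices, so `..` slurps 3 dimensions (the fast case); with (1, :, .., :)
# there are only 3, so `..` slurps 4 (the slow case).
```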

@ChrisRackauckas (Member) commented:

Yes, it's hit whenever the splat case is hit:

https://github.com/ChrisRackauckas/EllipsisNotation.jl/blob/master/src/EllipsisNotation.jl#L14

I wonder if it needs to be made fully recursive in order to make the compiler fully specialize it, or whether it needs an @generated function.
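A fully recursive formulation might look like this sketch (assumed names, not the actual package code): each step peels off one dimension via a `Val`, so every call has a concrete type the compiler can specialize on and inline, leaving no runtime splat:

```julia
# Recursive sketch (hypothetical, not EllipsisNotation.jl source):
# build N colons by recursing on Val(N). Each recursion step is its own
# fully specialized method, so the whole chain can constant-fold to a
# tuple literal instead of hitting a dynamic splat.
_colons(::Val{0}) = ()
_colons(::Val{N}) where {N} = (Colon(), _colons(Val(N - 1))...)
```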

@sloede commented Sep 20, 2020

Without really knowing what's going on, I think the line you quoted is not the (only) culprit: I can construct a case where that `fillcolons` method is never hit, yet the allocations already occur even though only 3 dimensions are slurped by `..`:

julia> u = rand(1, 2, 3, 4);

julia> @benchmark view($u, $1, $:, ..)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     3.461 ns (0.00% GC)
  median time:      3.982 ns (0.00% GC)
  mean time:        4.327 ns (0.00% GC)
  maximum time:     37.940 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark view($u, $1, ..)
BenchmarkTools.Trial: 
  memory estimate:  800 bytes
  allocs estimate:  35
  --------------
  minimum time:     2.067 μs (0.00% GC)
  median time:      2.307 μs (0.00% GC)
  mean time:        2.462 μs (0.85% GC)
  maximum time:     212.839 μs (98.22% GC)
  --------------
  samples:          10000
  evals/sample:     9

By my count, both versions only hit the `fillcolons` methods on lines 10 and 15 of the source (twice in the first case, three times in the second), and finally line 12. So the method you reference (line 14) does not seem to play a role.
https://github.com/ChrisRackauckas/EllipsisNotation.jl/blob/43f09f9467006f21a462426bd88df707c91d0bbd/src/EllipsisNotation.jl#L10-L15
But as stated above, I am really out of my depth here.

ranocha added a commit to ranocha/EllipsisNotation.jl that referenced this issue Oct 8, 2020