performance of getindex #54

Closed · bkamins opened this issue Feb 3, 2021 · 10 comments · Fixed by #56

@bkamins (Member) commented Feb 3, 2021

A bottleneck in the fast join in DataFrames.jl is:

PooledArray(RefArray(getindex(A.refs, I)), copy(A.invpool))

as copy(A.invpool) is very expensive for large pools (it also usually leaves the pool much larger than needed). I see two options:

  1. do not copy the pool (and share it, as in PooledArray we do not allow removing levels from the pool)
  2. if the new array is short relative to the old one, allocate a fresh pool

This case will hit us in the H2O benchmarks, which heavily use PooledArrays (and probably already hits us significantly, though we did not know about it).

CC @nalimilan @quinnj

@bkamins (Member, Author) commented Feb 4, 2021

So my proposal is to define getindex in PooledArrays.jl for the vector case as:

Base.@propagate_inbounds function Base.getindex(A::PooledArray, I::Union{Real,AbstractVector}...)
    PooledArray(RefArray(getindex(A.refs, I...)), A.invpool, A.pool)
end

without copying invpool and pool.

The reason is that we currently do not provide a way to shrink the pool in a PooledArray, so sharing it is not a problem.

Potential problems, since we are past the 1.0 release:

  • sharing pool and invpool is not thread-safe (but we were not thread-safe anyway)
  • someone might rely on the fact that they are not shared across arrays

Additionally, we could add a shrink_pool method (or similarly named) that would recode pool, invpool, and refs to make the representation compact (this is useful in its own right); see the sketch below.

Note that currently we just copy invpool, so the vector returned by getindex carries the full invpool from the source even if most of it is unused.
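
A minimal sketch of what such a shrink_pool could look like, assuming the three-argument PooledArray(RefArray(...), invpool, pool) constructor from the proposal above; the name and exact semantics are illustrative, not existing API:

using PooledArrays
using PooledArrays: RefArray

# Hypothetical helper: rebuild pool, invpool, and refs so that only the
# levels actually referenced by A remain.
function shrink_pool(A::PooledArray)
    R = eltype(A.refs)
    used = sort!(unique(A.refs))          # old ref codes still in use
    remap = Dict{R,R}(old => R(new) for (new, old) in enumerate(used))
    newpool = A.pool[used]                # keep only the used levels
    newinvpool = Dict(v => R(i) for (i, v) in enumerate(newpool))
    newrefs = R[remap[r] for r in A.refs]
    return PooledArray(RefArray(newrefs), newinvpool, newpool)
end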

@nalimilan (Member)

So after discussing this on Slack, it seems that making getindex share the pool with the original array would be a good idea, except for thread safety. Currently we're not thread-safe, but at least you know that, as long as you're not mutating a PooledArray from other threads, you can safely do anything with it from a given thread. If we changed getindex to share pools across arrays, mutating an array in one thread could give wrong results when another thread writes to, or even reads from, another array.

I would say that it's too dangerous. Unfortunately, it's not yet clear whether it's possible to ensure thread safety without killing the performance of getindex. We could copy the pool when nthreads() > 1. That shouldn't affect user-visible behavior, but it could trigger unexpected bugs...
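
For illustration, a hypothetical sketch of that nthreads()-dependent idea (the function name and structure are assumptions, not a concrete proposal):

using PooledArrays
using PooledArrays: RefArray
using Base.Threads: nthreads

# Illustrative only: share pools in single-threaded sessions and copy them
# defensively when more than one thread could touch the arrays.
function getindex_threaded_sketch(A::PooledArray, I::AbstractVector)
    newrefs = A.refs[I]
    if nthreads() > 1
        return PooledArray(RefArray(newrefs), copy(A.invpool), copy(A.pool))
    else
        return PooledArray(RefArray(newrefs), A.invpool, A.pool)  # shared pools
    end
end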

@bkamins (Member, Author) commented Feb 13, 2021

Assume x is a PooledArray. I have investigated this further and it seems that:

y = similar(x)
copyto!(y, x)

is as fast as x[:] (and it will of course be faster if we take a smaller slice), so maybe we could generalize this? Any opinions?

Also note that:

julia> x = string.(1:10^7);

julia> @time pa = PooledArray(x);
  6.741751 seconds (982.11 k allocations: 630.399 MiB, 5.68% gc time)

julia> @time pa = PooledArray(x);
  6.771693 seconds (100 allocations: 580.985 MiB, 12.36% gc time)

julia> @time pa[:];
  6.193193 seconds (499.18 k allocations: 606.728 MiB)

julia> @time pa[:];
  5.948993 seconds (100 allocations: 580.985 MiB)

julia> @time ca = categorical(x);
 37.319457 seconds (21.70 M allocations: 3.182 GiB, 16.48% gc time)

julia> @time ca = categorical(x);
 41.931994 seconds (20.00 M allocations: 3.099 GiB, 27.95% gc time)

julia> @time ca[:];
  2.378682 seconds (10.13 M allocations: 710.606 MiB, 84.73% gc time)

julia> @time ca[:];
  2.630116 seconds (10.00 M allocations: 703.911 MiB, 89.17% gc time)

so there is a trade-off between construction and slicing. @nalimilan - do you know the reasons?

@nalimilan (Member)

> Assume x is a PooledArray. I have investigated this further and it seems that:
>
> y = similar(x)
> copyto!(y, x)
>
> is as fast as x[:] (and it will of course be faster if we take a smaller slice), so maybe we could generalize this? Any opinions?

Sorry, I don't get this.

> so there is a trade-off between construction and slicing. @nalimilan - do you know the reasons?

I don't think there's a trade-off, really. Maybe it's just a missing optimization in PooledArrays? For example, it doesn't define IndexStyle to be IndexLinear.

@bkamins (Member, Author) commented Feb 14, 2021

> Sorry, I don't get this.

I mean this:

julia> using BenchmarkTools

julia> using PooledArrays

julia> x = PooledArray(string.(1:10^6));

julia> @benchmark $x[:]
BenchmarkTools.Trial: 
  memory estimate:  62.65 MiB
  allocs estimate:  78
  --------------
  minimum time:     402.977 ms (0.00% GC)
  median time:      457.770 ms (7.83% GC)
  mean time:        462.088 ms (10.46% GC)
  maximum time:     570.657 ms (27.85% GC)
  --------------
  samples:          11
  evals/sample:     1

julia> @benchmark copy!(similar($x), $x)
BenchmarkTools.Trial: 
  memory estimate:  62.65 MiB
  allocs estimate:  77
  --------------
  minimum time:     408.912 ms (0.00% GC)
  median time:      451.813 ms (8.15% GC)
  mean time:        462.571 ms (10.51% GC)
  maximum time:     574.612 ms (27.89% GC)
  --------------
  samples:          11
  evals/sample:     1

julia> @benchmark $x[1:1]
BenchmarkTools.Trial: 
  memory estimate:  33.63 MiB
  allocs estimate:  12
  --------------
  minimum time:     15.697 ms (0.00% GC)
  median time:      17.978 ms (0.00% GC)
  mean time:        31.866 ms (37.89% GC)
  maximum time:     163.571 ms (87.31% GC)
  --------------
  samples:          157
  evals/sample:     1

julia> @benchmark copyto!(similar($x, 1), 1, $x, 1, 1)
BenchmarkTools.Trial: 
  memory estimate:  816 bytes
  allocs estimate:  9
  --------------
  minimum time:     3.300 μs (0.00% GC)
  median time:      3.530 μs (0.00% GC)
  mean time:        3.706 μs (0.73% GC)
  maximum time:     275.115 μs (97.90% GC)
  --------------
  samples:          10000
  evals/sample:     8

Indeed:

Base.IndexStyle(::Type{<:PooledArray}) = IndexLinear()

is missing, and we should add it. However, I checked that adding it does not solve this issue.

@nalimilan (Member)

So you mean that when the requested slice is short, it would be faster to re-add the levels that are actually used to an empty pool than to copy the full pool? Yeah, in extreme cases like this it would help. What's hard is to identify appropriate thresholds. I guess we can act as if levels were distributed randomly in slices, so that the only parameter to take into account is the ratio between the length of the slice and the total number of levels?
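
A hypothetical sketch of that heuristic, reusing the shrink_pool sketch from earlier in the thread (the cutoff is purely illustrative; the actual fix landed in #56):

# Illustrative threshold: if the slice is shorter than the pool, rebuild a
# compact pool from the used levels instead of copying the full invpool.
function getindex_threshold_sketch(A::PooledArray, I::AbstractVector)
    newrefs = A.refs[I]
    if length(newrefs) < length(A.pool)   # illustrative cutoff only
        # The intermediate array shares A's pools, but shrink_pool immediately
        # builds fresh compact ones, so the large invpool is never copied.
        return shrink_pool(PooledArray(RefArray(newrefs), A.invpool, A.pool))
    else
        return PooledArray(RefArray(newrefs), copy(A.invpool), copy(A.pool))
    end
end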

BTW, it's a shame that we don't implement any optimized methods for copy! and copyto!.

@bkamins (Member, Author) commented Feb 14, 2021

Yes - this is roughly what I think we should do.

@bkamins (Member, Author) commented Feb 20, 2021

I will propose a PR for this. In the meantime, I have noticed that the current code was already sharing invpool between different arrays in this case:

julia> x = PooledArray(1:3)
3-element PooledArray{Int64,UInt32,1,Array{UInt32,1}}:
 1
 2
 3

julia> y = reverse(x)
3-element PooledArray{Int64,UInt32,1,Array{UInt32,1}}:
 3
 2
 1

julia> y[1] = 4
4

julia> x
3-element PooledArray{Int64,UInt32,1,Array{UInt32,1}}:
 1
 2
 3

julia> x.invpool
Dict{Int64,UInt32} with 4 entries:
  4 => 0x00000004
  2 => 0x00000002
  3 => 0x00000003
  1 => 0x00000001

@nalimilan (Member)

Wow, that's really bad.

@bkamins (Member, Author) commented Feb 20, 2021

Fortunately, it is only bad for threading, and I guess that is currently a very rare use case.
