Add optimised methods for reduce(hcat/vcat) on any iterators of vectors #31644

Closed
wants to merge 7 commits

Conversation


@oxinabox oxinabox commented Apr 7, 2019

This covers the two vector cases of #31636, which I think are the most important. Basically, it makes all iterators comparably performant to Arrays for this.

Looking at it, I do wonder if these (including the existing ones from #21672) should be pushed down to be specialisations of mapreduce(identity, vcat/hcat, ...). Not sure though.
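For context, the kind of iterator-generic method under discussion can be sketched as below (an illustrative sketch, not the PR's actual code; the name myreduce_vcat is made up):

```julia
# Illustrative sketch of reducing vcat over any iterator of vectors,
# using the iterate protocol rather than indexing. Not the PR's code.
function myreduce_vcat(xs)
    x_state = iterate(xs)
    x_state === nothing && throw(ArgumentError("reducing over an empty collection is not allowed"))
    x1, state = x_state
    ret = copy(x1)                      # result starts as a copy of the first vector
    x_state = iterate(xs, state)
    while x_state !== nothing
        x, state = x_state
        append!(ret, x)                 # grow the result in place
        x_state = iterate(xs, state)
    end
    return ret
end

myreduce_vcat(v for v in [[1, 2], [3, 4]])  # works on generators too
```

Because this only uses the iterate protocol, it applies equally to Arrays, Generators, and Filters, which is the point of the benchmarks below.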

Benchmarks:

Data

Covering each combination of iterator traits

v_v = [rand(128) for ii in 1:1000]
g_v = (x for x in v_v)
f_g_v = Iterators.filter(x->true, g_v)
f_v_v = Iterators.filter(x->true, v_v);

Results:

Vector
--- hcat, on: Array{Array{Float64,1},1} ---
splatting: 
  148.638 μs (8 allocations: 1023.94 KiB)
reduce old: 
  60.284 μs (2 allocations: 1000.08 KiB)
reduce new: 
  57.536 μs (2 allocations: 1000.08 KiB)

--- hcat, on: Base.Generator{Array{Array{Float64,1},1},getfield(Main, Symbol("##159#160"))} ---
splatting: 
  188.010 μs (1505 allocations: 1.05 MiB)
reduce old: 
  259.465 ms (2985 allocations: 488.88 MiB)
reduce new: 
  58.025 μs (2 allocations: 1000.08 KiB)

--- hcat, on: Base.Iterators.Filter{getfield(Main, Symbol("##163#164")),Array{Array{Float64,1},1}} ---
splatting: 
  190.236 μs (1505 allocations: 1.05 MiB)
reduce old: 
  282.979 ms (2985 allocations: 488.88 MiB)
reduce new: 
  187.215 μs (13 allocations: 2.00 MiB)

--- hcat, on: Base.Iterators.Filter{getfield(Main, Symbol("##161#162")),Base.Generator{Array{Array{Float64,1},1},getfield(Main, Symbol("##159#160"))}} ---
splatting: 
  191.919 μs (1505 allocations: 1.05 MiB)
reduce old: 
  266.831 ms (2985 allocations: 488.88 MiB)
reduce new: 
  228.940 μs (13 allocations: 2.00 MiB)

=================
--- vcat, on: Array{Array{Float64,1},1} ---
splatting: 
  61.553 μs (3 allocations: 1008.02 KiB)
reduce old: 
  57.962 μs (2 allocations: 1000.08 KiB)
reduce new: 
  55.552 μs (2 allocations: 1001.20 KiB)

--- vcat, on: Base.Generator{Array{Array{Float64,1},1},getfield(Main, Symbol("##159#160"))} ---
splatting: 
  103.196 μs (1500 allocations: 1.03 MiB)
reduce old: 
  282.454 ms (1984 allocations: 488.85 MiB)
reduce new: 
  55.424 μs (2 allocations: 1001.20 KiB)

--- vcat, on: Base.Iterators.Filter{getfield(Main, Symbol("##163#164")),Array{Array{Float64,1},1}} ---
splatting: 
  105.145 μs (1500 allocations: 1.03 MiB)
reduce old: 
  281.383 ms (1984 allocations: 488.85 MiB)
reduce new: 
  157.521 μs (11 allocations: 2.00 MiB)

--- vcat, on: Base.Iterators.Filter{getfield(Main, Symbol("##161#162")),Base.Generator{Array{Array{Float64,1},1},getfield(Main, Symbol("##159#160"))}} ---
splatting: 
  107.264 μs (1500 allocations: 1.03 MiB)
reduce old: 
  281.396 ms (1984 allocations: 488.85 MiB)
reduce new: 
  177.695 μs (11 allocations: 2.00 MiB)

Note: the reduce new entries bypass the existing method for reduce(vcat|hcat, ::Array{<:AbstractVector}), so that I could compare performance when treating Arrays just as iterators, and thus see whether we could drop the extra specialised methods for them (once another PR to handle matrices is complete).

Benchmark Takeaways:

  • On iterators with known length, performance is equivalent to what we get on Arrays
  • Up to a 4000x speedup (vcat on generators)

base/reduce.jl Outdated
if !(isize isa SizeUnknown)
# Assume the first element has a representative size, unless that would make this too large
SIZEHINT_CAP = 10^6
sizehint!(ret, min(SIZEHINT_CAP, length(xs)*length(x1)))
Contributor Author:

This is a substantial point in speeding this up. It is the reason knowing the size gives a speedup for vcat.

And I think it is a very common case that your vectors are all the same size; even when that guess is wrong, it is usually still a good estimate.

I am not sure what a good value for SIZEHINT_CAP is. We need it to catch cases that only fit in memory because the first element is massive and the rest are much smaller.
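The capped estimate being discussed amounts to something like this (a sketch; the helper name capped_sizehint! is hypothetical, and 10^6 is the cap value proposed in the diff above):

```julia
# Sketch: hint the estimated total size, capped so a massive first
# element cannot cause a huge speculative allocation.
const SIZEHINT_CAP = 10^6

function capped_sizehint!(ret::Vector, nitems::Integer, firstlen::Integer)
    # Assume the first element's length is representative of the rest.
    sizehint!(ret, min(SIZEHINT_CAP, nitems * firstlen))
    return ret
end
```

sizehint! only reserves capacity; it does not change the length of ret, so an over-estimate wastes memory but never corrupts the result.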

Member:

Are you sure it makes a large difference? IIRC append! is supposed to double the size of the storage to ensure resizing doesn't happen too often. At any rate, I don't think it's correct to assume the first element is representative.

Contributor Author:

Yes, especially on small cases it really matters: adding it got a 60% speedup on my test case of 100 vectors of length 128.

Doubling just doesn't increase the capacity that much if you are doubling a relatively small number. Consider 100 vectors of equal length: then doubling results in having to allocate 7 times, which is a lot as a proportion of the total time spent.

So assuming the first element is representative is a bit of a guess.

  • If the guess is approximately right, then you'll probably only end up doing 1 more allocation.
  • If the guess is too low, then you're basically back in the doubling case, so you've lost nothing by taking it.
  • If the guess is too large, this is the dangerous point, because it risks allocating a ton of memory that isn't needed.

The last case is where SIZEHINT_CAP comes in: it is our guard against that case. So we set it to some suitable number. I thought 10^6 might be OK; that would be allocating 8 MB if it was Float64s, but we could do 10^5 if we wanted to be more conservative. The other thing is that once things are bigger than SIZEHINT_CAP, the array should be large enough that doubling is a very effective increase.
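The claim about 7 allocations can be checked with a quick count, assuming capacity grows by pure doubling from the first vector's length:

```julia
# Count capacity doublings needed to reach the total length,
# starting from the first vector's length.
function ndoublings(firstlen::Int, total::Int)
    cap, n = firstlen, 0
    while cap < total
        cap *= 2
        n += 1
    end
    return n
end

ndoublings(128, 100 * 128)  # → 7
```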

@KristofferC (Member) commented Apr 10, 2019:

I don't think allocating this much extra is reasonable so you need to sizehint! back to the actual number in the end (which will reduce the capacity).

Also, posting the benchmarks is a good idea (unless they are the same as in the first post).

This feels like too much heuristic in a PR which otherwise would be quite simple.

@oxinabox (Contributor Author) commented Apr 10, 2019:

Benchmark:
With code that bypasses normal array reduce:

  • data = [rand(max(ceil(Int,96+32(randn())), 0)) for ii in 1:1000]
  • Timing with @btime, on a different computer from before (probably a worse one for benchmarking, as it is a shared server, but the point still holds)
  • Multiple rounds of testing, to roughly account for chance since we are using random-length arrays
  • round 1:
    • with hint: 498.927 μs (2 allocations: 782.19 KiB)
    • without  : 728.514 μs (11 allocations: 1.80 MiB)
  • round 2:
    • with hint: 757.129 μs (4 allocations: 2.08 MiB)
    • without  : 900.870 μs (13 allocations: 2.56 MiB)
  • round 3:
    • with hint: 643.748 μs (3 allocations: 1.74 MiB)
    • without  : 687.555 μs (11 allocations: 1.51 MiB)

Now of course, the advantage goes up if your initial guess at the size was wrong. And once sizehint!ing back down is added, that will also cut into the advantage of doing it. But I think it will still be worth it (we can further add a heuristic to not sizehint down if we are <2x too large, since that is acceptable).

offset = length(x1)+1
while(x_state !== nothing)
x, state = x_state
length(x)==dim1_size || throw(DimensionMismatch("hcat"))
Contributor Author:

Should include the dimensions in the error message
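For example (an illustrative message format, not what the PR ended up using; the helper name is hypothetical):

```julia
# Hypothetical check with the mismatched sizes included in the message.
function check_hcat_length(x::AbstractVector, dim1_size::Int)
    length(x) == dim1_size || throw(DimensionMismatch(
        "hcat: expected vectors of length $dim1_size, got one of length $(length(x))"))
    return nothing
end
```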


oxinabox commented Apr 7, 2019

The errors given by the CI are correct; looks like I broke at least one case for matrices.

base/abstractarray.jl (outdated; resolved)

oxinabox commented Apr 8, 2019

OK, now the failing tests are not real; some kind of distributed-processing fault.


reduce(op, itr; kw...) = mapreduce(identity, op, itr; kw...)
function reduce(op, itr::T; kw...) where T
# Redispatch, adding traits
reduce(op, itr, eltype_or_default_eltype(itr), IteratorSize(T); kw...)
Member:

This should be a private _reduce method.

@oxinabox (Contributor Author) commented Apr 10, 2019:

Yes, unless it should be a private _mapreduce method.
What do you think?

Member:

I guess _mapreduce is better if you can support that, since that will also support mapreduce(identity, ...).
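The shape of the redispatch being discussed might look like this (a sketch with made-up private names; the PR's code used eltype_or_default_eltype rather than plain eltype):

```julia
# Sketch: a private trait-aware helper that public reduce/mapreduce
# methods funnel into. Names here are illustrative only.
_mapreduce_bytrait(f, op, itr) =
    _mapreduce_bytrait(f, op, itr, eltype(itr), Base.IteratorSize(itr))

# Fallback when the traits offer nothing special:
_mapreduce_bytrait(f, op, itr, et, isize) = mapreduce(f, op, itr)

sketch_reduce(op, itr) = _mapreduce_bytrait(identity, op, itr)

sketch_reduce(+, [1, 2, 3])  # → 6
```

Specialisations for particular ops (vcat, hcat) and trait combinations can then be added as extra methods of the private helper without touching the public API.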

@@ -362,10 +367,95 @@ julia> reduce(*, [2; 3; 4]; init=-1)
-24
```
"""
reduce(op, itr; kw...) = mapreduce(identity, op, itr; kw...)
function reduce(op, itr::T; kw...) where T
# Redispatch, adding traits
Member:

Can remove this comment, it is literally what the code below it does.

end

function reduce(op, itr, et, isize; kw...)
# Fallback: if nothing interesting is being done with the traits
Member:

Think this comment can likely be removed.

while(x_state !== nothing)
x, state = x_state
length(x)==dim1_size || throw(DimensionMismatch("hcat"))
copyto!(ret, offset, x, 1)
Member:

How do we know this will fit into ret?

julia> reduce(hcat, (UInt8[1,2], [1000, 5000]))
ERROR: InexactError: trunc(UInt8, 1000)

Contributor Author:

We don't. This function needs to be tightened to

T::Type{<:AbstractVector{S}}

and things that fell back through @default_eltype need to be banned from using it.

Then another (marginally slower, but benchmarking will tell) version needs to be created that can deal with heterogeneous containers. I would say that could be in another PR, as I just want the common case covered here; but actually I would be sad if we can't get the speedup for generators.

Member:

I would say that could be in another PR, as I just want the common case covered here

I don't understand, this PR breaks the use cases I linked so how can it be in another PR?

Contributor Author:

Sorry, I was unclear. I mean we are definitely fixing that issue. But the question of handling things like

(i for i in ([0x1, 0x2], [1,2]))

does not have to be handled in this PR (though I think it should be, if it doesn't add undue complexity). If we just tighten the type signature from where T<:AbstractVector to where T<:AbstractVector{S} where S, that would solve your case by causing it to fall back to the current slow reduce methods.

Thus working out how to efficiently handle heterogeneous types could be in another PR, if it was going to be hard. But I don't think it will be, so it can be in this one.
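The effect of that tightening can be illustrated with dispatch alone (hypothetical helper; :fast stands in for the specialised method, :slow for the existing fallback):

```julia
# With `T <: AbstractVector{S} where S`, only element types that are
# vectors of a single element type S hit the fast path.
hitsfastpath(::Type{<:AbstractVector{S}}) where {S} = :fast
hitsfastpath(::Type) = :slow

hitsfastpath(Vector{Int})                        # → :fast
hitsfastpath(Union{Vector{UInt8}, Vector{Int}})  # → :slow, falls back
```

A tuple like (UInt8[1,2], [1000, 5000]) has a Union element type, so it would take the slow path and keep the old widening behaviour.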


while(x_state !== nothing)
x, state = x_state
append!(ret_vec, x)
Member:

How do we know this will fit into ret?

julia> reduce(vcat, (UInt8[1,2], [1000, 5000]))
ERROR: InexactError: trunc(UInt8, 1000)


@fredrikekre fredrikekre added collections Data structures holding multiple items, e.g. sets performance Must go faster labels Apr 11, 2019
@oxinabox (Contributor Author):

Now with sizehinting back down if we made it too large. This applies to reduce(vcat, ...) where we know the size of the iterator. To keep performance, I tweaked how the heuristic works.

  • In the ideal case of guessing the size right(ish), you still get basically the same performance as running on Array{Vector}.
  • Right or wrong, if our guess at the size exceeds SIZEHINT_CAP, then we fall back to acting the same as if we did not know the iterator size.
  • Otherwise, depending on how representative the first element was, this ranged between 50%-150% of the time taken by the unknown-iterator-size case.

I think the sizehint heuristic is worth it, but I could be convinced otherwise. Basically the performance boils down to: if you have to sizehint back down, then it would have been better not to have hinted at all. But this case is fairly rare, since it means the first element was >2x the average size, and the estimated size based on it was still under the sizehint cap.
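The hint-down step being described is roughly (a sketch; finish_hintdown! is a made-up name, and the /2 threshold matches the heuristic above):

```julia
# Sketch: after filling the result, give capacity back to the allocator
# only if the original hint overshot by more than 2x.
function finish_hintdown!(ret::Vector, hinted_size::Int)
    if length(ret) < hinted_size ÷ 2
        sizehint!(ret, length(ret))   # shrink capacity back to what we used
    end
    return ret
end
```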

Edge-case benchmarks related to this.
Note: the neoreduce_nohintdown results show what happens if we just never sizehint back down.
The choice to be made is between the first number in each set (which uses the SizeUnknown behaviour regardless of whether the size is known) and the second, which uses the sizehint heuristic, including hinting back down.

Perfect estimate, under SIZEHINT_CAP

  data = [rand(50), rand(50),rand(50),rand(50),rand(50),rand(50),rand(50),rand(50),rand(50),rand(50),rand(50)]
  @btime neoreduce_hintdown(vcat, $data, Base.SizeUnknown()); # No hint in the first place
  @btime neoreduce_hintdown(vcat, $data, Base.HasLength());
  @btime neoreduce_nohintdown(vcat, $data, Base.HasLength());

    724.560 ns (5 allocations: 12.38 KiB)
    379.985 ns (2 allocations: 4.86 KiB)
    389.140 ns (2 allocations: 4.86 KiB)

Perfect estimate, over SIZEHINT_CAP

  data = [rand(50) for ii in 1:100_000]
  @btime neoreduce_hintdown(vcat, $data, Base.SizeUnknown()); # No hint in the first place
  @btime neoreduce_hintdown(vcat, $data, Base.HasLength());
  @btime neoreduce_nohintdown(vcat, $data, Base.HasLength());
    19.343 ms (18 allocations: 51.56 MiB)
    18.740 ms (18 allocations: 51.56 MiB)
    19.134 ms (8 allocations: 49.59 MiB)

  @btime reduce(vcat, data);
    9.196 ms (2 allocations: 38.15 MiB)

Maxing out SIZEHINT_CAP, due to our massive over-estimate

  data = [rand(3*10^5), rand(50),rand(50),rand(50),rand(50),rand(50),rand(50),rand(50),rand(50),rand(50),rand(50)]
  @btime neoreduce_hintdown(vcat, $data, Base.SizeUnknown()); # No hint in the first place
  @btime neoreduce_hintdown(vcat, $data, Base.HasLength());
  @btime neoreduce_nohintdown(vcat, $data, Base.HasLength());

    243.261 μs (3 allocations: 4.58 MiB)
    243.798 μs (3 allocations: 4.58 MiB)
    239.252 μs (3 allocations: 4.58 MiB)

  @btime reduce(vcat, data);

    130.166 μs (2 allocations: 2.29 MiB)

Under cap, but over-estimated by enough to hint down

  data = [rand(3*10^3), rand(50),rand(50),rand(50),rand(50),rand(50),rand(50),rand(50),rand(50),rand(50),rand(50)]
  @btime neoreduce_hintdown(vcat, $data, Base.SizeUnknown()); # No hint in the first place
  @btime neoreduce_hintdown(vcat, $data, Base.HasLength());
  @btime neoreduce_nohintdown(vcat, $data, Base.HasLength());

    1.892 μs (3 allocations: 46.95 KiB)
    2.940 μs (4 allocations: 257.89 KiB)
    1.388 μs (3 allocations: 257.89 KiB)

Under cap, and over-estimated, but not by enough to hint down

  data = [rand(70), rand(50),rand(50),rand(50),rand(50),rand(50),rand(50),rand(50),rand(50),rand(50),rand(50)]
  @btime neoreduce_hintdown(vcat, $data, Base.SizeUnknown()); # No hint in the first place
  @btime neoreduce_hintdown(vcat, $data, Base.HasLength());
  @btime neoreduce_nohintdown(vcat, $data, Base.HasLength());

    831.905 ns (5 allocations: 17.30 KiB)
    380.134 ns (2 allocations: 6.78 KiB)
    379.488 ns (2 allocations: 6.78 KiB)

base/reduce.jl Outdated
x_state = iterate(xs, state)
end

if length(ret) < hinted_size/2 # it is only allowable to keep at most 2x too much memory
Member:

Get rid of this conditional? sizehint! already has heuristics for when shrinking the capacity is worth it.

@oxinabox (Contributor Author) commented Apr 15, 2019:

Those heuristics are less generous than this one, though: they require saving an eighth, rather than a half.

julia/src/array.c

Line 1108 in 68db871

//if we don't save at least an eighth of maxsize then its not worth it to shrink

base/reduce.jl Outdated
x_state === nothing && return T() # New empty instance
x1, state = x_state

ret = copy(x1) # this is **Not** going to work for StaticArrays
Member:

So did this use to work for StaticArrays and is breaking? What should be done to resolve this comment?

Contributor Author:

Indeed it did use to work, though it worked a little weirdly, since the result type depended on whether the iterator was an Array or not.

julia> data = [(@SVector Int[1,2,3,4]), @SVector Int[1,2,3,4]]
2-element Array{SArray{Tuple{4},Int64,1,4},1}:
 [1, 2, 3, 4]
 [1, 2, 3, 4]

julia> reduce(vcat, data)
8-element Array{Int64,1}:
 1
 2
 3
 4
 1
 2
 3
 4

julia> reduce(vcat, (i for i in data))
8-element SArray{Tuple{8},Int64,1,8}:
 1
 2
 3
 4
 1
 2
 3
 4

We should do something to support it. But I wasn't sure what, so I left the comment (it should have had a #TODO).

  1. We can tighten this to apply only to Vector, not to AbstractVector
  2. We can check for ismutable, and if not mutable, fall back to the standard reduce
  3. We can check for ismutable, and if not mutable, fall back to using an Array for the return type

All 3 options leave it open for a package to define its own improved method for this on its own type.

Member:

1 would be too bad. 3 would break the reduce interface. So 2 sounds like the best solution.
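Option 2 could be sketched like so (a sketch that assumes a restartable iterator; it uses ismutable, which exists as of Julia 1.5, whereas at the time of this PR the check would have been !isimmutable):

```julia
# Sketch of option 2: if the first element is immutable (e.g. an SVector),
# fall back to the generic pairwise reduce; otherwise build in place.
function vcat_reduce_sketch(xs)
    x1 = first(xs)                      # assumes xs can be restarted
    ismutable(x1) || return reduce(vcat, collect(xs))  # generic fallback
    ret = copy(x1)
    for x in Iterators.drop(xs, 1)
        append!(ret, x)
    end
    return ret
end

vcat_reduce_sketch([[1, 2], [3, 4]])  # → [1, 2, 3, 4]
```

The fallback preserves whatever result type the generic reduce produces, so packages like StaticArrays keep their existing behaviour.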

base/reduce.jl Outdated
function reduce(::typeof(hcat), xs, T::Type{<:AbstractVector}, isize)
# Size is known
x_state = iterate(xs)
x_state === nothing && return T() # New empty instance
Member:

Can get rid of the comment. No need to describe what the code does in words.

base/reduce.jl Outdated

function reduce(::typeof(vcat), xs, T::Type{<:AbstractVector}, isize)
x_state = iterate(xs)
x_state === nothing && return T() # New empty instance
Member:

This should just throw an ArgumentError as currently (BTW I'm not even sure the AbstractArray interface actually guarantees that T() works).

Contributor Author:

I think we have an empty_reduce function that throws that error, unless it knows something better to do. But yes.

end

x_state = iterate(xs, state)
while(x_state !== nothing)
Member:

Suggested change
while(x_state !== nothing)
while x_state !== nothing


base/reduce.jl (resolved)
base/reduce.jl Outdated

## vcat

function reduce(::typeof(vcat), xs, T::Type{<:AbstractVector}, isize)
Member:

This should probably be restricted to the case where all the vectors are of the same concrete type. Otherwise there is no guarantee that calling vcat repeatedly (which is what reduce does by default) will return a vector of the same type as the first one.
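That restriction could be expressed as a runtime guard rather than in the signature; a minimal sketch (hypothetical helper name):

```julia
# Take the fast path only when the iterator's eltype is one concrete
# vector type, so repeated vcat is guaranteed to preserve the type.
takes_fast_path(xs) = isconcretetype(eltype(xs)) && eltype(xs) <: AbstractVector

takes_fast_path([[1, 2], [3, 4]])      # true: eltype is Vector{Int}
takes_fast_path(Any[[1, 2], [3, 4]])   # false: eltype is Any
```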

@StefanKarpinski StefanKarpinski added the forget me not PRs that one wants to make sure aren't forgotten label Aug 12, 2019
@vtjnash vtjnash removed the forget me not PRs that one wants to make sure aren't forgotten label Oct 28, 2020

vtjnash commented Oct 28, 2020

Removed the label since it doesn't seem like this was being worked on anymore. It seems relatively complex, and possibly just suggests that our growth strategy isn't sufficient for small sizes (as well as large ones; refs #28588)?

@oxinabox (Contributor Author):

I should return to this: remove the growth heuristics that people didn't like, and do the more minimal version for the case we know about.


vtjnash commented Oct 27, 2023

Hoping this is covered by moving push! out of C into Julia in #51319

@vtjnash vtjnash closed this Oct 27, 2023