Add optimised methods for reducing hcat/vcat on any iterators of vectors #31644
Conversation
base/reduce.jl
Outdated
if !(isize isa SizeUnknown)
    # Assume first element has representative size, unless that would make this too large
    SIZEHINT_CAP = 10^6
    sizehint!(ret, min(SIZEHINT_CAP, length(xs)*length(x1)))
This is a substantial point in speeding this up; it is the reason knowing the size gives a speedup for vcat.
And I think it is a very common case that your vectors are all the same size, and even when that guess is wrong it is still a good approximation.
I am not sure what a good value for SIZEHINT_CAP is.
We need it to guard against cases where the data only fits in memory because the first element is massive and the rest are much smaller, so estimating from the first element would over-allocate badly.
Are you sure it makes a large difference? IIRC append!
is supposed to double the size of the storage to ensure resizing doesn't happen too often. At any rate, I don't think it's correct to assume the first element is representative.
Yes, especially in small cases it really matters: adding it got a 60% speedup on my test case of 100 vectors of length 128.
Doubling just doesn't increase the capacity that much when you are doubling a relatively small number.
Consider 100 vectors of equal length: doubling means having to reallocate 7 times, which is a lot as a portion of the total time spent.
Assuming the first element is representative is, of course, a guess.
- If the guess is approximately right, then you'll probably only end up doing 1 more allocation.
- If the guess is too low, then you're basically back in the doubling case, so you've lost nothing by taking it.
- If the guess is too large, this is the dangerous point, because it risks allocating a ton of memory that isn't needed.
The last case is where SIZEHINT_CAP comes in: it is our guard against that case. So we set it to some suitable number; I thought 10^6 might be OK, which would mean allocating about 8 MB if the elements are Float64s. We could use 10^5 if we wanted to be more conservative.
The other thing is that once things are bigger than SIZEHINT_CAP, the array size should be large enough that doubling is a very effective increase.
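To make the allocation-count argument concrete, here is a worked sketch with hypothetical numbers (100 vectors of 128 Float64s, 12,800 elements total); the variable names are illustrative, not from the PR:

```julia
data = [rand(128) for _ in 1:100]   # 100 vectors of 128 Float64s

# Growth by doubling: starting from the copy of the first element (capacity 128),
# reaching 12_800 elements passes through capacities
# 128 -> 256 -> 512 -> 1024 -> 2048 -> 4096 -> 8192 -> 16384,
# i.e. 7 reallocations, each of which also copies the existing contents.

# With the proposed hint, a single up-front allocation is enough:
SIZEHINT_CAP = 10^6
ret = copy(first(data))
sizehint!(ret, min(SIZEHINT_CAP, length(data) * length(first(data))))  # hint of 12_800
for x in Iterators.drop(data, 1)
    append!(ret, x)   # never needs to grow the buffer again
end
```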
I don't think allocating this much extra is reasonable so you need to sizehint!
back to the actual number in the end (which will reduce the capacity).
Also, posting the benchmarks is a good idea (unless they are the same as in the first post).
This feels like too much heuristic in a PR which otherwise would be quite simple.
Benchmark:
With code that bypasses normal array reduce:
data = [rand(max(ceil(Int,96+32(randn())), 0)) for ii in 1:1000]
- Timing with @btime, on a different computer from before (probably a worse computer for benchmarking, as it is a shared server, but the point still holds).
- Multiple rounds of testing, to roughly account for chance, since we are using random-length arrays.
- round 1:
  - with hint: 498.927 μs (2 allocations: 782.19 KiB)
  - without  : 728.514 μs (11 allocations: 1.80 MiB)
- round 2:
  - with hint: 757.129 μs (4 allocations: 2.08 MiB)
  - without  : 900.870 μs (13 allocations: 2.56 MiB)
- round 3:
  - with hint: 643.748 μs (3 allocations: 1.74 MiB)
  - without  : 687.555 μs (11 allocations: 1.51 MiB)
Now of course the advantage goes up if your initial guess at the size was wrong.
And once sizehint!ing back down is added, that will also cut down the advantage of doing it.
But I think it will still be worth it
(we can further add a heuristic to not sizehint down if the buffer is <2x too large, since that is acceptable).
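For context, a minimal sketch of how such a timing could be reproduced (the exact expression that was benchmarked is not shown in the thread, so the reduce(vcat, …) call below is an assumption):

```julia
using BenchmarkTools

data = [rand(max(ceil(Int, 96 + 32(randn())), 0)) for ii in 1:1000]

# Time the whole reduction; compare a build with the sizehint! against one without.
@btime reduce(vcat, $data)
```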
offset = length(x1)+1
while(x_state !== nothing)
    x, state = x_state
    length(x)==dim1_size || throw(DimensionMismatch("hcat"))
Should include dimensions in error message
The errors given by CI are correct; it looks like I broke at least one case for matrices.
OK, now the failing tests are not real; some kind of distributed-processing fault.
reduce(op, itr; kw...) = mapreduce(identity, op, itr; kw...)
function reduce(op, itr::T; kw...) where T
    # Redispatch, adding traits
    reduce(op, itr, eltype_or_default_eltype(itr), IteratorSize(T); kw...)
This should be a private _reduce
method.
Yes, unless it should be a private _mapreduce
method.
What do you think?
I guess _mapreduce
is better if you can support that, since that will also support mapreduce(identity, ...)
.
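A minimal sketch of the private-method refactor being suggested here (the _reduce naming follows the review comments, and eltype_or_default_eltype is the helper from the diff above; this is not the PR's final code):

```julia
# Stand-in for the PR's eltype_or_default_eltype helper (an assumption here):
_eltype_hint(itr) = Base.@default_eltype(itr)

# Public entry point stays a thin wrapper over a private method.
reduce(op, itr; kw...) = _reduce(op, itr; kw...)

# The private method re-dispatches on the element-type and iterator-size traits.
_reduce(op, itr::T; kw...) where {T} =
    _reduce(op, itr, _eltype_hint(itr), Base.IteratorSize(T); kw...)

# Generic fallback when no trait-based specialisation applies.
_reduce(op, itr, et, isize; kw...) = mapreduce(identity, op, itr; kw...)

# Specialisations such as the vcat/hcat fast paths then attach to the private
# method, e.g. _reduce(::typeof(vcat), xs, ::Type{<:AbstractVector}, isize).
```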
@@ -362,10 +367,95 @@ julia> reduce(*, [2; 3; 4]; init=-1)
-24
```
"""
reduce(op, itr; kw...) = mapreduce(identity, op, itr; kw...)
function reduce(op, itr::T; kw...) where T
    # Redispatch, adding traits
Can remove this comment, it is literally what the code below it does.
end

function reduce(op, itr, et, isize; kw...)
    # Fallback: if nothing interesting is being done with the traits
Think this comment can likely be removed.
while(x_state !== nothing)
    x, state = x_state
    length(x)==dim1_size || throw(DimensionMismatch("hcat"))
    copyto!(ret, offset, x, 1)
How do we know this will fit into ret
?
julia> reduce(hcat, (UInt8[1,2], [1000, 5000]))
ERROR: InexactError: trunc(UInt8, 1000)
We don't. This function needs to be tightened to T::Type{<:AbstractVector{S}}, and things that fell back through @default_eltype need to be banned from using it.
Then another (marginally slower, but benchmarking will tell) version needs to be created that can deal with heterogeneous containers.
I would say that could be in another PR and I just want the common case covered here, but actually I would be sad if we can't get the speedup for generators.
I would say that could be in another PR and I just want the common case covered here
I don't understand, this PR breaks the use cases I linked so how can it be in another PR?
Sorry, I was unclear. I definitely mean fixing that issue.
But the question of handling things like (i for i in ([0x1, 0x2], [1,2])) does not have to be handled in this PR (I think it should be, if it doesn't add undue complexity).
If we just tighten the type signature from where T<:AbstractVector to where T<:AbstractVector{S} where S, that would solve your case by causing it to fall back to the current slow reduce methods.
Working out how to handle heterogeneous types efficiently could then be a separate PR, if it was going to be hard. But I don't think it will be, so it can be in this one.
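For reference, the behaviour the generic fallback preserves here is ordinary vcat promotion (this example is reconstructed for illustration, not copied from the thread):

```julia
julia> vcat(UInt8[1, 2], [1000, 5000])   # pairwise vcat promotes the element type
4-element Array{Int64,1}:
    1
    2
 1000
 5000
```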
while(x_state !== nothing)
    x, state = x_state
    append!(ret_vec, x)
How do we know this will fit into ret
?
julia> reduce(vcat, (UInt8[1,2], [1000, 5000]))
ERROR: InexactError: trunc(UInt8, 1000)
Correct, we don't; see https://github.com/JuliaLang/julia/pull/31644/files#r274153197
Now with sizehint!ing back down if the hint made it too large.
I think the sizehint heuristic is worth it, but I could be convinced otherwise. Edge-case benchmarks related to this case:
- Perfect estimate, under SIZEHINT_CAP
- Perfect estimate, over SIZEHINT_CAP
- Maxing out SIZEHINT_CAP, according to our massive over-estimate
- Under cap, but over-estimated enough to hint down
- Under cap, but over-estimated, but not enough to hint down
base/reduce.jl
Outdated
x_state = iterate(xs, state)
end

if length(ret) < hinted_size/2 # it is only allowable to be at most 2x too much memory
Get rid of this conditional? sizehint! already has heuristics for when shrinking the capacity is worth it.
Those heuristics are less generous than this, though: they require saving an eighth, rather than a half.
Line 1108 in 68db871:
// if we don't save at least an eighth of maxsize then it's not worth it to shrink
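To make the two thresholds concrete, a small hypothetical comparison (the numbers and helper names are illustrative only):

```julia
# Say the hint grew the buffer to a capacity of 12_800 elements.
capacity = 12_800

# array.c's rule: shrink only if at least an eighth of maxsize would be saved.
shrinks_in_base(n) = capacity - n >= capacity ÷ 8   # shrinks when n <= 11_200

# The PR's extra guard: only hint back down if the guess was more than 2x too big.
shrinks_in_pr(n) = n < capacity / 2                 # shrinks when n < 6_400

shrinks_in_base(10_000), shrinks_in_pr(10_000)      # (true, false)
```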
base/reduce.jl
Outdated
x_state === nothing && return T() # New empty instance
x1, state = x_state

ret = copy(x1) # this is **Not** going to work for StaticArrays
So did this use to work for StaticArrays, and is it now breaking? What should be done to resolve this comment?
Indeed, it used to work, though a little weirdly: the result type depended on whether the iterator was an Array or not.
julia> data = [(@SVector Int[1,2,3,4]), @SVector Int[1,2,3,4]]
2-element Array{SArray{Tuple{4},Int64,1,4},1}:
[1, 2, 3, 4]
[1, 2, 3, 4]
julia> reduce(vcat, data)
8-element Array{Int64,1}:
1
2
3
4
1
2
3
4
julia> reduce(vcat, (i for i in data))
8-element SArray{Tuple{8},Int64,1,8}:
1
2
3
4
1
2
3
4
We should do something to support it. But I wasn't sure what, so I left the comment (it should have had a #TODO).
- We can tighten this to apply only to Vector, not to AbstractVector.
- We can check for ismutable and, if not mutable, fall back to the standard reduce.
- We can check for ismutable and, if not mutable, fall back to using an Array for the return type.
All 3 options leave it open for a package to define its own improved method for this on its type.
1 would be too bad. 3 would break the reduce
interface. So 2 sounds like the best solution.
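A rough sketch of what option 2 could look like (the helper name, the ismutable check, and the use of Iterators.drop are illustrative assumptions, not the PR's code):

```julia
# Hypothetical helper: fast path for mutable vectors, generic reduce otherwise.
# Assumes xs can be iterated more than once.
function _vcat_reduce(xs)
    x1 = first(xs)
    ismutable(x1) || return reduce(vcat, collect(xs))  # e.g. SVector: use the generic path
    ret = copy(x1)                                      # mutable, so copy + append! is fine
    for x in Iterators.drop(xs, 1)
        append!(ret, x)
    end
    return ret
end
```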
base/reduce.jl
Outdated
function reduce(::typeof(hcat), xs, T::Type{<:AbstractVector}, isize)
    # Size is known
    x_state = iterate(xs)
    x_state === nothing && return T() # New empty instance
Can get rid of the comment. No need to describe what the code does in words.
base/reduce.jl
Outdated
function reduce(::typeof(vcat), xs, T::Type{<:AbstractVector}, isize)
    x_state = iterate(xs)
    x_state === nothing && return T() # New empty instance
This should just throw an ArgumentError
as currently (BTW I'm not even sure the AbstractArray
interface actually guarantees that T()
works).
I think we have an empty_reduce function that throws that error, unless it knows something better to do. But yes.
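A sketch of the empty-input handling being agreed on here, mirroring the error the generic reduction path raises (the exact wording of the message is an assumption):

```julia
x_state = iterate(xs)
x_state === nothing &&
    throw(ArgumentError("reducing over an empty collection is not allowed"))
x1, state = x_state
```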
end

x_state = iterate(xs, state)
while(x_state !== nothing)
Suggested change:
-while(x_state !== nothing)
+while x_state !== nothing
base/reduce.jl
Outdated
## vcat

function reduce(::typeof(vcat), xs, T::Type{<:AbstractVector}, isize)
This should probably be restricted to the case where all vectors are of the same concrete type. Otherwise there is no guarantee that calling vcat repeatedly (what reduce does by default) will return a vector of the same type as the first one.
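A concrete example of that concern (reconstructed for illustration, not taken from the thread): when the inputs are not all of the same concrete type, repeated vcat returns something other than the type of the first element.

```julia
julia> typeof(vcat([1, 2], Float64[3, 4]))   # first element is a Vector{Int64}
Array{Float64,1}
```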
Force-pushed from 10f5294 to 0a193a6.
Removed the label since it doesn't seem like this was being worked on anymore. It seems relatively complex, and is possibly just suggesting that our growth strategy isn't sufficient for small sizes (along with large ones, refs #28588)?
I should return to this, just remove the growth stuff that people didn't like, and do the more minimal version for the case we know about.
Hoping this is covered by moving …
This covers the 2 vector cases of #31636, which I think are the most important: basically making all iterators comparably performant for this as Arrays.
Looking at it, I do wonder if they (including the existing ones from #21672) should be pushed down to be specialisations of mapreduce(identity, vcat/hcat, …). Not sure though.
Benchmarks:
Data: covering each combination of iterator traits.
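A hypothetical sketch of what covering the different trait combinations could look like (the names and definitions below are assumptions, not the PR's actual benchmark data):

```julia
using Base: IteratorSize, IteratorEltype

vecs = [rand(128) for _ in 1:100]            # Vector of Vectors: size and eltype known
gen  = (copy(v) for v in vecs)               # Generator: size known, eltype via @default_eltype
unk  = (v for v in vecs if length(v) > 0)    # Filtered generator: SizeUnknown

IteratorSize(typeof(unk))      # Base.SizeUnknown()
IteratorEltype(typeof(gen))    # Base.EltypeUnknown()
```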
Results:
Note the "reduce new:" entry bypasses the existing method for reduce(vcat|hcat, ::Array{<:AbstractVector}), so that I could compare performance on just treating Arrays as Iterators, and thus see whether we could drop the extra specialised methods for them (once another PR to do matrices is complete).
Benchmark Takeaways:
- Arrays
- vcat on generators