Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster mapreduce for Broadcasted #31020

Merged
merged 12 commits into from
Apr 30, 2020
36 changes: 26 additions & 10 deletions base/broadcast.jl
Original file line number Diff line number Diff line change
Expand Up @@ -166,7 +166,7 @@ BroadcastStyle(a::AbstractArrayStyle{M}, ::DefaultArrayStyle{N}) where {M,N} =
# methods that instead specialize on `BroadcastStyle`,
# copyto!(dest::AbstractArray, bc::Broadcasted{MyStyle})

struct Broadcasted{Style<:Union{Nothing,BroadcastStyle}, Axes, F, Args<:Tuple}
struct Broadcasted{Style<:Union{Nothing,BroadcastStyle}, Axes, F, Args<:Tuple} <: Base.AbstractBroadcasted
f::F
args::Args
axes::Axes # the axes of the resulting object (may be bigger than implied by `args` if this is nested inside a larger `Broadcasted`)
Expand All @@ -193,21 +193,25 @@ function Base.show(io::IO, bc::Broadcasted{Style}) where {Style}
end

## Allocating the output container
Base.similar(bc::Broadcasted{DefaultArrayStyle{N}}, ::Type{ElType}) where {N,ElType} =
similar(Array{ElType}, axes(bc))
Base.similar(bc::Broadcasted{DefaultArrayStyle{N}}, ::Type{Bool}) where N =
similar(BitArray, axes(bc))
Base.similar(bc::Broadcasted, ::Type{T}) where {T} = similar(bc, T, axes(bc))
Base.similar(::Broadcasted{DefaultArrayStyle{N}}, ::Type{ElType}, dims) where {N,ElType} =
similar(Array{ElType}, dims)
Base.similar(::Broadcasted{DefaultArrayStyle{N}}, ::Type{Bool}, dims) where N =
similar(BitArray, dims)
# In cases of conflict we fall back on Array
Base.similar(bc::Broadcasted{ArrayConflict}, ::Type{ElType}) where ElType =
similar(Array{ElType}, axes(bc))
Base.similar(bc::Broadcasted{ArrayConflict}, ::Type{Bool}) =
similar(BitArray, axes(bc))
Base.similar(::Broadcasted{ArrayConflict}, ::Type{ElType}, dims) where ElType =
similar(Array{ElType}, dims)
Base.similar(::Broadcasted{ArrayConflict}, ::Type{Bool}, dims) =
similar(BitArray, dims)

@inline Base.axes(bc::Broadcasted) = _axes(bc, bc.axes)
_axes(::Broadcasted, axes::Tuple) = axes
@inline _axes(bc::Broadcasted, ::Nothing) = combine_axes(bc.args...)
_axes(bc::Broadcasted{<:AbstractArrayStyle{0}}, ::Nothing) = ()

@inline Base.axes(bc::Broadcasted{<:Any, <:NTuple{N}}, d::Integer) where N =
d <= N ? axes(bc)[d] : OneTo(1)

BroadcastStyle(::Type{<:Broadcasted{Style}}) where {Style} = Style()
BroadcastStyle(::Type{<:Broadcasted{S}}) where {S<:Union{Nothing,Unknown}} =
throw(ArgumentError("Broadcasted{Unknown} wrappers do not have a style assigned"))
Expand All @@ -219,6 +223,12 @@ argtype(bc::Broadcasted) = argtype(typeof(bc))
_eachindex(t::Tuple{Any}) = t[1]
_eachindex(t::Tuple) = CartesianIndices(t)

Base.IndexStyle(bc::Broadcasted) = IndexStyle(typeof(bc))
Base.IndexStyle(::Type{<:Broadcasted{<:Any,<:Tuple{Any}}}) = IndexLinear()
Base.IndexStyle(::Type{<:Broadcasted{<:Any}}) = IndexCartesian()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turned out my previous approach (combine IndexStyle of each argument) was wrong since it didn't work with, e.g., broadcasted(+, zeros(5), zeros(1, 4)). Each argument can be IndexLinear even though the broadcasted indexing is not (that's the main job of broadcasted).

So now the new approach is to return an equivalent of IndexStyle(bc.axes[1]) if length(bc.axes) == 1 and then return IndexCartesian() otherwise. As I need Axes type parameter now, this implementation requires the Broadcasted to be instantiated first. I guess this is OK since that's the usual broadcasting pipeline.

Is this implementation fine? Is IndexStyle(bc.axes[1]) == IndexLinear() && length(bc.axes) == 1 the only possible IndexLinear() case that is detectable in a type-stable manner? That is to say, I'm ignoring the case like broadcasted(+, zeros(1, 4), zeros(1, 4)) as I need to look at the value to check that it supports linear indexing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't really require the Broadcasted to be instantiated to get an IndexStyle — it just means that if it's not instantiated that it falls back to a cartesian implementation. All Broadcasteds should support cartesian indexing; only one-dimensional ones will allow straight integers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's much more accurate way to put it. I wanted to say that "for the broadcasted to be dispatched to the optimized method", it has to be instantiated first.


Base.LinearIndices(bc::Broadcasted{<:Any,<:Tuple{Any}}) = axes(bc)[1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am right that this usually won't return a LinearIndices? Should this rather be LinearIndices(axes(bc))?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that'd probably be better. This sort of specialization should probably be on eachindex(::IndexLinear, ...) or LinearIndices itself if it's even needed at all.


Base.ndims(::Broadcasted{<:Any,<:NTuple{N,Any}}) where {N} = N
Base.ndims(::Type{<:Broadcasted{<:Any,<:NTuple{N,Any}}}) where {N} = N

Expand Down Expand Up @@ -557,7 +567,13 @@ end
@boundscheck checkbounds(bc, I)
@inbounds _broadcast_getindex(bc, I)
end
Base.@propagate_inbounds Base.getindex(bc::Broadcasted, i1::Integer, i2::Integer, I::Integer...) = bc[CartesianIndex((i1, i2, I...))]
Base.@propagate_inbounds Base.getindex(
bc::Broadcasted,
i1::Union{Integer,CartesianIndex},
i2::Union{Integer,CartesianIndex},
I::Union{Integer,CartesianIndex}...,
) =
bc[CartesianIndex((i1, i2, I...))]
Base.@propagate_inbounds Base.getindex(bc::Broadcasted) = bc[CartesianIndex(())]

@inline Base.checkbounds(bc::Broadcasted, I::Union{Integer,CartesianIndex}) =
Expand Down
20 changes: 12 additions & 8 deletions base/reduce.jl
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,9 @@ else
const SmallUnsigned = Union{UInt8,UInt16,UInt32}
end

abstract type AbstractBroadcasted end
const AbstractArrayOrBroadcasted = Union{AbstractArray, AbstractBroadcasted}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to define this AbstractBroadcasted as broadcast.jl is loaded later than reduce.jl. ATM it's just a hack to make this work, but I wonder if it makes sense to define

abstract type AbstractIndexable end
const Indexable = Union{AbstractArray, AbstractIndexable}

or even

abstract type Indexable{N} end
abstract type AbstractArray{T,N} <: Indexable{N} end

so that it's much easier for downstream projects to use indexing-based mapreduce implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having an abstract type or a trait like Indexable would also be useful to support mapreduce over dimensions with skipmissing (#28027). It would avoid duplicating some methods.

Though a trait would probably be better than an abstract type, since it would be more flexible. In particular, an object returned by skipmissing could be marked as Indexable or not depending on whether it wraps an Indexable object (like an array) or another iterable. It would also allow objects that already inherit from a non-indexable abstract type to implement Indexable if they want, which wouldn't be possible with an abstract type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go with trait direction, one option is to use IndexStyle. Something like this:

struct NonIndexable <: IndexStyle end
IndexStyle(::Type) = NonIndexable()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a clever idea. I think the union here is sensible for now — that trait could possibly come later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Differing the whole designing work for the trait makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vtjnash mentioned there may be another way around this bootstrapping issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the 1.4 feature freeze is approaching, I support merging the PR with the Union for now, and try to use a more general trait later.

FWIW, this PR is giving us incredibly good performance to be able to reuse the pairwise summation algorithm with weights at JuliaStats/StatsBase.jl#518. Great work!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for testing this PR in a real-world scenario!


"""
Base.add_sum(x, y)

Expand Down Expand Up @@ -144,7 +147,8 @@ foldr(op, itr; kw...) = mapfoldr(identity, op, itr; kw...)

# This is a generic implementation of `mapreduce_impl()`,
# certain `op` (e.g. `min` and `max`) may have their own specialized versions.
@noinline function mapreduce_impl(f, op, A::AbstractArray, ifirst::Integer, ilast::Integer, blksize::Int)
@noinline function mapreduce_impl(f, op, A::AbstractArrayOrBroadcasted,
ifirst::Integer, ilast::Integer, blksize::Int)
if ifirst == ilast
@inbounds a1 = A[ifirst]
return mapreduce_first(f, op, a1)
Expand All @@ -167,7 +171,7 @@ foldr(op, itr; kw...) = mapfoldr(identity, op, itr; kw...)
end
end

mapreduce_impl(f, op, A::AbstractArray, ifirst::Integer, ilast::Integer) =
mapreduce_impl(f, op, A::AbstractArrayOrBroadcasted, ifirst::Integer, ilast::Integer) =
mapreduce_impl(f, op, A, ifirst, ilast, pairwise_blocksize(f, op))

"""
Expand Down Expand Up @@ -292,13 +296,13 @@ The default is `reduce_first(op, f(x))`.
"""
mapreduce_first(f, op, x) = reduce_first(op, f(x))

_mapreduce(f, op, A::AbstractArray) = _mapreduce(f, op, IndexStyle(A), A)
_mapreduce(f, op, A::AbstractArrayOrBroadcasted) = _mapreduce(f, op, IndexStyle(A), A)

function _mapreduce(f, op, ::IndexLinear, A::AbstractArray{T}) where T
function _mapreduce(f, op, ::IndexLinear, A::AbstractArrayOrBroadcasted)
inds = LinearIndices(A)
n = length(inds)
if n == 0
return mapreduce_empty(f, op, T)
return mapreduce_empty_iter(f, op, A, IteratorEltype(A))
elseif n == 1
@inbounds a1 = A[first(inds)]
return mapreduce_first(f, op, a1)
Expand All @@ -319,7 +323,7 @@ end

mapreduce(f, op, a::Number) = mapreduce_first(f, op, a)

_mapreduce(f, op, ::IndexCartesian, A::AbstractArray) = mapfoldl(f, op, A)
_mapreduce(f, op, ::IndexCartesian, A::AbstractArrayOrBroadcasted) = mapfoldl(f, op, A)

"""
reduce(op, itr; [init])
Expand Down Expand Up @@ -469,7 +473,7 @@ isgoodzero(::typeof(max), x) = isbadzero(min, x)
isgoodzero(::typeof(min), x) = isbadzero(max, x)

function mapreduce_impl(f, op::Union{typeof(max), typeof(min)},
A::AbstractArray, first::Int, last::Int)
A::AbstractArrayOrBroadcasted, first::Int, last::Int)
a1 = @inbounds A[first]
v1 = mapreduce_first(f, op, a1)
v2 = v3 = v4 = v1
Expand Down Expand Up @@ -765,7 +769,7 @@ function count(pred, itr)
end
return n
end
function count(pred, a::AbstractArray)
function count(pred, a::AbstractArrayOrBroadcasted)
n = 0
for i in eachindex(a)
@inbounds n += pred(a[i])::Bool
Expand Down
37 changes: 22 additions & 15 deletions base/reducedim.jl
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ No method is implemented for reducing index range of type $typeof(i). Please imp
reduced_index for this index type or report this as an issue.
"""
))
reduced_indices(a::AbstractArray, region) = reduced_indices(axes(a), region)
reduced_indices(a::AbstractArrayOrBroadcasted, region) = reduced_indices(axes(a), region)

# for reductions that keep 0 dims as 0
reduced_indices0(a::AbstractArray, region) = reduced_indices0(axes(a), region)
Expand Down Expand Up @@ -89,8 +89,8 @@ for (Op, initval) in ((:(typeof(&)), true), (:(typeof(|)), false))
end

# reducedim_initarray is called by
reducedim_initarray(A::AbstractArray, region, init, ::Type{R}) where {R} = fill!(similar(A,R,reduced_indices(A,region)), init)
reducedim_initarray(A::AbstractArray, region, init::T) where {T} = reducedim_initarray(A, region, init, T)
reducedim_initarray(A::AbstractArrayOrBroadcasted, region, init, ::Type{R}) where {R} = fill!(similar(A,R,reduced_indices(A,region)), init)
reducedim_initarray(A::AbstractArrayOrBroadcasted, region, init::T) where {T} = reducedim_initarray(A, region, init, T)

# TODO: better way to handle reducedim initialization
#
Expand Down Expand Up @@ -156,8 +156,8 @@ end
reducedim_init(f::Union{typeof(abs),typeof(abs2)}, op::typeof(max), A::AbstractArray{T}, region) where {T} =
reducedim_initarray(A, region, zero(f(zero(T))), _realtype(f, T))

reducedim_init(f, op::typeof(&), A::AbstractArray, region) = reducedim_initarray(A, region, true)
reducedim_init(f, op::typeof(|), A::AbstractArray, region) = reducedim_initarray(A, region, false)
reducedim_init(f, op::typeof(&), A::AbstractArrayOrBroadcasted, region) = reducedim_initarray(A, region, true)
reducedim_init(f, op::typeof(|), A::AbstractArrayOrBroadcasted, region) = reducedim_initarray(A, region, false)

# specialize to make initialization more efficient for common cases

Expand All @@ -179,8 +179,11 @@ end

## generic (map)reduction

has_fast_linear_indexing(a::AbstractArray) = false
has_fast_linear_indexing(a::AbstractArrayOrBroadcasted) = false
has_fast_linear_indexing(a::Array) = true
has_fast_linear_indexing(::Number) = true # for Broadcasted
has_fast_linear_indexing(bc::Broadcast.Broadcasted) =
all(has_fast_linear_indexing, bc.args)

function check_reducedims(R, A)
# Check whether R has compatible dimensions w.r.t. A for reduction
Expand Down Expand Up @@ -233,7 +236,7 @@ _firstslice(i::OneTo) = OneTo(1)
_firstslice(i::Slice) = Slice(_firstslice(i.indices))
_firstslice(i) = i[firstindex(i):firstindex(i)]

function _mapreducedim!(f, op, R::AbstractArray, A::AbstractArray)
function _mapreducedim!(f, op, R::AbstractArray, A::AbstractArrayOrBroadcasted)
lsiz = check_reducedims(R,A)
isempty(A) && return R

Expand Down Expand Up @@ -271,10 +274,10 @@ function _mapreducedim!(f, op, R::AbstractArray, A::AbstractArray)
return R
end

mapreducedim!(f, op, R::AbstractArray, A::AbstractArray) =
mapreducedim!(f, op, R::AbstractArray, A::AbstractArrayOrBroadcasted) =
(_mapreducedim!(f, op, R, A); R)

reducedim!(op, R::AbstractArray{RT}, A::AbstractArray) where {RT} =
reducedim!(op, R::AbstractArray{RT}, A::AbstractArrayOrBroadcasted) where {RT} =
mapreducedim!(identity, op, R, A)

"""
Expand Down Expand Up @@ -304,17 +307,21 @@ julia> mapreduce(isodd, |, a, dims=1)
1 1 1 1
```
"""
mapreduce(f, op, A::AbstractArray; dims=:, kw...) = _mapreduce_dim(f, op, kw.data, A, dims)
mapreduce(f, op, A::AbstractArray...; kw...) = reduce(op, map(f, A...); kw...)
mapreduce(f, op, A::AbstractArrayOrBroadcasted; dims=:, kw...) =
_mapreduce_dim(f, op, kw.data, A, dims)
mapreduce(f, op, A::AbstractArrayOrBroadcasted...; kw...) =
reduce(op, map(f, A...); kw...)

_mapreduce_dim(f, op, nt::NamedTuple{(:init,)}, A::AbstractArray, ::Colon) = mapfoldl(f, op, A; nt...)
_mapreduce_dim(f, op, nt::NamedTuple{(:init,)}, A::AbstractArrayOrBroadcasted, ::Colon) =
mapfoldl(f, op, A; nt...)

_mapreduce_dim(f, op, ::NamedTuple{()}, A::AbstractArray, ::Colon) = _mapreduce(f, op, IndexStyle(A), A)
_mapreduce_dim(f, op, ::NamedTuple{()}, A::AbstractArrayOrBroadcasted, ::Colon) =
_mapreduce(f, op, IndexStyle(A), A)

_mapreduce_dim(f, op, nt::NamedTuple{(:init,)}, A::AbstractArray, dims) =
_mapreduce_dim(f, op, nt::NamedTuple{(:init,)}, A::AbstractArrayOrBroadcasted, dims) =
mapreducedim!(f, op, reducedim_initarray(A, dims, nt.init), A)

_mapreduce_dim(f, op, ::NamedTuple{()}, A::AbstractArray, dims) =
_mapreduce_dim(f, op, ::NamedTuple{()}, A::AbstractArrayOrBroadcasted, dims) =
mapreducedim!(f, op, reducedim_init(f, op, A, dims), A)

"""
Expand Down
56 changes: 56 additions & 0 deletions test/broadcast.jl
Original file line number Diff line number Diff line change
Expand Up @@ -802,6 +802,7 @@ end
# Broadcasted iterable/indexable APIs
let
bc = Broadcast.instantiate(Broadcast.broadcasted(+, zeros(5), 5))
@test IndexStyle(bc) == IndexLinear()
@test eachindex(bc) === Base.OneTo(5)
@test length(bc) === 5
@test ndims(bc) === 1
Expand All @@ -812,6 +813,7 @@ let
@test ndims(copy(bc)) == ndims([v for v in bc]) == ndims(collect(bc)) == ndims(bc)

bc = Broadcast.instantiate(Broadcast.broadcasted(+, zeros(5), 5*ones(1, 4)))
@test IndexStyle(bc) == IndexCartesian()
@test eachindex(bc) === CartesianIndices((Base.OneTo(5), Base.OneTo(4)))
@test length(bc) === 20
@test ndims(bc) === 2
Expand All @@ -832,6 +834,60 @@ let a = rand(5), b = rand(5), c = copy(a)
@test x == [2]
end

@testset "broadcasted mapreduce" begin
xs = 1:10
ys = 1:2:20
bc = Broadcast.instantiate(Broadcast.broadcasted(*, xs, ys))
@test IndexStyle(bc) == IndexLinear()
@test sum(bc) == mapreduce(Base.splat(*), +, zip(xs, ys))

xs2 = reshape(xs, 1, :)
ys2 = reshape(ys, 1, :)
bc = Broadcast.instantiate(Broadcast.broadcasted(*, xs2, ys2))
@test IndexStyle(bc) == IndexCartesian()
@test sum(bc) == mapreduce(Base.splat(*), +, zip(xs, ys))

xs = 1:5:3*5
ys = 1:4:3*4
bc = Broadcast.instantiate(
Broadcast.broadcasted(iseven, Broadcast.broadcasted(-, xs, ys)))
@test count(bc) == count(iseven, map(-, xs, ys))

xs = reshape(1:6, (2, 3))
ys = 1:2
bc = Broadcast.instantiate(Broadcast.broadcasted(*, xs, ys))
@test reduce(+, bc; dims=1, init=0) == [5 11 17]

# Let's test that `Broadcasted` actually hits the efficient
# `mapreduce` method as intended. We are going to invoke `reduce`
# with this *NON-ASSOCIATIVE* binary operator to see what
# associativity is chosen by the implementation:
paren = (x, y) -> "($x,$y)"
# Next, we construct data `xs` such that `length(xs)` is greater
# than short array cutoff of `_mapreduce`:
alphabets = 'a':'z'
blksize = Base.pairwise_blocksize(identity, paren) ÷ length(alphabets)
xs = repeat(alphabets, 2 * blksize)
@test length(xs) > blksize
# So far we constructed the data `xs` and reducing function
# `paren` such that `reduce` and `foldl` results are different.
# That is to say, this `reduce` does not hit the fall-back `foldl`
# branch:
@test foldl(paren, xs) != reduce(paren, xs)

# Now let's try it with `Broadcasted`:
bcraw = Broadcast.broadcasted(identity, xs)
bc = Broadcast.instantiate(bcraw)
# If `Broadcasted` has `IndexLinear` style, it should hit the
# `reduce` branch:
@test IndexStyle(bc) == IndexLinear()
@test reduce(paren, bc) == reduce(paren, xs)
# If `Broadcasted` does not have `IndexLinear` style, it should
# hit the `foldl` branch:
@test IndexStyle(bcraw) == IndexCartesian()
@test reduce(paren, bcraw) == foldl(paren, xs)
end

# treat Pair as scalar:
@test replace.(split("The quick brown fox jumps over the lazy dog"), r"[aeiou]"i => "_") ==
["Th_", "q__ck", "br_wn", "f_x", "j_mps", "_v_r", "th_", "l_zy", "d_g"]