Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Customizable lazy fused broadcasting in pure Julia #26891

Merged
merged 7 commits into from
Apr 26, 2018

Conversation

mbauman
Copy link
Member

@mbauman mbauman commented Apr 24, 2018

This patch represents the combined efforts of four individuals, over 60 commits, and an iterated design over (at least) three pull requests that spanned nearly an entire year (closes #22063, closes #23692, closes #25377 by superseding them).

What does it do?

This introduces a pure Julia data structure that represents a fused broadcast
expression. For example, the expression 2 .* (x .+ 1) lowers to:

julia> Meta.@lower 2 .* (x .+ 1)
:($(Expr(:thunk, CodeInfo(:(begin
      Core.SSAValue(0) = (Base.getproperty)(Base.Broadcast, :materialize)
      Core.SSAValue(1) = (Base.getproperty)(Base.Broadcast, :broadcasted)
      Core.SSAValue(2) = (Base.getproperty)(Base.Broadcast, :broadcasted)
      Core.SSAValue(3) = (Core.SSAValue(2))(+, x, 1)
      Core.SSAValue(4) = (Core.SSAValue(1))(*, 2, Core.SSAValue(3))
      Core.SSAValue(5) = (Core.SSAValue(0))(Core.SSAValue(4))
      return Core.SSAValue(5)
  end)))))

Or, slightly more readably as:

using .Broadcast: materialize, broadcasted
materialize(broadcasted(*, 2, broadcasted(+, x, 1)))

The broadcasted function serves two purposes. Its primary purpose is to
construct the Broadcast.Broadcasted objects that hold onto the function, the
tuple of arguments (potentially including nested Broadcasted arguments), and
sometimes a set of axes to include knowledge of the outer shape. The
secondary purpose, however, is to allow an "out" for objects that don't want
to participate in fusion. For example, if x is a range in the above 2 .* (x .+ 1) expression, it needn't allocate an array and operate elementwise — it
can just compute and return a new range. Thus custom structures are able to
specialize Broadcast.broadcasted(f, args...) just as they'd specialize on f
normally to return an immediate result.

Broadcast.materialize is identity for everything except Broadcasted
objects for which it allocates an appropriate result and computes the
broadcast. It does two things: it initializes the outermost Broadcasted
object to compute its axes and then copys it.

Similarly, an in-place fused broadcast like y .= 2 .* (x .+ 1) uses the exact
same expression tree to compute the right-hand side of the expression as above,
and then uses materialize!(y, broadcasted(*, 2, broadcasted(+, x, 1))) to instantiate the
Broadcasted expression tree and then copyto! it into the given destination.

All-together, this forms a complete API for custom types to extend and
customize the behavior of broadcast (fixes #22060). It uses the existing
BroadcastStyles throughout to simplify dispatch on many arguments:

  • Custom types can opt-out of broadcast fusion by specializing
    Broadcast.broadcasted(f, args...) or Broadcast.broadcasted(::BroadcastStyle, f, args...).

  • The Broadcasted object computes and stores the type of the combined
    BroadcastStyle of its arguments as its first type parameter, allowing for
    easy dispatch and specialization.

  • Custom Broadcast storage is still allocated via broadcast_similar, however
    instead of passing just a function as a first argument, the entire
    Broadcasted object is passed as a final argument. This potentially allows
    for much more runtime specialization dependent upon the exact expression
    given.

  • Custom broadcast implmentations for a CustomStyle are defined by
    specializing copy(bc::Broadcasted{CustomStyle}) or
    copyto!(dest::AbstractArray, bc::Broadcasted{CustomStyle}).

  • Fallback broadcast specializations for a given output object of type Dest
    (for the DefaultArrayStyle or another such style that hasn't implemented
    assignments into such an object) are defined by specializing
    copyto(dest::Dest, bc::Broadcasted{Nothing}).

How do base and the standard libraries take advantage?

As it fully supports range broadcasting, this now deprecates (1:5) + 2 to
.+, just as had been done for all AbstractArrays in general.

BitArray is able to exploit this new structure to great effect. When broadcasting
into a BitArray, it is able to scan through the Broadcasted expression tree to
see if it'd be able to support fusing at the level of chunks, operating on 64 bits
at a time. This is a huge performance gain on the order of 100x or more. The
simple @benchmark $A .& $B .| .!$C demonstrates a 130x performance gain
over master for A, B, C = bitrand.((1000,1000,1000)) (50ns vs 6500ns). The
numbers are even greater for larger structures.

As a first-mover proof of concept, LinearAlgebra uses this new system to
improve broadcasting over structured arrays. Before, broadcasting over a
structured matrix would result in a sparse array. Now, broadcasting over a
structured matrix will either return an appropriately structured matrix or
a dense array. This does incur a type instability (in the form of a
discriminated union) in some situations, but thanks to type-based introspection
of the Broadcasted wrapper commonly used functions can be special cased to be
type stable. For example:

julia> f(d) = round.(Int, d)
f (generic function with 1 method)

julia> @inferred f(Diagonal(rand(3)))
3×3 Diagonal{Int64,Array{Int64,1}}:
 0    
   0  
     1

julia> @inferred Diagonal(rand(3)) .* 3
ERROR: return type Diagonal{Float64,Array{Float64,1}} does not match inferred return type Union{Array{Float64,2}, Diagonal{Float64,Array{Float64,1}}}
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] top-level scope

julia> @inferred Diagonal(1:4) .+ Bidiagonal(rand(4), rand(3), 'U') .* Tridiagonal(1:3, 1:4, 1:3)
4×4 Tridiagonal{Float64,Array{Float64,1}}:
 1.30771  0.838589             
 0.0      3.89109   0.0459757   
         0.0       4.48033    2.51508
                  0.0        6.23739

Sparse arrays already had fairly extensive broadcasting support, but it was all based around broadcasting a single function over a set of arguments. This PR implements broadcasting as an expression tree of potentially many nested functions and arguments, but to support this common case, a Broadcast.flatten function is provided to convert the expression tree into a single function and set of arguments. This allows custom implementations a way to opt into a simpler API in their implementation.

What more does it fix?

In addition to the issues referenced above, it fixes:

What's not to love?

(For bookkeeping, this is the squashed version of the teh-jn/lazydotfuse
branch as of a1d4e7e. Squashed and separated
out to make it easier to review and commit)

Co-authored-by: @timholy
Co-authored-by: @vtjnash
Co-authored-by: @ajkeller34

This patch represents the combined efforts of four individuals, over 60
commits, and an iterated design over (at least) three pull requests that
spanned nearly an entire year (closes #22063, #23692, #25377 by superceding
them).

This introduces a pure Julia data structure that represents a fused broadcast
expression.  For example, the expression `2 .* (x .+ 1)` lowers to:

```julia
julia> Meta.@lower 2 .* (x .+ 1)
:($(Expr(:thunk, CodeInfo(:(begin
      Core.SSAValue(0) = (Base.getproperty)(Base.Broadcast, :materialize)
      Core.SSAValue(1) = (Base.getproperty)(Base.Broadcast, :make)
      Core.SSAValue(2) = (Base.getproperty)(Base.Broadcast, :make)
      Core.SSAValue(3) = (Core.SSAValue(2))(+, x, 1)
      Core.SSAValue(4) = (Core.SSAValue(1))(*, 2, Core.SSAValue(3))
      Core.SSAValue(5) = (Core.SSAValue(0))(Core.SSAValue(4))
      return Core.SSAValue(5)
  end)))))
```

Or, slightly more readably as:

```julia
using .Broadcast: materialize, make
materialize(make(*, 2, make(+, x, 1)))
```

The `Broadcast.make` function serves two purposes. Its primary purpose is to
construct the `Broadcast.Broadcasted` objects that hold onto the function, the
tuple of arguments (potentially including nested `Broadcasted` arguments), and
sometimes a set of `axes` to include knowledge of the outer shape. The
secondary purpose, however, is to allow an "out" for objects that _don't_ want
to participate in fusion. For example, if `x` is a range in the above `2 .* (x
.+ 1)` expression, it needn't allocate an array and operate elementwise — it
can just compute and return a new range. Thus custom structures are able to
specialize `Broadcast.make(f, args...)` just as they'd specialize on `f`
normally to return an immediate result.

`Broadcast.materialize` is identity for everything _except_ `Broadcasted`
objects for which it allocates an appropriate result and computes the
broadcast. It does two things: it `initialize`s the outermost `Broadcasted`
object to compute its axes and then `copy`s it.

Similarly, an in-place fused broadcast like `y .= 2 .* (x .+ 1)` uses the exact
same expression tree to compute the right-hand side of the expression as above,
and then uses `materialize!(y, make(*, 2, make(+, x, 1)))` to `instantiate` the
`Broadcasted` expression tree and then `copyto!` it into the given destination.

All-together, this forms a complete API for custom types to extend and
customize the behavior of broadcast (fixes #22060). It uses the existing
`BroadcastStyle`s throughout to simplify dispatch on many arguments:

* Custom types can opt-out of broadcast fusion by specializing
  `Broadcast.make(f, args...)` or `Broadcast.make(::BroadcastStyle, f, args...)`.

* The `Broadcasted` object computes and stores the type of the combined
  `BroadcastStyle` of its arguments as its first type parameter, allowing for
  easy dispatch and specialization.

* Custom Broadcast storage is still allocated via `broadcast_similar`, however
  instead of passing just a function as a first argument, the entire
  `Broadcasted` object is passed as a final argument. This potentially allows
  for much more runtime specialization dependent upon the exact expression
  given.

* Custom broadcast implmentations for a `CustomStyle` are defined by
  specializing `copy(bc::Broadcasted{CustomStyle})` or
  `copyto!(dest::AbstractArray, bc::Broadcasted{CustomStyle})`.

* Fallback broadcast specializations for a given output object of type `Dest`
  (for the `DefaultArrayStyle` or another such style that hasn't implemented
  assignments into such an object) are defined by specializing
  `copyto(dest::Dest, bc::Broadcasted{Nothing})`.

As it fully supports range broadcasting, this now deprecates `(1:5) + 2` to
`.+`, just as had been done for all `AbstractArray`s in general.

As a first-mover proof of concept, LinearAlgebra uses this new system to
improve broadcasting over structured arrays. Before, broadcasting over a
structured matrix would result in a sparse array. Now, broadcasting over a
structured matrix will _either_ return an appropriately structured matrix _or_
a dense array. This does incur a type instability (in the form of a
discriminated union) in some situations, but thanks to type-based introspection
of the `Broadcasted` wrapper commonly used functions can be special cased to be
type stable.  For example:

```julia
julia> f(d) = round.(Int, d)
f (generic function with 1 method)

julia> @inferred f(Diagonal(rand(3)))
3×3 Diagonal{Int64,Array{Int64,1}}:
 0  ⋅  ⋅
 ⋅  0  ⋅
 ⋅  ⋅  1

julia> @inferred Diagonal(rand(3)) .* 3
ERROR: return type Diagonal{Float64,Array{Float64,1}} does not match inferred return type Union{Array{Float64,2}, Diagonal{Float64,Array{Float64,1}}}
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] top-level scope

julia> @inferred Diagonal(1:4) .+ Bidiagonal(rand(4), rand(3), 'U') .* Tridiagonal(1:3, 1:4, 1:3)
4×4 Tridiagonal{Float64,Array{Float64,1}}:
 1.30771  0.838589   ⋅          ⋅
 0.0      3.89109   0.0459757   ⋅
  ⋅       0.0       4.48033    2.51508
  ⋅        ⋅        0.0        6.23739
```

In addition to the issues referenced above, it fixes:

* Fixes #19313, #22053, #23445, and #24586: Literals are no longer treated
  specially in a fused broadcast; they're just arguments in a `Broadcasted`
  object like everything else.

* Fixes #21094: Since broadcasting is now represented by a pure Julia
  datastructure it can be created within `@generated` functions and serialized.

* Fixes #26097: The fallback destination-array specialization method of
  `copyto!` is specifically implemented as `Broadcasted{Nothing}` and will not
  be confused by `nothing` arguments.

* Fixes the broadcast-specific element of #25499: The default base broadcast
  implementation no longer depends upon `Base._return_type` to allocate its
  array (except in the empty or concretely-type cases). Note that the sparse
  implementation (#19595) is still dependent upon inference and is _not_ fixed.

* Fixes #25340: Functions are treated like normal values just like arguments
  and only evaluated once.

* Fixes #22255, and is performant with 12+ fused broadcasts. Okay, that one was
  fixed on master already, but this fixes it now, too.

* Fixes #25521.

* The performance of this patch has been thoroughly tested through its
  iterative development process in #25377. There remain [two classes of
  performance regressions](#25377) that Nanosoldier flagged.

* #25691: Propagation of constant literals sill lose their constant-ness upon
  going through the broadcast machinery. I believe quite a large number of
  functions would need to be marked as `@pure` to support this -- including
  functions that are intended to be specialized.

(For bookkeeping, this is the squashed version of the [teh-jn/lazydotfuse](#25377)
branch as of a1d4e7e. Squashed and separated
out to make it easier to review and commit)

Co-authored-by: Tim Holy <tim.holy@gmail.com>
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
Co-authored-by: Andrew Keller <ajkeller34@users.noreply.github.com>
@vtjnash
Copy link
Member

vtjnash commented Apr 24, 2018

Yay!

Custom types can opt-out of broadcast fusion by specializing
Broadcast.make(f, args...) or Broadcast.make(::BroadcastStyle, f, args...).

Won’t these two signatures lead to confusion if someone tries to broadcast a BroadcastStyle as the first (function) argument

@vtjnash
Copy link
Member

vtjnash commented Apr 24, 2018

Propagation of constant literals

Possibly helped by #26826, but we can likely work towards modifying our heuristics, as needed, to permit analysis of these cases

@StefanKarpinski
Copy link
Member

This may be a record on closing issues with one PR! (Possibly only beat out by Jeff's type system revamp or adding the package manager, mostly because I just closed all package manager-related issues after that 😹.) My only nit to pick is that I do not love the name make. Maybe we could play the "explain what it does" game and try to figure out which word or phrase is appropriate? The names bcast or lazydot or dothold are the ones that come to mind immediately for me.

@mbauman
Copy link
Member Author

mbauman commented Apr 24, 2018

Won’t these two signatures lead to confusion if someone tries to broadcast a BroadcastStyle as the first (function) argument

Yes, but I don't find this all too terrible as BroadcastStyle subtypes themselves are not callable — you can still broadcast the type constructors. In exchange we get a simpler API for extension. That's a worthwhile tradeoff in my eyes.

My only nit to pick is that I do not love the name make.

That's fair. The whole point is to allow folks to override the Broadcasted constructor to return a non-Broadcasted thing that represents the same values. What about lowercase broadcasted?

@StefanKarpinski
Copy link
Member

👍 Lowercase broadcasted seems perfect.

@vtjnash
Copy link
Member

vtjnash commented Apr 24, 2018

alternately, broadcasting?

@mbauman
Copy link
Member Author

mbauman commented Apr 24, 2018

I like using the lowercased name of the type here. I'm open to other ideas, but I think we should think of the names Broadcasted and broadcasted in tandem. They both return something that represents the result of a broadcast operation — and the latter need not be lazy. Some types will want to opt-out of the laziness completely for some or all operations.

@mbauman mbauman added broadcast Applying a function over a collection and removed broadcast labels Apr 24, 2018
NEWS.md Outdated
@@ -418,6 +413,14 @@ This section lists changes that do not have deprecation warnings.
* `findn(x::AbstractArray)` has been deprecated in favor of `findall(!iszero, x)`, which
now returns cartesian indices for multidimensional arrays (see below, [#25532]).

* Broadcasting operations are no longer fused into a single operation by Julia's parser.
Instead, a lazy `Broadcasted` wrapper is created, and the parser will call
`copy(bc::Broadcasted)` or `copyto!(dest, bc::Broadcasted)`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the lowering actually calls Broadcast.materialize or Broadcast.materialize!?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, that's true. I still want to highlight copy and copyto! as those are the methods to specialize. Let's just remove that "parser will call" to make it technically correct.

Copy link
Member

@stevengj stevengj Apr 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why the parser doesn't lower to these copy calls directly?

Copy link
Member Author

@mbauman mbauman Apr 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. There are several reasons here, and they're different for in-place (broadcast! / .=) vs. out-of-place broadcasting.

For in-place broadcasts, we use materialize! for two purposes: (1) the outermost broadcasted might not return a Broadcasted object, but we need to support broadcasting into the destination (e.g., copyto!(rand(3,3), 1:3) doesn't broadcast) and we want to use the same code-paths here. It solves both problems at once by wrapping the non-Broadcasted object x in a Broadcasted(identity, (x,)). (2) The Broadcasted argument wants to know about its axes, but those are dependent upon the axes of the destination. It explicitly sets the axes of the Broadcasted object to those of the destination, and then those are checked to ensure they are a compatible shape with instantiate.

For out-of-place broadcasts, materialize also serves two purposes: (1) for Broadcasted arguments, it computes and caches the axes of the outermost Broadcasted, recursively checking the axes of its arguments by instantiate-ing them. Now, we may be able to get away with not storing the axes since axes should theoretically only be called once and it can be computed on demand, but I haven't tested it and it seems strange to throw a DimensionError from a call to axes. (2) for all other arguments it defaults to identity but can serve as a hook for custom types to do similar sorts of specializations decoupled from copy.

I'm also aiming for some semblance of symmetry between the two cases.

Now there are two is one caveat here: (1) A wart that I really dislike is broadcast itself is hard-coded to call copy(instantiate(broadcasted(…))) instead of materialize. For reasons beyond my comprehension, materialize sometimes failed to inline even though it is explicitly marked as @inline. (Fixed in 8a2d88a — this hack is no longer necessary). (2) The expression A .= x still lowers to broadcast!(identity, A, x). If otherwise unspecialized, it will fall back to the default implementation, which constructs the appropriate Broadcasted object and then goes through the above machinery as expected. We may want to change this purely to be able to simplify our description of this parser transformation.

@stevengj
Copy link
Member

stevengj commented Apr 24, 2018

Yes, we see that capitalized/lowercase idiom elsewhere, and I think it makes sense here. See e.g. Iterators.Reverse vs. Iterators.reverse — the latter may construct the former, or may materialize some other iterator immediately.

@vtjnash
Copy link
Member

vtjnash commented Apr 24, 2018

Won’t these two signatures lead to confusion

Even so, it leads to some weird side-effects, such as:
Broadcast.Style{Tuple}().(sum, x, y, z)
being syntax for broadcasting just the arguments that are tuples to form a new tuple.

For reasons beyond my comprehensions, this previously failed to inline despite the at-inline. For reasons that are also beyond my comprehension, this hack is no longer necessary.
@mbauman
Copy link
Member Author

mbauman commented Apr 24, 2018

Nifty feature. :trollface:

Seriously, though, do you have any ideas on a replacement API here that is similarly easy to specialize? I simply don't find this all that troublesome.

by implementing a preprocessing step for chunked bitarray broadcast that converts boolean-only functions to ones that support chunk-wise operation. Also add a number of tests for this codepath
@mbauman
Copy link
Member Author

mbauman commented Apr 25, 2018

I've edited the first post to reflect the renaming of Broadcast.make to Broadcast.broadcasted and added a section on the new BitArray implementation, which is a great example of how powerful this new API is.

@mbauman
Copy link
Member Author

mbauman commented Apr 25, 2018

Ok, one last question from the now-hidden code comment: what should A .= x lower to?

  1. broadcast!(identity, A, x); simple and like it's always been, but totally unlike the fused RHS cases
  2. Broadcast.materialize!(A, x); this is what would happen if the RHS was a more complicated fusion but opted out of lazy fusion
  3. Broadcast.materialize!(A, Broadcast.broadcasted(identity, x)); this is what broadcast! is defined as. It's the most complicated, but it's also the most alike the fused RHS cases.

With no specializations beyond those in base, all are equivalent. I would prefer 1 or 3, but don't have terribly strong feelings beyond that.

@bramtayl
Copy link
Contributor

bramtayl commented Apr 25, 2018

Unrelated, but building on the capitalized/lowercase idiom, it might be worthwhile considering renaming Generator to Map.

@StefanKarpinski
Copy link
Member

Option 3 seems like it gives the most flexibility, right?

Now instead of lowering to `broadcast!(identity,A,x)`, it lowers to `Broadcast.materialize!(A, Broadcast.broadcasted(identity, x))`, with all names resolved in the top module.
@mbauman
Copy link
Member Author

mbauman commented Apr 25, 2018

Ok, yes, I do like this better. It reduces the number of names that lowering calls and should be simpler to document. It's still a special case — you still need to explain where that identity came from — but I think it's a worthwhile change.

@mbauman
Copy link
Member Author

mbauman commented Apr 25, 2018

How serious is your objection to the broadcasted (nee make) API, @vtjnash? Barring a change there I'm ready for this to be merged.

@vtjnash
Copy link
Member

vtjnash commented Apr 25, 2018

Nifty feature.

Actually, I'm coming around to agree. Let's merge! :)

@mbauman
Copy link
Member Author

mbauman commented Apr 26, 2018

Travis failure was #26902, AppVeyor x86_64 was a timeout.

@mbauman mbauman merged commit ef76d9c into master Apr 26, 2018
@mbauman mbauman deleted the mb-teh-jn-ajk/lazydotfuse branch April 26, 2018 14:57
@fredrikekre
Copy link
Member

🎉

omus added a commit to JuliaData/NamedTuples.jl that referenced this pull request Apr 27, 2018
Currently doesn't work due to the new changes from JuliaLang/julia#26891
omus added a commit to JuliaData/NamedTuples.jl that referenced this pull request Apr 30, 2018
Currently doesn't work due to the new changes from JuliaLang/julia#26891
omus added a commit to JuliaData/NamedTuples.jl that referenced this pull request Apr 30, 2018
Currently doesn't work due to the new changes from JuliaLang/julia#26891
@timholy
Copy link
Member

timholy commented May 13, 2018

I have not had any time for general development lately, and that's likely to continue for a little while yet. So just wanted to give a big shout-out to @mbauman and anyone else who pushed this over the finish line. Many thanks for seeing it through!

omus added a commit to JuliaData/NamedTuples.jl that referenced this pull request Sep 5, 2018
Currently doesn't work due to the new changes from JuliaLang/julia#26891
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment