
Handling broadcasts #31

Closed
MikeInnes opened this issue May 12, 2017 · 17 comments

@MikeInnes
Member

MikeInnes commented May 12, 2017

Using broadcasting operators in 0.6 gives deprecation warnings, and soon won't work at all once the .+ etc. function objects are removed. We also need a more general way to handle arbitrary f.(xs) applications.

I suggest that DataFlow lowers any broadcast f.(xs...) to Broadcast(f)(xs...), where Broadcast(f) is simply a wrapper around f. Calls to the wrapper can be overloaded as appropriate, both in Julia code and in conversions to backends, and can be made to generate dot calls again when lowered back to syntax.
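
For illustration, a minimal sketch of what such a wrapper might look like (the names here are hypothetical, not DataFlow's actual implementation; BroadcastCall is used instead of Broadcast only to avoid clashing with Base's Broadcast module):

struct BroadcastCall{F}
    f::F
end

# By default, calling the wrapper is just an ordinary broadcast of the wrapped function...
(b::BroadcastCall)(xs...) = broadcast(b.f, xs...)

# ...but a backend can overload calls to it, e.g. to emit a single vectorised
# library op instead of mapping a Julia function over elements.

BroadcastCall(+)([1, 2], [3, 4])  # => [4, 6]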

DataFlow now just creates explicit broadcast calls as part of desugaring.

@staticfloat
Contributor

This is an unusually annoying depwarn, because it seems to be triggered on every Flux call. :)

@MikeInnes
Member Author

Yup. I've just been using --depwarn=no (as per usual) but should probably sort this out ASAP.

@stevengj

You should just stop overloading .+ etcetera in 0.6.

@MikeInnes
Member Author

That's not a reasonable requirement in any case where we can't broadcast arbitrary Julia functions. As it turns out, there are quite a lot of cases like that, including TensorFlow, MXNet, and many of the GPU libraries. This is a real use case, and it's unfortunate that the discussions in Base didn't take it into account at all.

@stevengj

Why can't you broadcast arbitrary Julia functions?

@MikeInnes
Member Author

Many libraries that provide an array abstraction do so in a numpy-like fashion – you get a set of "vectorised" operations like +, *, etc., but anything that accesses individual elements either breaks the abstraction barrier or is unusably slow.

@stevengj

stevengj commented May 25, 2017

If you only support a small set of operators on your data, there are plenty of binary operators to choose from that you can define. You don't have to use .+. Dot operators now carry with them an expectation of fusion and support for in-place operations like x .+= foo.(x.^2) .- 3 without temporary arrays.
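
To make the fusion point concrete, here is roughly what that expression corresponds to after fusion (a sketch only; foo is a placeholder function, and the exact lowered form in 0.6+ differs in detail):

foo(y) = 2y                       # placeholder definition
x = [1.0, 2.0, 3.0]

# x .+= foo.(x.^2) .- 3 fuses into (roughly) a single in-place broadcast, so the
# whole right-hand side is computed element-by-element with no temporary arrays:
broadcast!(xi -> xi + (foo(xi^2) - 3), x, x)
# x is now [0.0, 7.0, 18.0]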

In the longer term, the whole Matlab/numpy-like style, where only certain vectorized operations are fast (at the cost of lots of temporary arrays), kind of defeats the point of Julia.

@stevengj

I also don't see how that applies to Flux and DataFlow, which are pure-Julia packages as far as I can tell.

@MikeInnes
Member Author

MikeInnes commented May 25, 2017

In the long term, yes, I'd love to have this stuff all implemented in Julia and compile GPU code on the fly etc. But that isn't going to happen immediately, so interop with existing libraries is the only reasonable option right now.

Can you elaborate on how, say, broadcasting +, sin, etc. should be written over GPU arrays, if not with .+ and sin.(...)? (Bearing in mind we want to be generic over array types.)

I expect it would be possible to implement broadcasting syntax in a trait-like way in which the container can choose whether to fuse, which would solve the problem for us.

@oxinabox
Member

I also don't see how that applies to Flux and DataFlow, which are pure-Julia packages as far as I can tell.

Flux has a lazy dependency on TensorFlow and/or MXNet.

@stevengj

stevengj commented May 25, 2017

TensorFlow allows you to define efficient custom operations in C++, and it's also possible in MXNet; why couldn't you do that from Julia?

Anyway, basically .+ now means a fusing broadcast in Julia, so if you want something that is not a broadcast call you should use a different symbol or function name. (I'd like to update Julia so that you can use e.g. +′ as an operator.)

@MikeInnes
Member Author

It's technically possible; it's just a big project, given that we need robust GPU compilation among other things. The right solution to this is not "wait until 2025".

The thing is, I do want broadcast. The semantics are all the same, and changing the user API (especially for something so common) for an implementation detail is not reasonable.

Not being able to write generic code that works over a range of implementation strategies kind of defeats the point of Julia.

@stevengj

stevengj commented May 25, 2017

Not having fusion for user-defined container types and operations in Julia would be a much bigger sacrifice than saying that you need to rename if you explicitly want non-fusing operations.

@MikeInnes
Member Author

I'm not arguing we should trade one for the other, but I'm repeating myself now.

I expect it would be possible to implement broadcasting syntax in a trait-like way in which the container can choose whether to fuse, which would solve the problem for us.

@stevengj

stevengj commented May 25, 2017

I expect it would be possible to implement broadcasting syntax in a trait-like way in which the container can choose whether to fuse, which would solve the problem for us.

Nope, because fusion happens at a syntactic level (at lowering time), before types are known.

Changing fusion to a compile-time optimization that depends on inference is a complete redesign (and would also result in semantics that depend on inference). It's something that's been tried many times in many languages and has always failed to achieve genericity for user-defined types and functions. That is a "wait until 2050" solution.

@MikeInnes
Member Author

if should_fuse(x, y)
  broadcast((x, y) -> x + y, x, y)
else
  broadcast(+, x, y)
end

This is still a syntactic transformation that doesn't depend on inference. should_fuse can default to true and be compiled away in the base case (just like promotion rules), leaving you with code identical to the current output. But overriding it for GPUArray etc. would solve our problem.
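
As a sketch, here is how that lowering could look for a two-operator expression like x .+ y .* z (should_fuse is the hypothetical trait from above, not an existing Base function):

should_fuse(xs...) = true   # default: fuse; compiled away like promotion rules

# Hypothetical lowering of `x .+ y .* z` under this scheme:
function lowered_add_mul(x, y, z)
    if should_fuse(x, y, z)
        # fused: one pass over the data, no temporaries
        broadcast((a, b, c) -> a + b * c, x, y, z)
    else
        # unfused: one call per operator, which a TensorFlow/MXNet/GPU array
        # type can map onto its own vectorised kernels
        broadcast(+, x, broadcast(*, y, z))
    end
end

lowered_add_mul([1, 2], [3, 4], [5, 6])  # => [16, 26]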

@stevengj

I see, yes, that would be possible.
