Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

broadcast should not drop zero-dimensional results to scalars #28866

Open
ZeppLu opened this issue Aug 24, 2018 · 35 comments
Open

broadcast should not drop zero-dimensional results to scalars #28866

ZeppLu opened this issue Aug 24, 2018 · 35 comments
Labels
arrays [a, r, r, a, y, s] broadcast Applying a function over a collection design Design of APIs or of the language itself
Milestone

Comments

@ZeppLu
Copy link
Contributor

ZeppLu commented Aug 24, 2018

julia> map(typeof, (zeros(), ones(), fill(3.14)))
(Array{Float64,0}, Array{Float64,0}, Array{Float64,0})

julia> typeof(ones() .* fill(3.14))
Float64

julia> typeof(zeros() + ones())  # in fact not only after broadcast
Float64

Is it a bug, or a feature?

@mbauman
Copy link
Member

mbauman commented Aug 24, 2018

This is an intentional feature. Zero-dimensional arrays act like scalars in broadcast.

@ZeppLu
Copy link
Contributor Author

ZeppLu commented Aug 24, 2018

OK seems like it's a feature.

But I still wonder why typeof(zeros() + ones()) == Float64, note there's no broadcast. Intuitively speaking, the sum of two arrays should always be an array as well, shouldn't it?

@mbauman
Copy link
Member

mbauman commented Aug 24, 2018

Oh my, you're right — I missed your third example there. That indeed is a bug.

@mbauman mbauman changed the title 0-dimensional array becomes scalar after broadcast Some array functions drop zero-dimensional results to scalars Aug 24, 2018
@mbauman mbauman added bug Indicates an unexpected problem or unintended behavior broadcast Applying a function over a collection arrays [a, r, r, a, y, s] labels Aug 24, 2018
@mbauman
Copy link
Member

mbauman commented Aug 24, 2018

In short: we have long depended upon broadcasting to implement a number of array functions that implicitly work elementwise. We just add additional size checks. Unfortunately we now also need to add a zero-dimensional check, too.

Just around the definition of +, I can see that this also impacts -, conj, real, and imag.

@ZeppLu
Copy link
Contributor Author

ZeppLu commented Aug 24, 2018

IMHO, we should make things consistent here, if ones() .+ ones() returns a scalar while ones() + ones() returns an array, people will get confused.

My suggestion is, broadcast SHOULD treat 0d arrays as scalars, but SHOULD NOT implicitly cast them to scalars in the final result.

@mbauman
Copy link
Member

mbauman commented Aug 24, 2018

Yes, I'm in complete agreement. That's the bug, and it needs to be fixed.

Edit (a year later): I'm confused by my comment. I'm pretty sure I didn't mean to say what this says. I just wanted to fix the + behavior but keep broadcast the same.

@andyferris
Copy link
Member

The casting of 0D arrays to scalars at the end of broadcast has been bothering me too.

@yurivish
Copy link
Contributor

yurivish commented Jan 11, 2019

I just noticed this too while working to update the tests for https://github.com/mbauman/InvertedIndices.jl to 1.0.

Specifically,

    A = fill(1)
    @test A[Not(A.==1)] == []

fails with

Test threw exception
  Expression: A[Not(A .== 1)] == []
  ArgumentError: invalid index: true of type Bool

since fill(1) .== 1 returns true on 1.0.

mbauman added a commit that referenced this issue May 23, 2019
This is a simple workaround for the handful of elementwise operations that are defined on arrays _without_ the need for explicit broadcast but use broadcasting (with an extra shape check) in their implementation. These were the only affected cases I could find.
@bzinberg
Copy link
Contributor

bzinberg commented May 23, 2019

#28866 (comment)

My suggestion is, broadcast SHOULD treat 0d arrays as scalars, but SHOULD NOT implicitly cast them to scalars in the final result.

IIUC, what you are calling "treating 0d arrays as scalars" is not an exception, it's just an application of the usual broadcasting rules, as () is broadcastable to any shape. If 0d arrays didn't behave this way under broadcast, that would be a special-case exception to the broadcast rules. So you definitely have my +1 on that 👍

@mbauman
Copy link
Member

mbauman commented May 23, 2019

Here's the crux of the problem: What should 1 .+ 2 return? Is it 3 or is it fill(3)?

As far as broadcast is concerned, it's exactly the same as fill(1) .+ 2 and fill(1) .+ fill(2); 1 and fill(1) have the same axes.

@bzinberg
Copy link
Contributor

IMO, the following chain of equivalences should hold:

1 .+ 2 === broadcast(+, 1, 2) === broadcast(+, convert(1), convert(2)) == fill(3)

I say this because the type signature broadcast should require the arguments to be arrays, so when the user gives the shorthand of an Int, it should be WAI that the int be implicitly converted to an array.

I don't see any principled reason for 1 .+ 2 to be 3.

@mbauman
Copy link
Member

mbauman commented May 23, 2019

I don't see any principled reason for 1 .+ 2 to be 3.

I tend to agree; I have a strong distaste for behaviors like this on principle. The reasons were entirely practical. I initially tried to make fill(1) .+ 2 == fill(3) (#26212) and locally even tried changing the 1 .+ 2 behavior as well but it was far more breaking than I was able to stomach. There is lots of code that intentionally allows scalar- or array-like behaviors with broadcasting. There is also lots of code that unnecessarily applies broadcasting to scalars.

This is something we could reconsider as a breaking change for 2.0, but in order for it to be feasible I think we'd also need to allow 0-dimensional arrays to participate in the linear algebra of vectors and matrices. I'm also still not convinced that it's actually all that terrible in practice, despite my own distaste for it in theory.

For more details, check out #17318.

@andyferris
Copy link
Member

What’s the relationship between broadcast and linear algebra?

@mbauman
Copy link
Member

mbauman commented May 23, 2019

It's just that folks would see zero-dimensional arrays much more frequently; making them more capable and allowing them to do things like fill(2) * rand(3,3) would ameliorate some of the complaints I'm quite certain we'd receive.

@bzinberg
Copy link
Contributor

bzinberg commented May 23, 2019

I tend to agree; I have a strong distaste for behaviors like this on principle. The reasons were entirely practical.

I think for big software projects, it's best to have the guts of the language be meticulous about degenerate cases, and to have the convenient-but-slightly-unprincipled stuff be found only very close to the surface, so as not to muddy the ability to reason consistently about the software. I'd say in this bug we are reaching precisely the point at which the motivation for that thinking becomes apparent: because Julia handled zero-dimensional broadcasting inconsistently since the beginning, it's now much easier to make a band-aid fix to the internals that special-cases zero-dimensional arrays, than to rework the internals so that the special case is not necessary.

(This is totally a hindsight observation, and I don't think anyone necessarily did anything wrong in this regard during the development trajectory of Julia, since there were trade-offs to make at every step of the way.)

@bzinberg
Copy link
Contributor

bzinberg commented May 23, 2019

It's just that folks would see zero-dimensional arrays much more frequently; making them more capable and allowing them to do things like fill(2) * rand(3,3) would ameliorate some of the complaints I'm quite certain we'd receive.

What is the intended meaning of fill(2) * rand(3, 3)? (I understand .*, but not *)

@mbauman mbauman changed the title Some array functions drop zero-dimensional results to scalars Should broadcast drop zero-dimensional results to scalars? May 23, 2019
@mbauman
Copy link
Member

mbauman commented May 23, 2019

Yes, history and inertia is one part of the story. The other part is that it is practically quite useful and (I think) friendlier. Arithmetic on a zero-dimensional array is quite limited, but there would be ways of defining it to be a bit friendlier (like having fill(2) * A be defined to mean the same thing as 2*A).

Now, I'm much more sympathetic to cases like fill(1) .+ fill(2) or fill(1) .== 1 because there you are already starting with an Array{T,0}, and there is some good sense in preserving that. But in the case of 1 .+ 2 I'm more strongly opposed to wrapping its result in an Array{T,0}. You are starting with something that has axes and supports getindex and you end up with the same sort of thing that has axes and supports getindex. The result is also far more useful.

@bzinberg
Copy link
Contributor

bzinberg commented May 23, 2019

(like having fill(2) * A be defined to mean the same thing as 2*A)

My concern there is that the *(a::Array, b::Array) already means matrix product, and so defining fill(2) * A to be fill(2) .* A rather than matmul(fill(2), A) (which only makes sense if size(A)[end-1] == 0) is a redefinition of an existing concept that makes it more complicated due to the added special case.

One thought that comes to mind is that perhaps we could have an infix operator for the tensor product. It seems to me that confusion of what 2 * A means (either "scale the matrix" or "a matrix product that has mismatched dimensions") comes from not realizing that the first one is a tensor product, not a matrix product.

But in the case of 1 .+ 2 I'm more strongly opposed to wrapping its result in an Array{T,0}. You are starting with something that has axes and supports getindex and you end up with the same sort of thing that has axes and supports getindex. The result is also far more useful.

Hm. I would say that any dotted operator should always return an array. That's a consistent and easy-to-remember rule, and I think having that consistency could save headaches like this without burdening programmers much. What it might do is be slightly surprising to people who are used to math notation which often just uses juxtaposition for all the many notions of multiplication (scalar, matrix, function composition, tensor). But I think it's good that we don't carry that notational ambiguity into this language :)

And converting an Array{T, 0} to a T is easy and IMO familiar as well: just add [] to the end. I'd call it analogous to how in Python, if L = [x], I wouldn't expect L and x to behave the same way -- I would know to write L[0], and readers would understand that.

@bzinberg
Copy link
Contributor

Now, I'm much more sympathetic to cases like fill(1) .+ fill(2) or fill(1) .== 1 because there you are already starting with an Array{T,0}, and there is some good sense in preserving that. But in the case of 1 .+ 2 I'm more strongly opposed to wrapping its result in an Array{T,0}. You are starting with something that has axes and supports getindex and you end up with the same sort of thing that has axes and supports getindex. The result is also far more useful.

I would describe the situation as this: There is a type U and a function

f(w::W, x::U, y::U)::U

(in our case f = broadcast). There is an implicit conversion from Float64 to U. We then write z = f(w, 1.0, 2.0). I think we don't have any reason to expect z to have all the same behaviors as Float64 unless there is also an implicit conversion from U to Float64. (Also, Array{Float64,0} does support axes and getindex?)

@andyferris
Copy link
Member

Honestly, I never much liked that Number has container semantics (that seemed to be a purely pragmatic decision to ease certain parts of the implementation in Base but results in silly bugs where typos like for i in N proceed without error) and broadcast is an operation on containers. Basically I feel that 1 .+ 2 is a programmer error - I wouldn't let it slide on a PR for example.

In very generic code the one thing you do have to mentally track is which variables are scalar and which are containers, so I don't see any impediment to generic code if for example it were forbidden to write broadcast operations on scalars only (IMO such a choice should basically only catch logic errors or bugs). But returning a 0d array in the default case seems fairly a fairly pragmatic choice, and has logical consistency with the way scalars are treated within a larger broadcast operation.

Also note that if we support fill(2) * matrix we still probably shouldn't support something like matrix + fill(2) in the future (where I still believe matrix + 2 makes sense as matrix + 2I).

@bzinberg
Copy link
Contributor

bzinberg commented May 23, 2019

@andyferris

(where I still believe matrix + 2 makes sense as matrix + 2I)

This is specific to the rank-2 case, right?

@mbauman
Copy link
Member

mbauman commented May 28, 2019

The best way to evaluate this change would be to try it out. Personally, I don't find the status quo so abhorrent, so I won't be pushing for this change, and clearly it'd be a breaking v2.0 change, but it's a really easy change to make. Just steal the logic from #32122 — it really shouldn't be more than 10LOC between all the builtin broadcast implementations. Then the big question is how many LOC worth of tests and packages need to change. I'm up for changing my mind on the practicality here if the evidence weighs against it!

@mbauman
Copy link
Member

mbauman commented May 29, 2019

Here's a compelling argument: https://discourse.julialang.org/t/broadcasting-and-pairs-using/24739

In short:

p = "2".=>"two"
replace.(["123", "246"], p)

vs.

replace.(["123", "246"], "2".=>"two")

The first fails, whereas the second succeeds, but the two should be equivalent in their end result. This only happens because we've lost the 0-d container.

@bzinberg
Copy link
Contributor

@mbauman, I'd be interested in trying your suggestion in #28866 (comment) to see how much work it is. I'm new to contributing to Julia -- can you point me to what I need to know to get my local clone to the point where I can make a change to the broadcast implementation, run some compile/test commands, and see what breaks? I can do that quickly, then over the next few weeks work on it as I have time.

@mbauman
Copy link
Member

mbauman commented May 29, 2019

Here you go: https://github.com/JuliaLang/julia/compare/mb/true28866

Just:

git clone https://github.com/JuliaLang/julia.git
cd julia
git checkout mb/true28866
make
make testall

@mbauman
Copy link
Member

mbauman commented Jun 10, 2019

Just an administrative note: this issue started out describing both broadcast's design as well as the bug in #32122 — and I wrote the commit message there before we really started hashing out broadcast's design here. So we should re-open this issue if that commit message ends up auto-closing this — it only addresses the "easy" half of this issue.

mbauman added a commit that referenced this issue Jun 10, 2019
…tainers

This is a simple workaround for the handful of elementwise operations that are defined on arrays _without_ the need for explicit broadcast but use broadcasting (with an extra shape check) in their implementation. These were the only affected cases I could find.
StefanKarpinski pushed a commit that referenced this issue Jul 3, 2019
…tainers (#32122)

This is a simple workaround for the handful of elementwise operations that are defined on arrays _without_ the need for explicit broadcast but use broadcasting (with an extra shape check) in their implementation. These were the only affected cases I could find.
KristofferC added a commit that referenced this issue Aug 26, 2019
KristofferC added a commit that referenced this issue Aug 26, 2019
mauro3 added a commit to mauro3/WhereTheWaterFlows.jl that referenced this issue Oct 30, 2019
@cossio
Copy link
Contributor

cossio commented May 18, 2020

I find zeros() .+ 1 == 1 very unintuitive. I had an issue with this today. The following function:

f(x::AbstractArray; dims=:) = sum(x .+ 1; dims=dims)

Then f(x) works for all arrays, except for zero-dimensional arrays because of this. In this example I now have to special case dims=:.

Maybe there should be a definition for sum(::Number; dims::Colon).

@GiggleLiu
Copy link
Contributor

GiggleLiu commented Jun 1, 2022

Broadcasting should return 0-dimensional array! Support you with another using case:

julia> conj(Zygote.Fill(1.0im))
0-dimensional Array{FillArrays.Fill{ComplexF64, 0, Tuple{}}, 0}:
Fill(0.0 - 1.0im)

The conj over filled array returns a 0-dimensional array with filled array inside!

This is cause by:

  1. first broadcast over Fill, returns a Fill type.
  2. then wrap a 0-dimensional array to keep the return shape the same.

@inline function broadcast_preserving_zero_d(f, As...)

Is it really safe to treat all arrays as dense array type? The above code is very bad to ecosystem.

This now causing OMEinsum AD test failure:
under-Peter/OMEinsum.jl#141

@mbauman
Copy link
Member

mbauman commented Jun 1, 2022

Ironically, that's happening because FillArray's Fill defines its own broadcast implementation that preserves the 0-dimensional array as requested here. I agree this is how things should behave, but changing this quite breaking (as you can see) and will likely have to wait for v2.0.

It'd work as you want if FillArrays would match Base's current broadcasting semantics.

@GiggleLiu
Copy link
Contributor

GiggleLiu commented Jun 1, 2022

Ironically, that's happening because FillArray's Fill defines its own broadcast implementation that preserves the 0-dimensional array as requested here. I agree this is how things should behave, but changing this quite breaking (as you can see) and will likely have to wait for v2.0.

It'd work as you want if FillArrays would match Base's current broadcasting semantics.

It is not only the problem of the FillArrays. This implementation in Julia base (

@inline function broadcast_preserving_zero_d(f, As...)
) is bad:

length(axes(bc)) == 0 ? fill!(similar(bc, typeof(r)), r) : r

Two aspects can be improved

  1. it should check the type of the original array and try to return the same array type,
  2. it should be type stable.

I won't be happy if conj(::Fill) returns me a normal array.
Maybe fallback conj, real function on the broadcast method is a wrong decision, because broadcast is not bijective in type (different array types can go to the same scalar type).

@mbauman
Copy link
Member

mbauman commented Jun 1, 2022

  1. it should check the type of the original array and try to return the same array type,

It tries to; that's what the similar(bc, typeof(r)) part does. It's up to the broadcast implementation to specialize that.

  1. it should be type stable.

It is not a type instability because the length of the axes is encoded into the broadcast's type.

@mbauman mbauman changed the title Should broadcast drop zero-dimensional results to scalars? broadcast should not drop zero-dimensional results to scalars Jun 1, 2022
@GiggleLiu
Copy link
Contributor

  1. it should check the type of the original array and try to return the same array type,

It tries to; that's what the similar(bc, typeof(r)) part does. It's up to the broadcast implementation to specialize that.

  1. it should be type stable.

It is not a type instability because the length of the axes is encoded into the broadcast's type.

I see what you mean. So it is a wield interaction of similar does not preserve type and broadcast does not keep array type information for 0-dimensional array. Now I agree there is no easy solution. Hope Julia v2.0 can make the broadcast more predictable.

@kellertuer
Copy link

Hi,
I just sumbled upon this as well and just to add to the irritation here, I wrote a small discourse about my encounter https://discourse.julialang.org/t/broadcast-strange-on-0-dimensional-array/97311 - which I will resolve by not using 0-dim arrays, since for me they currently seem to not behave as I expect arrays to behave (especially in the .+ case.

@sadish-d
Copy link

I came to realize the broadcasting behavior on 0-dimensional arrays pretty late. This will mean a significant implementation change for what I'm working on. A minimal example of the issue I'm running into:

f(x::Array{<:Real}) = x .+ 1
g(y::Array{<:Real}) = nothing

a = [1]
b = fill(1)

(g∘f)(a) # works
(g∘f)(b) # breaks

@MilesCranmer
Copy link
Member

MilesCranmer commented Jun 27, 2024

This is especially problematic for GPU code because scalarization strips any allocation information present in a zero-dimensional array:

julia> x = MtlArray(fill(1f0))
0-dimensional MtlArray{Float32, 0, Private}:
1.0

julia> x .+ x
2.0f0

which now lives on the CPU.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] broadcast Applying a function over a collection design Design of APIs or of the language itself
Projects
None yet
Development

No branches or pull requests

10 participants