Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make StructArrays broadcast aware #136

Merged
merged 4 commits into from
Jul 14, 2020
Merged

Make StructArrays broadcast aware #136

merged 4 commits into from
Jul 14, 2020

Conversation

timholy
Copy link
Member

@timholy timholy commented Jul 11, 2020

Continues #90. This is more conservative, returning a StructArray only if none of the other participating arrays (including the ones wrapped in the StructArray) have specialized broadcast behavior. One could add binary BroadcastStyle rules to control behavior in other cases, but it seems best to wait until there's a real-world use case.

Keno and others added 2 commits July 11, 2020 04:32
While allowing broadcasting to return a StructArray, this limits it to
cases where:

- no other arrays in the broadcast operation, including those wrapped by the
  StructArray, have non-default BroadcastStyle
- the eltype returned from the function is a struct type

It should be straightforward to define precedence rules to handle other
cases, e.g., StructArrays of CuArrays.
@timholy
Copy link
Member Author

timholy commented Jul 11, 2020

CC @Keno

@piever
Copy link
Collaborator

piever commented Jul 12, 2020

Thank you!

Does this address #90 (comment)? With the previous PR there were some concerns in case the individual arrays composing the StructArray were not Base.Array but rather some custom type. I guess it would be good to add some test for that? I'm mainly thinking about CUDA.CuArray, but of course it'd be better to test with something that does not require a GPU. Maybe OffsetArrays? We already have them as a test dependency.

From the point of view of semver, is it breaking to change the return type (but not the content) of broadcast?

@timholy
Copy link
Member Author

timholy commented Jul 12, 2020

I added another test, plus a fallback that seems like a reasonable default for most customizations.

@timholy
Copy link
Member Author

timholy commented Jul 12, 2020

From the point of view of semver, is it breaking to change the return type (but not the content) of broadcast?

I never know what to say about that...I would say, safer is better than sorry, and make this 0.5 (or 1.0 if you think most other things are solid).

@piever
Copy link
Collaborator

piever commented Jul 12, 2020

That's not quite the issue that I had in mind. What I'm thinking is that it could be useful, if you are working on the GPU with say an array of complex numbers, to store it as a StructArray with two CuArrays as components.

Given that the CPU case behaves like this (with this PR):

julia> using StructArrays

julia> c = rand(Float32, 100);

julia> s = StructArray{ComplexF32}((c, c));

julia> s .+ s
100-element StructArray(::Array{Float32,1}, ::Array{Float32,1}) with eltype Complex{Float32}:

I imagine that the CUDA case should behave in a similar way:

julia> using StructArrays, CUDA

julia> c = CUDA.rand(100);

julia> s = StructArray{ComplexF32}((c, c));

julia> s .+ s
100-element StructArray(::CuArray{Float32,1,Nothing}, ::CuArray{Float32,1,Nothing}) with eltype Complex{Float32}:

Instead, with this PR s .+ s actually throws an error.:

julia> s .+ s
ERROR: MethodError: no method matching similar(::Base.Broadcast.Broadcasted{StructArrays.StructArrayStyle{CUDA.CuArrayStyle{1}},Tuple{Base.OneTo{Int64}},typeof(+),Tuple{StructArray{Complex{Float32},1,NamedTuple{(:re, :im),Tuple{CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing}}},Int64},StructArray{Complex{Float32},1,NamedTuple{(:re, :im),Tuple{CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing}}},Int64}}}, ::Type{Complex{Float32}}, ::Tuple{Base.OneTo{Int64}})
Closest candidates are:
  similar(::AbstractArray, ::Type{T}, ::Tuple{Union{Integer, Base.OneTo},Vararg{Union{Integer, Base.OneTo},N} where N}) where T at abstractarray.jl:634
  similar(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{N},Axes,F,Args} where Args<:Tuple where F where Axes, ::Type{ElType}, ::Any) where {N, ElType} at broadcast.jl:197
  similar(::Base.Broadcast.Broadcasted{Base.Broadcast.ArrayConflict,Axes,F,Args} where Args<:Tuple where F where Axes, ::Type{ElType}, ::Any) where ElType at broadcast.jl:202
  ...
Stacktrace:
 [1] similar(::Base.Broadcast.Broadcasted{StructArrays.StructArrayStyle{CUDA.CuArrayStyle{1}},Tuple{Base.OneTo{Int64}},typeof(+),Tuple{StructArray{Complex{Float32},1,NamedTuple{(:re, :im),Tuple{CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing}}},Int64},StructArray{Complex{Float32},1,NamedTuple{(:re, :im),Tuple{CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing}}},Int64}}}, ::Type{Complex{Float32}}) at ./broadcast.jl:196
 [2] copy at ./broadcast.jl:862 [inlined]
 [3] materialize(::Base.Broadcast.Broadcasted{StructArrays.StructArrayStyle{CUDA.CuArrayStyle{1}},Nothing,typeof(+),Tuple{StructArray{Complex{Float32},1,NamedTuple{(:re, :im),Tuple{CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing}}},Int64},StructArray{Complex{Float32},1,NamedTuple{(:re, :im),Tuple{CuArray{Float32,1,Nothing},CuArray{Float32,1,Nothing}}},Int64}}}) at ./broadcast.jl:837
 [4] top-level scope at REPL[17]:1

I was wondering if there was a smart way to get it to do the right thing, but I guess that's a bit complex in general.

The case where the columns are OffsetArrays in my view should be analogous: the result should be a StructArray whose columns are OffsetArrays.

Pinging @vchuravy, as he originally pointed out the issue.

@timholy
Copy link
Member Author

timholy commented Jul 12, 2020

The more I think about it, the less I'm sure we should try to come up with good fallbacks. So I just pushed another commit that doubles down on that MethodError. In many ways a MethodError is good: the error message tells you exactly what method is needed, and at that point it's up to you to decide what result type that method should produce. In contrast, something that silently succeeds (but returns the wrong type) can be confusing. But you have to decide if that brittleness is worth it. If you don't like the brittleness, you probably want the first two commits, but you won't get a StructArray output.

You mention the OffsetArrays package, but it has no custom broadcasting. Everything gets communicated by the type of the axes.

It's somewhere between challenging and impossible to solve correctly in general, given the current interface and the technology available in this package. Most of your similar methods operate in the value-domain, and you'd need to be able to construct the right object in the value domain using only type information. That is, given the type StructArrayStyle{CUDA.CuArrayStyle{1}}, the eltype Complex{Float32}, and the axes, you'd have to know to create a StructArray(::CuArray{Float32}(undef, sz), :::CuArray{Float32}(undef, sz)). But there's no obvious link between CuArrayStyle and CuArray without going through the pain of constructing some kind of fake Broadcasted{CuArrayStyle} object (which I don't recommend). And in some cases there's no automatic behavior that will work in all cases (how should broadcasting between a StructArray and ArrayAndChar work?), so often it's best to throw the error and make someone decide.

To fix the specific StructArray{<:CuArray} problem, someone should either add the method to a package that loads both StructArrays and CuArrays, or add it to one or the other via @require. It would look like this:

Base.similar(bc::Broadcasted{StructArrayStyle{S}}, ::Type{ElType}) where {S<:CuArrayStyle,N,ElType} =
    isstructtype(ElType) ? _structarray(CuArray, ElType, axes(bc)) : similar(CuArray{ElType}, axes(bc))

However, unless I'm missing something that _structarray method does not yet exist:

_structarray(::Type{ArrayConstructor}, ::Type{ElType}, axs) where {ArrayConstructor, ElType} = ?

It should build a StructArray using ArrayConstructor to create the component arrays. For the use cases I care about I don't need that method, so I'm not planning on adding it as part of this PR.

@timholy
Copy link
Member Author

timholy commented Jul 12, 2020

Let me know whether you want just the first, the first two, or all three of the commits here. With commit 1 or 1-3 it's deliberately brittle. With commits 1-2, it has "sensible" fallbacks but these do the wrong thing in the case of the StructArray{<:CuArray}. 1-3 is basically the same thing as just 1, but with a test for the MethodError.

@piever
Copy link
Collaborator

piever commented Jul 12, 2020

Ah, I'm starting to understand - most of this broadcast machinery is above my pay grade :)

Then, I would agree that it's better to not have the fallback and figure out in the future where / how add the code to make the GPU case work. That is, I'm happy to merge all three commits, possibly squashing them into one when merging.

I think the _structarray method you mention - given a custom "column initializer" and eltype, give me a StructArray - actually exists (it should be StructArrayInitializer, which uses buildfromschema). So, if I understand correctly, what we need to add this feature with generality (eventually, this can definitely be figured out later) would be for the Base.Broadcast module to somehow give a way to associate a "default constructor" to a specific BroadcastStyle, but for now that has to be done on a case by case basis.

I have a final concern, but I guess it could be OK in practice. For some structs, getting them into a StructArray does not quite work, even if they are isstructtype. I suspect before releasing the new broadcast (as far as merging goes, I'm happy as is), these should somehow be fixed (or maybe we are happy to let these corner cases error, it's probably not so bad). There are two reasons that could make a given struct type fail.

Problem 1: they use inner constructors.

This way, StructArrays has no way to guess how to build the instance from the type and the field values (that is to say, maybe there is a way, but I don't know it).

julia> c = rand(100);

julia> s = StructArray{ComplexF64}((c, c));

julia> struct B
           x::Float64
           y::Float64
           B(x) = new(x, x)
       end

julia> broadcast(z -> B(abs(z)), s) # here we fail on `getindex` trying to create instances
100-element StructArray(::Array{Float64,1}, ::Array{Float64,1}) with eltype B:
Error showing value of type StructArray{B,1,NamedTuple{(:x, :y),Tuple{Array{Float64,1},Array{Float64,1}}},Int64}:
ERROR: MethodError: no method matching B(::Float64, ::Float64)

Would you know, given a struct type T (with potentially only inner constructors) and the list of all the fields of an instance (like x and y above), if there is a completely reliable way to get back the instance? For now I do T(args...) but that would fail above (and I have to special case tuples and named tuples).

Problem 2: they customize getproperty

This was a design decision a while ago, possibly a bad one. Calling s[i] = val tries to fill in each field based on getproperty(val, name::Symbol) (where name varies among the colnames of s). This is done in this utility that is used throughout. So, if some structs customize getproperty but they do not customize staticschema (which is the function to determine the layout of the StructArrays), then they will complain if you try to make a StructArray out of them.

This is the one problem that I think is easiest to fix, because instead of getproperty here one should use some other function defined in StructArrays that defaults to getfield, and for users that want to customize that, they would overload that function. There are some packages that rely on this (like GeometryBasics), but if the fix is just having to overload one more thing, that's not so bad. I would take advantage of the breaking change here to introduce that other breaking change.

@timholy
Copy link
Member Author

timholy commented Jul 13, 2020

I think the _structarray method you mention - given a custom "column initializer" and eltype, give me a StructArray - actually exists (it should be StructArrayInitializer, which uses buildfromschema).

Oh, good. Then this is very close to feasible, and a one-line addition to some CUDA package would solve @vchuravy's issue.

For the other problems you mention...without digging into detail I'm not certain I know what it is you need, but it seems likely that bypassing the constructors is the way to go. You can see an example in the Serialization stdlib:
https://github.com/JuliaLang/julia/blob/e24e2f091ad0f9f30ad85bf448130325a3f5c060/stdlib/Serialization/src/Serialization.jl#L1341-L1372
You can bypass getproperty with getfield(s, :fieldname).

@piever
Copy link
Collaborator

piever commented Jul 14, 2020

Ok, I've added #137 and #138 to the 0.5 milestone to make sure that by the time this gets released, building a StructArray should work on any struct type by default.

Thank you again for this, this is a very nice feature!

@piever piever merged commit 0c4adf6 into master Jul 14, 2020
@timholy timholy deleted the teh/broadcast branch July 14, 2020 21:25
@piever
Copy link
Collaborator

piever commented Feb 6, 2021

I'm about to tag a release with this new feature. The news entry links to this PR, so I'll just comment here that to fix the method error for a specific array type, one should do things like:

julia> using Base.Broadcast: ArrayStyle, Broadcasted

julia> using StructArrays: StructArrayStyle

julia> Base.similar(bc::Broadcasted{StructArrayStyle{S}}, ::Type{ElType}) where {S<:ArrayStyle{MyArray},N,ElType} =
           similar(MyArray{ElType}, axes(bc))

where of course similar(MyArray{ElType}, axes(bc)) should be replaced with the correct fallback. (This is just your "in-between" commit, but I think it has more visibility as a comment.)

I have a fuzzy intuition that there could be a general solution based on Adapt.jl (which is the common CUDA / StructArrays dependency), possibly by creating new "stubs" there that custom array packages (with custom broadcast) could extend. As adding fallbacks is non-breaking, we can do that after the release (when we have more practical usecases).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants