Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

more generic broadcast over a single sparse matrix #19239

Merged
merged 1 commit into from
Nov 21, 2016

Conversation

Sacha0
Copy link
Member

@Sacha0 Sacha0 commented Nov 6, 2016

In #18590, one proposal for a more generic broadcast over sparse matrices received broad support: At runtime, (1) determine whether the broadcast operation returns zero when all (relevant) arguments are zero (by sampling f(0...)), (2) branch to code specialized for the case at hand, and (3) in either case return a sparse matrix for type stability.

This pull request begins implementing that proposal. Specifically, this pull request starts with broadcast over a single sparse matrix (and also handles a few common combinations of a single sparse matrix with broadcast scalars).

A few code (and corresponding PR) comments highlight some decisions. The primary decision concerns #18309: When the broadcast operation maps a (stored) nonzero entry to zero, should the result store that zero (nz2nz behavior) or not (nz2z behavior)? I now lean towards nz2z behavior for the following reason:

Consider C = A .* B (and similar operations) where A and B are sparse. Chances are you want C to the store the intersection (rather than union) of the structures of A and B (the present behavior of .* and friends). Returning the structural intersection rather than union in such cases requires the nz2z behavior when broadcasting over two or more sparse matrices (assuming you determine the broadcast operation's zero-preserving behavior solely via sampling, see #18590). Having broadcast over a single sparse matrix behave differently seems like a startling inconsistency. That inconsistency seems more important than the performance and implementation complexity differences mentioned in #18309. Regarding the zero-purging consistency point in #18309, with more thought this part of broadcast's behavior isn't really zero-purging in the sense discussed elsewhere (not so sure about broadcast! though).

Thoughts? Best!

# Bcolptr = similar(A.colptr, A.n + 1)
# copy!(Bcolptr, 1, A.colptr, 1, A.n + 1)
# Browval = similar(A.rowval, nnz(A))
# copy!(Browval, 1, A.rowval, 1, nnz(A))
Copy link
Member Author

@Sacha0 Sacha0 Nov 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These alternative lines for constructing Bcolptr and Browval might be safer than the copy/ resize! lines above: If A.colptr and/or A.rowval are shorter than their expected lengths, the copy!s might catch the error, whereas the former copy/resize! combinations won't. The copy/resize! lines are simpler/cleaner though. Thoughts? Thanks!

# NOTE: The following two method definitions work around #19096. These definitions should
# be folded into the two preceding definitions on resolution of #19096.
broadcast{Tf,T}(f::Tf, ::Type{T}, A::SparseMatrixCSC) = broadcast(y -> f(T, y), A)
broadcast{Tf,T}(f::Tf, A::SparseMatrixCSC, ::Type{T}) = broadcast(x -> f(x, T), A)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The four method definitions above for handling common combinations of sparse matrices with broadcast scalars is not satisfying: A better solution would achieve the same for (1) all broadcast scalar types (Base.Broadcast.containertype(x) == Any?) and (2) any combination (number, order, type mixture) of broadcast scalars. One might achieve (2) via a generated function. But how one might simultaneously and gracefully achieve (1) eludes me. Thoughts much appreciated. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need a generated function. Maybe we can brainstorm with @pabloferz about this at some point soon? Maybe start a discourse thread? @pabloferz, are you on discourse.julialang.org yet?

In any case, it seems like that should be separate from this PR, which is only about the single-argument case.

Copy link
Contributor

@pabloferz pabloferz Nov 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Al least for this case you might want to add another trait layer. E.g.

import Base.Broadcast.containertype
broadcast(f, A, B::SparseMatrixCSC) = broadcast_sp(f, containertype(A), containertype(B), A, B)
broadcast(f, A::SparseMatrixCSC, B) = broadcast_sp(f, containertype(A), containertype(B), A, B)
broadcast_sp(f, ::Type{Any}, ::Type{Array} A, B) = broadcast(x -> f(A, x), B)
broadcast_sp(f, ::Type{Array}, ::Type{Any} A, B) = broadcast(x -> f(x, B), A)
broadcast_sp(f, ::Type, ::Type, A, B) = sparse(Base.Broadcast.broadcast_c(f, Array, A, B))

@stevengj I'm already on discourse, so feel free to ping me (@pabloferz) if you'd like to start a thread overe there.

Copy link
Member Author

@Sacha0 Sacha0 Nov 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following snippet works for arbitrary combinations of broadcast scalars with a single sparse matrix, but the (y...) = f(a, y...) kills inference.

import Base.Broadcast: broadcast_c, containertype, promote_containertype

containertype{T<:SparseMatrixCSC}(::Type{T}) = SparseMatrixCSC
# Combinations of sparse arrays with broadcast scalars should yield sparse arrays
promote_containertype(::Type{Any}, ::Type{SparseMatrixCSC}) = SparseMatrixCSC
promote_containertype(::Type{SparseMatrixCSC}, ::Type{Any}) = SparseMatrixCSC
# Combinations of sparse arrays with anything else should fall back to dense array methods
promote_containertype(::Type{Array}, ::Type{SparseMatrixCSC}) = Array
promote_containertype(::Type{SparseMatrixCSC}, ::Type{Array}) = Array
promote_containertype(::Type{Tuple}, ::Type{SparseMatrixCSC}) = Array
promote_containertype(::Type{SparseMatrixCSC}, ::Type{Tuple}) = Array

broadcast_c(f, ::Type{SparseMatrixCSC}, A) = broadcast(f, A)
broadcast_c(f, ::Type{SparseMatrixCSC}, A, Bs...) = broadcast_sp(f, containertype(A), containertype(Bs...), A, Bs...)
broadcast_sp(f, ::Type{Any}, ::Type{SparseMatrixCSC}, a, Bs...) = broadcast_c((y...) -> f(a, y...), SparseMatrixCSC, Bs...)
broadcast_sp(f, ::Type{SparseMatrixCSC}, ::Type{Any}, A, bs...) = broadcast_c(x -> f(x, bs...), SparseMatrixCSC, A)
broadcast_sp(f, ::Type{SparseMatrixCSC}, ::Type{SparseMatrixCSC}, A, Bs...) = broadcast_c(f, Array, A, Bs...)

Copy link
Contributor

@pabloferz pabloferz Nov 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible solution would be to have (additionally to what you have above)

EDIT: There was a typo. This works with one SparseMatrixCSC and up to ten scalars but won't be inferable with two or more SparseMatrixCSCs.

EDIT 2: Added a method, and it appears to be inferable with up to eight arguments (I'd say that should be enough).

immutable FArgs{F,T}
    f::F
    args::T
end
(f::FArgs)(x...) = f.f(f.args..., x...)

broadcast_sp(f, ::Type{Any}, ::Type{SparseMatrixCSC}, a, Bs...) = broadcast_c(FArgs(f,(a,)), SparseMatrixCSC, Bs...)
broadcast_sp(f::FArgs, ::Type{Any}, ::Type{SparseMatrixCSC}, a, Bs...) = broadcast_c(FArgs(f.f,(f.args...,a)), SparseMatrixCSC, Bs...)
broadcast_sp(f::FArgs, ::Type{SparseMatrixCSC}, ::Type{Any}, A, bs...) = broadcast_c(x -> f.f(f.args..., x, bs...), SparseMatrixCSC, A)
broadcast_sp(f::FArgs, ::Type{SparseMatrixCSC}, ::Type{SparseMatrixCSC}, A, Bs...) = broadcast_c(f.f, Array, f.args..., A, Bs...)

Copy link
Member Author

@Sacha0 Sacha0 Nov 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hacky snippet below takes a different tack. In principle it should handle any combination of broadcast scalars and SparseMatrixCSCs; in practice type inference fails when (1) the number of arguments exceeds foursix or (2) more than one SparseMatrixCSC is involved. (edited:)

import Base: tail, front
import Base.Broadcast: broadcast_c, containertype, promote_containertype, broadcast_indices

broadcast_indices(::Type{SparseMatrixCSC}, A) = indices(A)
containertype{T<:SparseMatrixCSC}(::Type{T}) = SparseMatrixCSC
# Combinations of sparse arrays with broadcast scalars should yield sparse arrays
promote_containertype(::Type{Any}, ::Type{SparseMatrixCSC}) = SparseMatrixCSC
promote_containertype(::Type{SparseMatrixCSC}, ::Type{Any}) = SparseMatrixCSC
# Combinations of sparse arrays with anything else should fall back to dense array methods
promote_containertype(::Type{Array}, ::Type{SparseMatrixCSC}) = Array
promote_containertype(::Type{SparseMatrixCSC}, ::Type{Array}) = Array
promote_containertype(::Type{Tuple}, ::Type{SparseMatrixCSC}) = Array
promote_containertype(::Type{SparseMatrixCSC}, ::Type{Tuple}) = Array

broadcast_c(f, ::Type{SparseMatrixCSC}, As...) = foo((x, y) -> f(y...), (), As...)

foo(f, arrays, nextarg, remargs...) = foo((x, y) -> f(x, (nextarg, y...)), arrays, remargs...)
foo(f, arrays, nextarg::SparseMatrixCSC, remargs...) =
    foo((x, y) -> f(front(x), (last(x), y...)), (arrays..., nextarg), remargs...)

foo(f, arrays, nextarg) = bar(x -> f(x, (nextarg,)), arrays...)
foo(f, arrays, nextarg::SparseMatrixCSC) = bar(x -> f(front(x), (last(x),)), arrays..., nextarg)

bar(f, A) = broadcast(x -> f((x,)), A)
bar(f, A, B) = broadcast_c((x,y) -> f((x,y)), Array, A, B)
bar(f, A, B, Cs...) = broadcast_c((x,y,zs...) -> f((x,y,zs...)), Array, A, B, Cs...)

I'm sure a better approach exists, possibly e.g. inverting the call chain. Having a look at @pabloferz's suggestion now. Best!

Copy link
Member Author

@Sacha0 Sacha0 Nov 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below are two evolutions of @pabloferz's suggestion. The first variation retains the helper type, which distills to a stack for leading scalar arguments. The second variation nixes the helper type and instead uses a tuple for the stack.

Both variations reduce the mutual recursion between broadcast_c and broadcast_sp to recursion simply in broadcast_sp, and punt on argument lists involving more than one SparseMatrixCSC.

Manually extending this approach to handle argument lists involving two (or some larger small integer N) SparseMatrixCSCs seems straightforward, but incurs rapid code growth with N. How to extend this approach to an arbitrary number of SparseMatrixCSCs isn't clear to me (see followup post). But perhaps N = 2 or N = 3 would be good enough.

I will play with the N = 2 case of this approach next, and then attempt to improve (inference of) the more general approach I posted above.

Code common to both variations

import Base: tail
import Base.Broadcast: broadcast_c, containertype, promote_containertype, broadcast_indices

broadcast_indices(::Type{SparseMatrixCSC}, A) = indices(A)
containertype{T<:SparseMatrixCSC}(::Type{T}) = SparseMatrixCSC
# Combinations of sparse arrays with broadcast scalars should yield sparse arrays
promote_containertype(::Type{Any}, ::Type{SparseMatrixCSC}) = SparseMatrixCSC
promote_containertype(::Type{SparseMatrixCSC}, ::Type{Any}) = SparseMatrixCSC
# Combinations of sparse arrays with anything else should fall back to dense array methods
promote_containertype(::Type{Array}, ::Type{SparseMatrixCSC}) = Array
promote_containertype(::Type{SparseMatrixCSC}, ::Type{Array}) = Array
promote_containertype(::Type{Tuple}, ::Type{SparseMatrixCSC}) = Array
promote_containertype(::Type{SparseMatrixCSC}, ::Type{Tuple}) = Array

Variation retaining helper type

immutable FuncArgs{F,T}; func::F; args::T; end

# Entrypoint to broadcast_sp
broadcast_c(f, ::Type{SparseMatrixCSC}, A, Bs...) =
    broadcast_sp(FuncArgs(f, ()), containertype(A), containertype(Bs...), A, Bs...)
# Endpoints for argument lists containing a single SparseMatrixCSC
broadcast_sp(fa, ::Type{Any}, ::Type{SparseMatrixCSC}, a, B) = broadcast(x -> fa.func(fa.args..., a, x), B)
broadcast_sp(fa, ::Type{SparseMatrixCSC}, ::Type{Any}, A, bs...) = broadcast(x -> fa.func(fa.args..., x, bs...), A)
# Endpoint for argument lists containing more than one SparseMatrixCSC (punt)
broadcast_sp(fa, ::Type{SparseMatrixCSC}, ::Type{SparseMatrixCSC}, A, Bs...) = broadcast_c(fa.func, Array, fa.args..., A, Bs...)
# Case requiring further reduction
broadcast_sp(fa, ::Type{Any}, ::Type{SparseMatrixCSC}, a, B, Cs...) =
    broadcast_sp(FuncArgs(fa.func, (fa.args..., a)), containertype(B), containertype(Cs...), B, Cs...)

Variation nixing helper type

# Entrypoint to broadcast_sp
broadcast_c(f, ::Type{SparseMatrixCSC}, A, Bs...) =
    broadcast_sp(f, (), containertype(A), containertype(Bs...), A, Bs...)
# Endpoints for argument lists containing a single SparseMatrixCSC
broadcast_sp(f, as, ::Type{Any}, ::Type{SparseMatrixCSC}, b, C) = broadcast(x -> f(as..., b, x), C)
broadcast_sp(f, as, ::Type{SparseMatrixCSC}, ::Type{Any}, B, cs...) = broadcast(x -> f(as..., x, cs...), B)
# Endpoint for argument lists containing more than one SparseMatrixCSC (punt)
broadcast_sp(f, as, ::Type{SparseMatrixCSC}, ::Type{SparseMatrixCSC}, B, Cs...) = broadcast_c(f, Array, as..., B, Cs...)
# Case requiring further reduction
broadcast_sp(f, as, ::Type{Any}, ::Type{SparseMatrixCSC}, b, C, Ds...) =
    broadcast_sp(f, (as..., b), containertype(C), containertype(Ds...), C, Ds...)

Best!

Copy link
Member Author

@Sacha0 Sacha0 Nov 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New versions capable (in principle) of handling any number of SparseMatrixCSCs follow below. None circumvent #19096. TL;DR The first version (A) is the best so far, perhaps good enough to incorporate for now? Thoughts? Any and all pointers much appreciated, particularly towards better understanding of why type inference works/fails where it does.

Code common to all versions:

import Base: front, tail
import Base.Broadcast: broadcast_c, containertype, promote_containertype, broadcast_indices

# Broadcast container type promotion for SparseMatrixCSCs
containertype{T<:SparseMatrixCSC}(::Type{T}) = SparseMatrixCSC
broadcast_indices(::Type{SparseMatrixCSC}, A) = indices(A)
# Combinations of sparse arrays with broadcast scalars should yield sparse arrays
promote_containertype(::Type{Any}, ::Type{SparseMatrixCSC}) = SparseMatrixCSC
promote_containertype(::Type{SparseMatrixCSC}, ::Type{Any}) = SparseMatrixCSC
# Combinations of sparse arrays with anything else should fall back to dense array
promote_containertype(::Type{Array}, ::Type{SparseMatrixCSC}) = Array
promote_containertype(::Type{SparseMatrixCSC}, ::Type{Array}) = Array
promote_containertype(::Type{Tuple}, ::Type{SparseMatrixCSC}) = Array
promote_containertype(::Type{SparseMatrixCSC}, ::Type{Tuple}) = Array

(A) For the following version, type inference remains happy when the argument list (1) contains a single sparse matrix and up to six broadcast scalars, (2) contains two sparse matrices and up to twelve broadcast scalars (more depending on position, strangely), or (3) up to six sparse matrices as long as no broadcast scalars appear. This behavior is a mystery to me, insight much appreciated. This version seems the best in practice at the moment.

# closeoverscalars steps through the argument list. For each scalar argument,
# closeoverscalars generates a closure incorporating that scalar into the present version
# of the broadcast function, removes that scalar from the argument list, and steps to the
# next argument. For each matrix argument, closeoverscalars generates a closure which
# will later funnel that matrix argument to the right slot in the original broadcast
# function, collects that matrix in a tuple of matrices over which to eventually broadcast,
# removes that matrix from the argument list, and steps to the next argument. After
# traversing the argument list, closeoverscalars returns a closure incorporating
# all broadcast scalars in the argument list followed by the argument list's sparse matrices.
#
# Entrypoint to closeoverscalars
broadcast_c(f, ::Type{SparseMatrixCSC}, args...) = broadcast(closeoverscalars(f, args)...)
closeoverscalars(f, args) = closeoverscalars((x, y) -> f(y...), (), args...)
# Recursion cases for closeoverscalars
closescalar(f, s) = (x, y) -> f(x, (s, y...))
passmatrix(f) = (x, y) -> f(front(x), (last(x), y...))
closeoverscalars(f, mats, nextarg, remargs...) = closeoverscalars(closescalar(f, nextarg), mats, remargs...)
closeoverscalars(f, mats, nextarg::SparseMatrixCSC, remargs...) = closeoverscalars(passmatrix(f), (mats..., nextarg), remargs...)
# Base cases for closescalars
closelastscalar(f, s) = (x...) -> f(x, (s,))
passlastmatrix(f) = (x...) -> f(front(x), (last(x),))
closeoverscalars(f, mats, lastarg) = (closelastscalar(f, lastarg), mats...)
closeoverscalars(f, mats, lastarg::SparseMatrixCSC) = (passlastmatrix(f), mats..., lastarg)

Edit: Partial evaluation not correct. (B) For the following version (a descendant of @pabloferz's suggestion), type inference remains happy when the argument list (1) contains up to nine members including either one or two sparse matrices, or (2) contains three sparse matrices and no broadcast scalars. This behavior is also a mystery to me, insight much appreciated.

# broadcast_sp steps through the argument list, collecting each sequence of broadcast scalars
# between one sparse matrix (or start or end of the argument list) and the next in a tuple,
# storing the sequence of such tuples in stups ("scalar-tuples"), and storing the sparse
# matrices in mats. After traversing the argument list, broadcast_sp calls closeoverscalars
# which, given stups, mats, and the broadcast function, generates a closure incorporating
# all broadcast scalars in the argument list. broadcast_sp then passes that closure along
# with the argument list's sparse matrices on to broadcast.
#
# Entrypoint to broadcast_sp
broadcast_c(fn, ::Type{SparseMatrixCSC}, A, Bs...) =
    broadcast_sp(fn, ((),), (), containertype(A), containertype(Bs...), A, Bs...)
# Recursion cases for broadcast_sp
broadcast_sp(fn, stups, mats, ::Type{Any}, ::Type{SparseMatrixCSC}, a, B, Cs...) =
    broadcast_sp(fn, (front(stups)..., (last(stups)..., a)), mats, containertype(B), containertype(Cs...), B, Cs...)
broadcast_sp(fn, stups, mats, ::Type{SparseMatrixCSC}, ::Type{SparseMatrixCSC}, A, B, Cs...) =
    broadcast_sp(fn, (stups..., ()), (mats..., A), containertype(B), containertype(Cs...), B, Cs...)
# Base cases for broadcast_sp
broadcast_sp(fn, stups, mats, ::Type{SparseMatrixCSC}, ::Type{SparseMatrixCSC}, A, B) =
    broadcast(closeoverscalars((x...) -> fn(front(front(x))..., last(stups)..., last(front(x)), last(x)), front(stups)), mats..., A, B)
broadcast_sp(fn, stups, mats, ::Type{SparseMatrixCSC}, ::Type{Any}, A, bs...) =
    broadcast(closeoverscalars((x...) -> fn(front(x)..., last(stups)..., last(x), bs...), front(stups)), mats..., A)
broadcast_sp(fn, stups, mats, ::Type{Any}, ::Type{SparseMatrixCSC}, a, B) =
    broadcast(closeoverscalars((x...) -> fn(front(x)..., last(stups)..., a, last(x)), front(stups)), mats..., B)
# Recursively close over scalars
closeoverscalars(fn, stups) = closeoverscalars((x...) -> fn(front(x)..., last(stups)..., last(x)), front(stups))
closeoverscalars(fn, ::Tuple{}) = fn

Edit: Partial evaluation not correct. (C) The following version is a simpler realization of the preceding concept, but type inference does not seem to like it (nor a similar version using varags for groupargs).

# Entrypoint
broadcast_c(f, ::Type{SparseMatrixCSC}, args...) =
    (ga = groupargs(args); broadcast(closeoverscalars(f, ga[1]), ga[2]...))
# Entrypoint for groupargs
groupargs(args) = groupargs(((),), (), first(args), tail(args))
# Recursion cases for groupargs
groupargs(stups, mats, nextarg, remargs) =
    groupargs((front(stups)..., (last(stups)..., nextarg)), mats, first(remargs), tail(remargs))
groupargs(stups, mats, nextarg::SparseMatrixCSC, remargs) =
    groupargs((stups..., ()), (mats..., nextarg), first(remargs), tail(remargs))
# Base cases for groupargs
groupargs(stups, mats, lastarg, ::Tuple{}) = ((front(stups)..., (last(stups)..., lastarg)), mats)
groupargs(stups, mats, lastarg::SparseMatrixCSC, ::Tuple{}) = ((stups..., ()), (mats..., lastarg))
# Entrypoint and recursion and base cases for closeoverscalars
closeoverscalars(f, stups) = _closeoverscalars((x...) -> f(x..., last(stups)...), front(stups))
_closeoverscalars(f, stups) = _closeoverscalars((x...) -> f(front(x)..., last(stups)..., last(x)), front(stups))
_closeoverscalars(f, ::Tuple{}) = f

Best!

Copy link
Contributor

@pabloferz pabloferz Nov 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Sacha0 Inspired on your approach above that evolved from my suggestion I was able to write it in such a way that it is inferable up to eight arguments irrespectively of the arguments types. The maximum number of arguments, so that it keeps inferable, I believe, is a limitation of the inference mechanism, but I haven't looked this into detail.

The approach would be your 'common code' above plus

# Entrypoint to broadcast_sp
broadcast_c(f, ::Type{SparseMatrixCSC}, A, Bs...) =
    broadcast_sp(f, (), nothing, (), containertype(A), containertype(Bs...), A, Bs...)
# Recursion cases for broadcast_sp
broadcast_sp(f, ftup, ::Void, ttup, ::Type{Any}, ::Type{SparseMatrixCSC}, a, B, Cs...) =
    broadcast_sp(f, (ftup..., a), nothing, ttup, containertype(B), containertype(Cs...), B, Cs...)
broadcast_sp(f, ftup, ::Void, ttup, ::Type{SparseMatrixCSC}, ::Type{SparseMatrixCSC}, A, B, Cs...) =
    broadcast_sp(f, ftup, A, ttup, containertype(B), containertype(Cs...), B, Cs...)
broadcast_sp(f, ftup, A, ttup, ::Type{Any}, ::Type{SparseMatrixCSC}, a, B, Cs...) =
    broadcast_sp(f, ftup, A, (ttup..., a), containertype(B), containertype(Cs...), B, Cs...)
# Base cases for broadcast_sp
broadcast_sp(f, ftup, ::Void, ttup, ::Type{Any}, ::Type{SparseMatrixCSC}, a, B) =
    broadcast(x -> f(ftup..., a, x), B)
broadcast_sp(f, ftup, ::Void, ttup, ::Type{SparseMatrixCSC}, ::Type{Any}, A, bs...) =
    broadcast(x -> f(ftup..., x, bs...), A)
broadcast_sp(f, ftup, ::Void, ttup, ::Type{SparseMatrixCSC}, ::Type{SparseMatrixCSC}, A, B) =
    broadcast((x, y) -> f(ftup..., x, y), A, B)
broadcast_sp(f, ftup, A, ttup, ::Type{Any}, ::Type{SparseMatrixCSC}, a, B) =
    broadcast((x, y) -> f(ftup..., x, ttup..., a, y), A, B)
broadcast_sp(f, ftup, A, ttup, ::Type{SparseMatrixCSC}, ::Type{Any}, B, cs...) =
    broadcast((x, y) -> f(ftup..., x, ttup..., y, cs...), A, B)
broadcast_sp(f, ftup, A, ttup, ::Type{SparseMatrixCSC}, ::Type{SparseMatrixCSC}, B, Cs...) =
    broadcast_c(f, Array, ftup..., A, ttup..., B, Cs...)

NOTE: When using the dot syntax, given that the parser can already reduce the effective number of arguments passed to broadcast the number of arguments can be even more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussion continued on discourse. Best!

# behavior presently above (this would be consistent with existing binary sparse broadcast
# behavior); or (2) the generic unary broadcast behavior should be nz2nz_z2z as above, and the
# nz2z_z2z specializations below should ultimately go away in favor of `broadcast`+`dropzeros!`
# (this would not be consistent with existing binary sparse broadcast behavior).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO concerns the primary decision described in the original post.

# behavior presently above (this would be consistent with existing binary sparse broadcast
# behavior); or (2) the generic unary broadcast behavior should be nz2nz_z2z as above, and the
# nz2z_z2z specializations below should ultimately go away in favor of `broadcast`+`dropzeros!`
# (this would not be consistent with existing binary sparse broadcast behavior).

"""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the nz2z/nz2nz decision is made, most of the code below this line till ## Broadcasting kernels specialized for returning a SparseMatrixCSC should go away.

@stevengj stevengj added the broadcast Applying a function over a collection label Nov 7, 2016
@stevengj
Copy link
Member

stevengj commented Nov 7, 2016

My initial preference would be nz2z (don't store zeros) for broadcast, but nz2nz (store zeros) for broadcast!, if that is feasible.

@stevengj
Copy link
Member

stevengj commented Nov 7, 2016

On the other hand, implementation would be simpler and faster if we used nz2nz everywhere for unary operations, and in practice I'm very skeptical that any real user will have a problem with this.

@Sacha0 Sacha0 added the sparse Sparse arrays label Nov 7, 2016
@Sacha0
Copy link
Member Author

Sacha0 commented Nov 7, 2016

My initial preference would be nz2z (don't store zeros) for broadcast, but nz2nz (store zeros) for broadcast!, if that is feasible.

Cheers, my thinking as well. I appreciate your devil's-advocate arguments though; thoughts below.

On the other hand, implementation would be simpler and faster if we used nz2nz everywhere for unary operations

nz2nz's implementation simplicity is appealing. On the other hand, the code complexity / length differences between nz2nz's implementation and a simplified nz2z implementation seem negligible relative to consistency and ideal-behavior concerns.

The benchmarks in #18309 indicate that nz2nz is somewhat faster in some (common?) cases (typically by a few percent, maximally by about thirty percent). On the other hand, nz2z is comparably faster in other (common?) cases, and much faster in still other (uncommon?) cases. In other words, nz2z's performance is mostly comparable to that of nz2nz, but nz2z's worst-case performance is meaningfully better than nz2nz's, perhaps making it a good choice for a generic method.

Another small argument for nz2z: If a developer needs nz2nz behavior, she should be able to safely achieve it in less than five lines of code, whereas nz2nz is more subtle.

Best!

@Sacha0
Copy link
Member Author

Sacha0 commented Nov 7, 2016

(Absent additional input in the next couple of days I will revise this PR with nz2z behavior.)

function broadcast{Tf}(f::Tf, A::SparseMatrixCSC)
fofzero = f(zero(eltype(A)))
fpreszero = fofzero == zero(fofzero)
return fpreszero ? _broadcast_zeropres(f, A) : _broadcast_notzeropres(f, A)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you pass fofzero to _broadcast_notzeropres so that it doesn't need to be recomputed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally I passed fofzero to _broadcast_notzeropres for that reason, but later nixed the pass for simplicity. Happy to restore the pass if that strikes you as better though? Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't make much difference, I guess, so whatever you prefer.

@Sacha0
Copy link
Member Author

Sacha0 commented Nov 12, 2016

Switched to nz2z behavior and nixed associated specialized methods. Best!

@Sacha0 Sacha0 force-pushed the genspbc branch 2 times, most recently from 1bff9ff to f5bc714 Compare November 12, 2016 00:31
@stevengj
Copy link
Member

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@stevengj
Copy link
Member

stevengj commented Nov 12, 2016

Big improvement in the sparse-broadcast benchmark, unsurprisingly. Odd regression in the array-comprehension collect benchmark, probably unrelated noise since that doesn't involve sparse matrices at all.

@stevengj
Copy link
Member

Seeing the same testset spawn error on Linux as in #19305, so seems to be unrelated.

@Sacha0
Copy link
Member Author

Sacha0 commented Nov 12, 2016

Pushed two small modifications: (1) broadcast now passes f(zero(eltype(A))) to _broadcast_notzeropres via a fillvalue argument. (2) removed a question that should not persist in the comments. Best!

@stevengj
Copy link
Member

The Travis problem should now be fixed, if you want to rebase.

@Sacha0
Copy link
Member Author

Sacha0 commented Nov 16, 2016

Rebased. I've made some headway re. #19239 (review), but improvements on that front could be an independent PR (however you prefer). Thanks and best!

@Sacha0 Sacha0 changed the title WIP: more generic broadcast over a single sparse matrix more generic broadcast over a single sparse matrix Nov 17, 2016
@Sacha0
Copy link
Member Author

Sacha0 commented Nov 21, 2016

@stevengj, additional thoughts, or is this in shape to merge? (Separating further work along #19239 (review) line seems the most effective path forward?) Thanks and best!

@stevengj stevengj merged commit 0837ba4 into JuliaLang:master Nov 21, 2016
@Sacha0 Sacha0 deleted the genspbc branch November 21, 2016 22:03
@Sacha0
Copy link
Member Author

Sacha0 commented Nov 21, 2016

Thanks for reviewing / merging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
broadcast Applying a function over a collection sparse Sparse arrays
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants