Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

map on sparse arrays does not work with non-numeric data #19561

Closed
amitmurthy opened this issue Dec 12, 2016 · 30 comments
Closed

map on sparse arrays does not work with non-numeric data #19561

amitmurthy opened this issue Dec 12, 2016 · 30 comments
Labels
regression Regression in behavior compared to a previous version sparse Sparse arrays

Comments

@amitmurthy
Copy link
Contributor

julia> map(x-> x != 0.0 ? "Hello" : x, sparse(eye(4)))
ERROR: MethodError: no method matching zero(::Type{Any})

Also a problem when mixing ints and floats.

julia> map(x-> x != 0.0 ? 1 : x, sparse(eye(4)))
ERROR: MethodError: no method matching zero(::Type{Any})

If sparse arrays are not meant to be used with non-numeric data, we should at least throw a better error message.

@oscardssmith
Copy link
Member

What version of Julia are you using? on 0.5.0 this works

@amitmurthy
Copy link
Contributor Author

master. map was recently implemented for sparse arrays.

@tkelman tkelman added sparse Sparse arrays regression Regression in behavior compared to a previous version labels Dec 12, 2016
@Sacha0
Copy link
Member

Sacha0 commented Dec 12, 2016

The new map[!]/broadcast[!] methods for SparseMatrixCSCs check whether the map/broadcast operation yields zero when all arguments are zero. Specifically, for e.g. map(f, A, B), the check is

fofzeros = f(zero(eltype(A)), zero(eltype(B)))
fpreszeros = fofzeros == zero(fofzeros)

This check requires that (1) the input array eltypes provide zero and (2) the return type of f (evaluated for arguments of the input array eltypes) provides zero.

The first requirement is fundamental: A sparse matrix C is only well defined if zero(eltype(C)) is well defined, as otherwise C's unstored entries are not well defined (in which case maping over C makes little sense).

The second requirement is relaxable: We need only the result of the hypothetical iszero(fofzeros). Not having iszero, we must use either fofzeros == zero(fofzeros) or fofzeros == 0 (alternatives?). The former handles units gracefully, but fails where zero(fofzeros) is not defined. The latter does not handle units gracefully, but works in some other cases where zero(fofzeros) is not defined.

Both examples above violate the second requirement and would work given iszero (without tradeoff) or with fofzeros == 0 in place of fofzeros == zero(fofzeros) (trading off handling units).

Thoughts? Best!

@oscardssmith
Copy link
Member

oscardssmith commented Dec 12, 2016

alternatively wouldn't defining zero for type any work? I'm not sure what that would be though.

@tkelman
Copy link
Contributor

tkelman commented Dec 12, 2016

If you only ever do operations on the stored entries (or their locations), then it doesn't always matter whether the unstored entries have a realizable value in the same element type as the stored entries. I've found sparse matrices of symbols or functions to be useful on occasion as a data structure, where it was useful to have them in CSC format, do concatenations or similar operations.

We should just add iszero.

@Sacha0
Copy link
Member

Sacha0 commented Dec 12, 2016

If you only ever do operations on the stored entries (or their locations), then it doesn't always matter whether the unstored entries have a realizable value in the same element type as the stored entries. I've found sparse matrices of symbols or functions to be useful on occasion as a data structure, where it was useful to have them in CSC format, do concatenations or similar operations.

Absolutely, SparseMatrixCSCs with only stored entries well defined have utility. Applying whole-array operations (e.g. map) to such objects makes little sense though; the hypothetical mapstored seems the appropriate operation in that case.

We should just add iszero.

+1 :). Best!

@nalimilan
Copy link
Member

But how would iszero help here? We wouldn't define it on Symbol, String nor Function, right?

@Sacha0
Copy link
Member

Sacha0 commented Dec 12, 2016

But how would iszero help here? We wouldn't define it on Symbol, String nor Function, right?

In both cases above, iszero(fofzeros) could check fofzeros == 0 rather than fofzeros == zero(fofzeros), skirting lack of definition of zero for Any?

@Sacha0
Copy link
Member

Sacha0 commented Dec 12, 2016

Looked closer and I'm mistaken. The zero preservation check discussed above works as written in these cases. Rather, checking whether output entry Cx = f(xs...) is zero via Cx != zero(eltype(C)) causes the reported failures. Replacing that check with iszero(Cx), or shy of that Cx != zero(Cx), would fix the second reported failure (the output sparse matrix having mixed numeric type). But with a bit more thought, the present semantics of map[!]/broadcast[!] over sparse matrices (not storing zeros) preclude making the first reported failure work without the pun of defining zero for Strings (or at least comparison with 0). Best!

@Sacha0
Copy link
Member

Sacha0 commented Dec 12, 2016

But with a bit more thought, the present semantics of map[!]/broadcast[!] over sparse matrices (not storing zeros) preclude making the first reported failure work without the pun of defining zero for Strings (or at least comparison with 0).

That thought was silly. iszero would save us in that case as well, there being no problem with comparison against 0: Replacing Cx != zero(eltype(C)) with iszero(Cx) and having the fallback iszero(x) = x == 0 would fix the first reported failure as well. Thoughts? Best!

@stevengj
Copy link
Member

Using sparse arrays/vectors for non-algebraic data (types that do not define zero) seems like a bad pun to me. These are linear-algebra objects.

@stevengj
Copy link
Member

For example, what does getindex return for the non-stored entries if zero is not defined?

@stevengj
Copy link
Member

stevengj commented Dec 16, 2016

If you just want a general "sparse" associative "array" from Int to a random type T, you should use Dict or similar.

@stevengj
Copy link
Member

And if you want to map the nonzero elements of a sparse array to non-numeric values, you should use map(f, nonzeros(A)).

@tkelman
Copy link
Contributor

tkelman commented Dec 16, 2016

Dict isn't CSC. It's useful on occasion to have a sparse array that has the same index iteration structure as a numeric SparseMatrixCSC, with effectively undefined non-stored entries. Linear algebra operations won't work on them, but many array manipulation operations will.

@stevengj
Copy link
Member

Lots of things are occasionally useful but lead to poor library design. We normally try to avoid bad type puns in Base.

@stevengj
Copy link
Member

Could you give a concrete example where nonzeros(A) would not suffice?

@tkelman
Copy link
Contributor

tkelman commented Dec 16, 2016

We allow non numeric or heterogeneously typed dense arrays. Allowing for the same with sparse matrices is a feature that it would be a regression to lose. Not being able to do linear algebra on some array types doesn't always make it a pun or bad library design to support.

For this particular example of map I agree nonzeros is mostly good enough. Will have to check whether the assumption that zero(eltype(A)) exists impacts other use cases.

@stevengj
Copy link
Member

The definition of sparsity here is "mostly zero", which is what makes it seem like a pun, and rather different from an Associative type (a partial function), which is what you seem to want to use it for.

@stevengj
Copy link
Member

The generalization would make more sense if we allowed an arbitrary default value, but that could cause trouble with code expecting "sparse" to mean "default zero"

Maybe we should have a DefaultArray type with an arbitrary default value, and have SparseMatrixCSC be a subtype?

@Sacha0
Copy link
Member

Sacha0 commented Dec 16, 2016

@oscardssmith
Copy link
Member

@stevenj this is a really good idea. It will also provide a path where we don't return dense arrays when we broadcast nonzero preserving functions, which could be a huge win for memory usage.

@stevengj
Copy link
Member

If we want to use it for avoiding dense broadcast output, then the sparse case can't be a subtype (wouldn't be type stable). I dunno, maybe all sparse arrays should be DefaultArrays after all. @StefanKarpinski was arguing for this in another issue, and I was skeptical, but maybe it's worth the trouble.

@oscardssmith
Copy link
Member

The really big advantage would be that all broadcasts would be efficient and produce the same type no matter what. Also, if I understood his proposal correctly, we could do even better, as his was a way to cheat a default from a sparse, but if it were going on base, we could further simplify the code.

@Sacha0
Copy link
Member

Sacha0 commented Dec 18, 2016

For a potential stopgap solution, please see #17623 (comment). Best!

If some feel strongly about the stricter iszero definition while others feel strongly about {map/broadcast over sparse matrices where the output element types don't provide zero}, we can accommodate both desires with a strict iszero definition and a _sparsebc_iszero which wraps iszero but provides the permissive fallback.

Sacha0 added a commit that referenced this issue Jan 1, 2017
Fix #19561 (sparse map/broadcast where the output eltype is not a concrete subtype of Number)
@Sacha0 Sacha0 reopened this Mar 2, 2017
@Sacha0
Copy link
Member

Sacha0 commented Mar 2, 2017

Type inference seems to have improved (:tada:) to the point that half of the tests for this issue are ineffective:

julia> intoneorfloatzero(x) = x != 0.0 ? Int(1) : Float64(x)
intoneorfloatzero (generic function with 1 method)

julia> foo = map(intoneorfloatzero, speye(4))
4×4 SparseMatrixCSC{Union{Float64, Int64},Int64} with 4 stored entries:
  [1, 1]  =  1
  [2, 2]  =  1
  [3, 3]  =  1
  [4, 4]  =  1

julia> eltype(foo)
Union{Float64, Int64}

julia> zero(eltype(foo))
0

(Previously zero(eltype(foo)) would fail.) Additionally, I found four code paths not fixed by #19589 and missed by the existing tests. Stronger tests and fix inbound. Best!

Sacha0 added a commit to Sacha0/julia that referenced this issue Mar 2, 2017
tkelman pushed a commit that referenced this issue Mar 4, 2017
where the output eltype is not a concrete subtype of Number (#19561, later part).
@amitmurthy
Copy link
Contributor Author

Closed by #20862 I believe.

@andreasnoack
Copy link
Member

As mentioned in #22945 (comment), the consequence of the fix here might cause some trouble elsewhere. @amitmurthy would you be able to share some real world examples where map on a SparseMatrix{String,Int} would be useful?

@amitmurthy
Copy link
Contributor Author

The specific requirement was for a SparseMatrix{Ref,Int} in light of how asyncmap (and hence pmap) is implemented. While still a TODO -

julia/base/asyncmap.jl

Lines 263 to 268 in 0a37b3d

# TODO: Optimize for sparse arrays
# For now process as regular arrays and convert back
function asyncmap(f, s::AbstractSparseArray...; kwargs...)
sa = map(Array, s)
return sparse(asyncmap(f, sa...; kwargs...))
end
- it can be optimized if SparseMatrix{Ref,Int} continues to be supported.

Other than the above I don't have any other real-world examples for non-numeric sparse arrays.

@Sacha0
Copy link
Member

Sacha0 commented Jul 26, 2017

Above @tkelman mentions use cases for SparseMatrixCSCs of symbols and/or functions:

I've found sparse matrices of symbols or functions to be useful on occasion as a data structure, where it was useful to have them in CSC format, do concatenations or similar operations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
regression Regression in behavior compared to a previous version sparse Sparse arrays
Projects
None yet
Development

No branches or pull requests

7 participants