Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SHermitianCompact #358

Merged
merged 10 commits into from
May 10, 2019
Merged

Add SHermitianCompact #358

merged 10 commits into from
May 10, 2019

Conversation

tkoolen
Copy link
Contributor

@tkoolen tkoolen commented Jan 13, 2018

I was just playing around yesterday with a statically-sized symmetric matrix type that (as opposed to Base.Symmetric) only stores the lower triangle.

This is currently lacking all tests and some things still need to be implemented, but what's there now seem to be working in my scrap notebook. Some operations are slightly faster, some slightly slower than plain SMatrix depending on the operation, matrix size, and processor architecture. It's exciting to see that multiplying two SSymmetricCompact{3, 3}s (using the generic code in matrix_multiply.jl!) is slightly faster than multiplying regular SMatrix{3, 3}s. Overall I was slightly underwhelmed by the performance, but I still think this could be useful.

I was just wondering:

  1. Do you think this belongs in StaticArrays itself? It's kind of like SDiagonal (which I used for inspiration), so maybe?
  2. What do you think SSymmetricCompact(::Tuple) should do? I saw that for SDiagonal, it assumes that the Tuple is just the diagonal, which I thought was kind of strange given the use of the Tuple methods in convert.jl. It's also not how the Rotations.jl types do it, and I ended up following those examples.

@codecov-io
Copy link

Codecov Report

Merging #358 into master will decrease coverage by 1.95%.
The diff coverage is 1.69%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #358      +/-   ##
==========================================
- Coverage   93.36%   91.41%   -1.96%     
==========================================
  Files          37       38       +1     
  Lines        2713     2771      +58     
==========================================
  Hits         2533     2533              
- Misses        180      238      +58
Impacted Files Coverage Δ
src/StaticArrays.jl 100% <ø> (ø) ⬆️
src/SSymmetricCompact.jl 0% <0%> (ø)
src/SMatrix.jl 94.78% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9044eb2...69334a4. Read the comment docs.

@coveralls
Copy link

Coverage Status

Coverage decreased (-2.0%) to 91.411% when pulling 69334a4 on tkoolen:tk/symmetric into 9044eb2 on JuliaArrays:master.

@andyferris
Copy link
Member

Cool - I've always wanted to try this and see how it panned out!

There's no reason this doesn't belong here, but I guess there is a question as to what the killer application of this beastie is? I guess it could be useful for storing lots of little, Hermitian matrices in memory?

Also, we need to keep in mind that transpose is recursive.

for SDiagonal, it assumes that the Tuple is just the diagonal, which I thought was kind of strange given the use of the Tuple methods in convert.jl

Oh dear, we should probably change that, the StaticArray interface should take precedence over a convenience method.

@KristofferC
Copy link
Contributor

In continuum mechanics, symmetric second order tensors is very common (we have something similar to this in Tensors.jl).

@andyferris
Copy link
Member

In continuum mechanics, symmetric second order tensors is very common

So that would be a SSymmetricCompact{3,3} representing stress or strain or something at every grid point / finite element? Thus the space savings relative to many Symmetric(::SMatrix{3,3})s are substantial?

@KristofferC
Copy link
Contributor

Yes. In addition, the derivative of a symmetric second order tensors with another gives symmetric fourth order tensors where the storage saving is more substantial (36 vs 81 elements for symmetric fourth order vs unsymmetric).

@tkoolen
Copy link
Contributor Author

tkoolen commented Jan 15, 2018

I was personally hoping to use this in RigidBodyDynamics to speed up operations involving moments of inertia.

Made a bit more progress. Next up is eye/one and tests.

Also, we need to keep in mind that transpose is recursive.

Thanks for the heads up; fixed this. I did notice that this is actually currently not the case for StaticMatrix:

@inline transpose(m::StaticMatrix) = _transpose(Size(m), m)
# note: transpose of StaticVector is a RowVector, handled by Base
@generated function _transpose(::Size{S}, m::StaticMatrix) where {S}
Snew = (S[2], S[1])
exprs = [:(m[$(sub2ind(S, j1, j2))]) for j2 = 1:S[2], j1 = 1:S[1]]
return quote
$(Expr(:meta, :inline))
@inbounds return similar_type($m, Size($Snew))(tuple($(exprs...)))
end
end

@andyferris
Copy link
Member

did notice that this is actually currently not the case for StaticMatrix

Yeah I think this was a design decision I was trying to ignore when we first wrote static arrays. :) We should fix that up.

I'm slightly worried that result here might be that the eltype is a Union because some elements are transposed and others not. (Same for Symmetric in Base).

@tkoolen
Copy link
Contributor Author

tkoolen commented Jan 16, 2018

I'm slightly worried that result here might be that the eltype is a Union because some elements are transposed and others not. (Same for Symmetric in Base).

Maybe I don't follow, but isn't that problem avoided here, since SSymmetricCompact only stores the lower triangle?

@inline transpose(a::SSymmetricCompact) = SSymmetricCompact(transpose.(a.lowertriangle))

@coveralls
Copy link

Coverage Status

Coverage decreased (-2.8%) to 90.594% when pulling 7dfe1cc on tkoolen:tk/symmetric into 9044eb2 on JuliaArrays:master.

@tkoolen tkoolen changed the title RFC: Add SSymmetricCompact Add SSymmetricCompact Jan 17, 2018
@tkoolen
Copy link
Contributor Author

tkoolen commented Jan 17, 2018

Alright, this is ready for review now.

Regarding the recursiveness of transpose: it seems to me that Base also doesn't do transpose recursively, since this test:

a = Symmetric([[rand(Complex{Int}) for i = 1 : 2] for row = 1 : 3, col = 1 : 3])
@test transpose(SSymmetricCompact{3}(a)) == transpose(a)

passes with the current code, but not with

transpose(a::SSymmetricCompact) = SSymmetricCompact((map(transpose, a.lowertriangle)))

@andyferris
Copy link
Member

it seems to me that Base also doesn't do transpose recursively

Is that v0.6 or v0.7?

@tkoolen
Copy link
Contributor Author

tkoolen commented Jan 18, 2018

0.6 and 0.7 both have (different) issues.

On 0.6.1:

julia> a = Symmetric([[Complex(row * i, col * i + 1) for i = 1 : 2] for row = 1 : 3, col = 1 : 3])
3×3 Symmetric{Array{Complex{Int64},1},Array{Array{Complex{Int64},1},2}}:
 Complex{Int64}[1+2im, 2+3im]  Complex{Int64}[1+3im, 2+5im]  Complex{Int64}[1+4im, 2+7im]
 Complex{Int64}[1+3im, 2+5im]  Complex{Int64}[2+3im, 4+5im]  Complex{Int64}[2+4im, 4+7im]
 Complex{Int64}[1+4im, 2+7im]  Complex{Int64}[2+4im, 4+7im]  Complex{Int64}[3+4im, 6+7im]

julia> transpose(a)[1]
2-element Array{Complex{Int64},1}:
 1+2im
 2+3im

julia> transpose(Array(a))[1]
ERROR: MethodError: Cannot `convert` an object of type RowVector{Complex{Int64},Array{Complex{Int64},1}} to an object of type Array{Complex{Int64},1}
This may have arisen from a call to the constructor Array{Complex{Int64},1}(...),
since type constructors fall back to convert methods.
Stacktrace:
 [1] transpose_f!(::Base.#transpose, ::Array{Array{Complex{Int64},1},2}, ::Array{Array{Complex{Int64},1},2}) at ./linalg/transpose.jl:54
 [2] transpose(::Array{Array{Complex{Int64},1},2}) at ./linalg/transpose.jl:121

on 5-day old master:

julia> a = Symmetric([[Complex(row * i, col * i + 1) for i = 1 : 2] for row = 1 : 3, col = 1 : 3])
3×3 Symmetric{Array{Complex{Int64},1},Array{Array{Complex{Int64},1},2}}:
 [1+2im, 2+3im]  [1+3im, 2+5im]  [1+4im, 2+7im]
 [1+3im, 2+5im]  [2+3im, 4+5im]  [2+4im, 4+7im]
 [1+4im, 2+7im]  [2+4im, 4+7im]  [3+4im, 6+7im]

julia> transpose(a)[1]
2-element Array{Complex{Int64},1}:
 1 + 2im
 2 + 3im

julia> transpose(Array(a))[1]
1×2 Transpose{Complex{Int64},Array{Complex{Int64},1}}:
 1+2im  2+3im

@coveralls
Copy link

Coverage Status

Coverage increased (+0.2%) to 93.535% when pulling 43d5f82 on tkoolen:tk/symmetric into 9044eb2 on JuliaArrays:master.

@coveralls
Copy link

coveralls commented Jan 18, 2018

Coverage Status

Coverage increased (+0.6%) to 81.697% when pulling 029e691 on tkoolen:tk/symmetric into e0d5f09 on JuliaArrays:master.

@andyferris
Copy link
Member

@tkoolen thought I should let you know I'm trying to ignore this until all the v0.7 carnage is dealt with, but I am generally supportive of this addition.

@timholy
Copy link
Member

timholy commented May 1, 2019

Also relevant for JuliaIO/NRRD.jl#30

@mschauer
Copy link
Collaborator

mschauer commented May 1, 2019

I think SSymmetric might be on the edge of still belonging to StaticArrays.jl. For more complicated objects matrices one can think of going down the route of https://github.com/mschauer/StaticLinearMaps.jl/blob/master/src/StaticLinearMaps.jl

@tkoolen
Copy link
Contributor Author

tkoolen commented May 1, 2019

For more complicated objects matrices one can think of going down the route of https://github.com/mschauer/StaticLinearMaps.jl/blob/master/src/StaticLinearMaps.jl

As in, in a separate package, perhaps. The SDiagonal precedent has lost some of its power since I opened this PR. But I don't think StaticLinearMaps itself is the right place, since the SLinearMap abstract type is not a StaticMatrix subtype. And in the end I do still think this belongs in StaticArrays, because it means not having to pay the overhead of maintaining another package and because of improved discoverability.

@mschauer
Copy link
Collaborator

mschauer commented May 1, 2019

? I said (or meant to say) that it belongs here.

tkoolen added 5 commits May 1, 2019 12:32
* Extract out @pure triangularnumber and triangularroot functions, use to simplify code
* Add similar_type overload
* Speed up == overload
* Generalize +, - overloads with two SSymetricCompact matrices by overloading _fill
* More complete list of scalar-array ops
* Make transpose recursive
* Overloads for rand, randn, randexp
@tkoolen
Copy link
Contributor Author

tkoolen commented May 1, 2019

Ah, I misunderstood then.

@tkoolen
Copy link
Contributor Author

tkoolen commented May 1, 2019

I've brought this up to date and fixed some issues with recursive transposes and adjoints. @timholy, could I ask you to review this?

@tkoolen
Copy link
Contributor Author

tkoolen commented May 1, 2019

By the way, if you're wondering about the efficiency of getindex, the compiler is really doing some magic here, especially for types where transpose(x) = x:

julia> sa = SSymmetricCompact(rand(SMatrix{5, 5}));

julia> f(sa, i) = @inbounds getindex(sa, i)
f (generic function with 1 method)

julia> @code_native f(sa, 1)
        .text
; ┌ @ REPL[23]:1 within `f'
; │┌ @ SSymmetricCompact.jl:82 within `getindex' @ REPL[23]:1
        shlq    $4, %rsi
; │└
; │┌ @ pair.jl:50 within `getindex'
        movabsq $139903100985504, %rax  # imm = 0x7F3DBAA320A0
; │└
; │┌ @ SSymmetricCompact.jl:83 within `getindex' @ SVector.jl:37 @ tuple.jl:24
        movq    -16(%rsi,%rax), %rax
; ││ @ SSymmetricCompact.jl:84 within `getindex'
        vmovsd  -8(%rdi,%rax,8), %xmm0  # xmm0 = mem[0],zero
; │└
        retq
        nopw    (%rax,%rax)
; └

I don't understand the trick the compiler is using yet, but since the native code for various matrix sizes only seems to differ in one constant (e.g. 139903100985504 above), I wonder if the underlying trick could be used to implement a fast getindex(::Symmetric, i) and change the IndexStyle for Symmetric to IndexLinear.

Edit: that one constant is just a memory address, probably of the precomputed tuple of indices that it's indexing into; it's not using some magical formula.

Copy link
Member

@timholy timholy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

src/SSymmetricCompact.jl Outdated Show resolved Hide resolved
src/SSymmetricCompact.jl Outdated Show resolved Hide resolved
src/SSymmetricCompact.jl Outdated Show resolved Hide resolved
src/SSymmetricCompact.jl Outdated Show resolved Hide resolved
end
end

Base.@propagate_inbounds function Base.getindex(a::SSymmetricCompact{N}, i::Int) where {N}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why implement this rather than getindex(a, i::Int, j::Int) and make it IndexCartesian?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reasons are that getindex(a, i::Int, j::Int) would also not be trivial, requiring a solution similar to _symmetric_compact_indices to be fast, and a lot of functionality inside of StaticArrays implicitly assumes that all StaticArrays are IndexLinear, e.g.:

tmp = [:(a[$j][$i]) for j 1:length(a)]
exprs[i] = :(f($(tmp...)))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the nontrivial, doesn't

i, j = i >= j ? (i, j) : (j, i)
idx = i + (j-1)*N

do the job?

If you've not benchmarked it or checked generated code, this seems to be worth doing. But most likely you have and you know that it's all fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's not quite right.

julia> N = 3; L = StaticArrays.triangularnumber(N)
6

julia> m = SSymmetricCompact(SVector{L}(1 : L))
3×3 SSymmetricCompact{3,Int64,6}:
 1  2  3
 2  4  5
 3  5  6

julia> i, j = 2, 3;

julia> m[i, j]
5

julia> i, j = i >= j ? (i, j) : (j, i)
(3, 2)

julia> idx = i + (j-1)*N
6

julia> m.lowertriangle[idx]
6

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D'oh! Try

idx = i - j + 1 + ((j-1)*(2N+2-j)÷2)

The point being: your implementation looks slow, though perhaps it's not.

Copy link
Contributor Author

@tkoolen tkoolen May 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not quite right either I'm afraid. After removing the current getindex and adding

Base.IndexStyle(::Type{<:SSymmetricCompact}) = Base.IndexCartesian()

Base.@propagate_inbounds function Base.getindex(a::SSymmetricCompact{N}, i::Int, j::Int) where N
    idx = i - j + 1 + ((j-1)*(2N+2-j)÷2)
    x = a.lowertriangle[idx]
end

we get

julia> N = 3; L = StaticArrays.triangularnumber(N)
6

julia> m = SSymmetricCompact(SVector{L}(1 : L))
3×3 SSymmetricCompact{3,Int64,6}:
 1  3  4
 2  4  5
 3  5  6

but it's supposed to be

3×3 SSymmetricCompact{3,Int64,6}:
 1  2  3
 2  4  5
 3  5  6

As I showed in #358 (comment), the compiler does a very good job on the whole _symmetric_compact_indices thing. Basically, it computes that tuple at compile time for a given N, and then the getindex method simply indexes into that constant tuple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, I should still add the i, j = i >= j ? (i, j) : (j, i); it works then. I'll benchmark.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, with

Base.IndexStyle(::Type{<:SSymmetricCompact}) = Base.IndexCartesian()

Base.@propagate_inbounds function Base.getindex(a::SSymmetricCompact{N}, i::Int) where N
    # override StaticArray overload, replace with AbstractArray version.
    # Base._getindex(IndexStyle(a), a, Base.to_indices(a, (i,))...)
    c, r = divrem(i - 1, N)
    c, r = c + 1, r + 1
    a[r, c]
end

Base.@propagate_inbounds function Base.getindex(a::SSymmetricCompact{N}, i::Int, j::Int) where N
    i, j = i >= j ? (i, j) : (j, i)
    idx = i - j + 1 + ((j-1)*(2N+2-j)÷2)
    x = a.lowertriangle[idx]
end

we get

julia> @btime a[i] setup = begin
           a = SSymmetricCompact(rand(SMatrix{6, 6}))
           i = rand(1 : length(a))
       end
  3.272 ns (0 allocations: 0 bytes)

whereas before,

julia> @btime a[i] setup = begin
           a = SSymmetricCompact(rand(SMatrix{6, 6}))
           i = rand(1 : length(a))
       end
  1.783 ns (0 allocations: 0 bytes)

Do note that multiplying two SSymmetricCompacts results in the same runtime before and after this change, as the indices are known at compile time and so it looks like the compiler is able to constant-propagate everything in either case. But just randomly linearly indexing into an SSymmetricCompact is faster with the _symmetric_compact_indices approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, my proposal is really for IndexCartesian, where you never have to call divrem (which is insanely slow).

But I just benchmarked with

Base.@propagate_inbounds function Base.getindex(a::SSymmetricCompact{N}, i::Int, j::Int) where {N}
    i, j, lower = j > i ? (j, i, false) : (i, j, true)
    idx = i - j + 1 + ((j-1)*(2N+2-j))>>UInt(1)
    @inbounds value = a.lowertriangle[idx]
    return lower ? value : transpose(value)
end

and to my surprise it's no better. So I'm happy going with yours.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For constant y the compiler does a very good job on div(x, y) and divrem(x, y), almost to the point where it produces the same code as the manual >> UInt(1) optimization:

julia> f(x) = div(x, 2);

julia> g(x) = x >> UInt(1);

julia> @code_native f(3)
        .text
; ┌ @ REPL[15]:1 within `f'
; │┌ @ REPL[15]:1 within `div'
        movq    %rdi, %rax
        shrq    $63, %rax
        leaq    (%rax,%rdi), %rax
        sarq    %rax
; │└
        retq
        nop
; └

julia> @code_native g(3)
        .text
; ┌ @ REPL[16]:1 within `g'
; │┌ @ REPL[16]:1 within `>>'
        sarq    %rdi
; │└
        movq    %rdi, %rax
        retq
        nopw    (%rax,%rax)
; └

Similarly for divrem(i, N) for fixed N.

Copy link
Contributor Author

@tkoolen tkoolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick review!

end
end

Base.@propagate_inbounds function Base.getindex(a::SSymmetricCompact{N}, i::Int) where {N}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reasons are that getindex(a, i::Int, j::Int) would also not be trivial, requiring a solution similar to _symmetric_compact_indices to be fast, and a lot of functionality inside of StaticArrays implicitly assumes that all StaticArrays are IndexLinear, e.g.:

tmp = [:(a[$j][$i]) for j 1:length(a)]
exprs[i] = :(f($(tmp...)))

src/SSymmetricCompact.jl Outdated Show resolved Hide resolved
src/SSymmetricCompact.jl Outdated Show resolved Hide resolved
src/SSymmetricCompact.jl Outdated Show resolved Hide resolved
src/SSymmetricCompact.jl Outdated Show resolved Hide resolved
Copy link
Member

@c42f c42f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty neat. I'm worried that the Tuple constructor will interact quite badly with the similar_type machinery? Might need to have SMatrix as the result of similar_type.

Some tests of functionality for non symmetry-preserving operations should be added I think.

src/SSymmetricCompact.jl Outdated Show resolved Hide resolved
An `SSymmetricCompact` may be constructed either:

* from an `AbstractVector` containing the lower triangular elements; or
* from a `Tuple` containing both upper and lower triangular elements (in column major order); or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess this inconsistency between Tuple and AbstractVector is due to the rest of StaticArrays assuming dense storage? That's unfortunate.

Does this Tuple constructor ignoring the top half, combined with the similar_type machinery mean that a lot of StaticArrays functionality is actually broken with SSymmetricCompact? For example, matrix multiply of SSymmetricCompact with SMatrix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit of a pickle (see #358 (comment), item 2), but I think the current implementation still makes sense.

Any StaticArray definitely needs to be constructable from a Tuple, which needs to have the interpretation of the full list of column-major-ordered elements.; that's the basic assumption in convert.jl. That also means we can't use Tuple as the type of the lowertriangle field without causing all kinds of confusion with the constructors. So I chose to use an SVector as the storage type for the lower triangular elements and designed the constructors in accordance. The precedent for this was SDiagonal (which has since been replaced with Symmetric{T, SVector{<:Any, T}; maybe the same will happen for this case in time).

Note that the StaticMatrix subtype constructors / convert methods currently also have the job of implementing reshape,

# this covers most conversions and "statically-sized reshapes"
@inline convert(::Type{SA}, sa::StaticArray) where {SA<:StaticArray} = SA(Tuple(sa))

which I'm not 100% on board with anyway. SSymmetricCompact hijacks this particular use case, but I'd say it's niche enough that it's OK. Alternatively, a new type could be added precisely for this purpose, but I don't think there's much of an advantage in doing so.

Does this Tuple constructor ignoring the top half, combined with the similar_type machinery mean that a lot of StaticArrays functionality is actually broken with SSymmetricCompact? For example, matrix multiply of SSymmetricCompact with SMatrix?

No. similar_type is not overloaded for SSymmetricCompact, so the default SMatrix is used. So in that sense, SSymetricCompact is very similar to e.g. Rotations.Quat, and so the only thing that really matters is that getindex(::StaticArray, i::Int) works.

src/SSymmetricCompact.jl Outdated Show resolved Hide resolved
src/SSymmetricCompact.jl Outdated Show resolved Hide resolved
src/SSymmetricCompact.jl Outdated Show resolved Hide resolved
@chriscoey
Copy link

possibly relevant comment: JuliaLang/julia#31836 (comment)

@tkoolen
Copy link
Contributor Author

tkoolen commented May 3, 2019

I'd be fine with turning this into SHermitianCompact.

@tkoolen tkoolen changed the title Add SSymmetricCompact Add SHermitianCompact May 8, 2019
@tkoolen
Copy link
Contributor Author

tkoolen commented May 8, 2019

OK, changed to SHermitianCompact. I think this is good to go now. I'll leave this open for another day or so and then I'll merge absent additional comments.

@tkoolen tkoolen merged commit ee01ea1 into JuliaArrays:master May 10, 2019
@tkoolen
Copy link
Contributor Author

tkoolen commented May 10, 2019

Thanks, all!

@tkoolen tkoolen deleted the tk/symmetric branch May 11, 2019 16:08
@c42f c42f mentioned this pull request Jun 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants