circshift(SArray) returns MArray #745

Open
wsshin opened this issue Feb 22, 2020 · 16 comments
Labels: feature (features and feature requests)

Comments

@wsshin
Contributor

wsshin commented Feb 22, 2020

Example:

julia> sv = SVector(1,2,3);

julia> circshift(sv, 1)
3-element MArray{Tuple{3},Int64,1,3} with indices SOneTo(3):
 3
 1
 2

I think the result should be SArray.

@c42f c42f added the feature (features and feature requests) label Feb 24, 2020
@c42f
Member

c42f commented Feb 24, 2020

Agreed. The reason for this is that it dispatches to the generic Base implementation, and similar(::SVector) returns an MVector.
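
For reference, a minimal REPL check of that fallback path (hedged: printing and exact types may vary between StaticArrays versions):

julia> using StaticArrays

julia> similar(SVector(1, 2, 3)) isa MVector   # Base's circshift fallback builds its output with similar
true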

@Micket

Micket commented Apr 13, 2020

I looked into this as I wanted to learn a bit more about Julia and this library, but, oh boy, this was (to a novice like me) a surprisingly difficult problem.
The immutability basically forces a much more inconvenient implementation (with significant overhead), just like setindex.

So, unless one goes for an approach with a mutable temporary, you'd have to end up with something along the lines of:

circshift(a::StaticArray, shift::Int) = _circshift(Length(a), a, shift)
@generated function _circshift(::Length{L}, a::StaticArray{<:Tuple,T}, shift::Int) where {L, T}
    exprs = [:(a[1+mod($i-shift, L)]) for i = 0:(L-1)]
    return quote
        @inbounds return typeof(a)(tuple($(exprs...)))
    end
end

So, given the forced marriage of stack allocation and immutability in Julia, I have started to think of this more philosophically. Basically, the need for a circshift (or a setindex) with a dynamic shift variable indicates an underlying problem.
A "compile time" length for the shift is what we should actually want most of the time with static variables (I apologize for the messy code here; I'm new to generated functions and barely know what I'm doing):

function circshift(a::StaticArray{Tuple{L}}, ::Length{N}) where {L, N}
    vcat(_slice(Length(L-N+1), Length(L), a), _slice(Length(1), Length(L-N), a))
end
@generated function _slice(::Length{L}, ::Length{U}, a::StaticArray{<:Tuple,T}) where {L, U, T}
    exprs = [:(a[$i]) for i = L:U]
    return quote
        @inbounds return SArray{Tuple{1+U-L},T,1,1+U-L}(tuple($(exprs...)))
    end
end

which, of course, is super-fast if one specifies a static shift: circshift(a, Length(3)).

@c42f
Member

c42f commented Apr 13, 2020

I looked into this as I wanted to learn a bit more about Julia and this library, but, oh boy, this was (to a novice like me) a surprisingly difficult problem.

Haha! Well, thanks for jumping right in. I think you're on the right track here :-)

So, unless one goes for an approach with a mutable temporary, you'd have to end up with something along the lines of [...]

Yes, I think that's basically the right way to implement it. However, it might not be as bad as you think: the compiler is quite good at constant propagation, so in the cases where shift is constant in the caller, the code for all the occurrences of mod in _circshift will be optimized away. For this to work you may need to encourage some extra inlining, though:

@inline Base.circshift(a::StaticArray, shift::Int) = _circshift(Length(a), a, shift)
@generated function _circshift(::Length{L}, a::StaticArray{<:Tuple,T}, shift::Int) where {L, T}
    exprs = [:(a[1+mod($i-shift, L)]) for i = 0:(L-1)]
    return quote
        $(Expr(:meta, :inline))
        @inbounds return typeof(a)(tuple($(exprs...)))
    end
end

Note the appearance of Expr(:meta, :inline) in the generated code, which is the confusing way you need to write @inline for @generated functions. It seemed I also needed to add @inline to circshift.

Thus we get extremely simple native code when the shift amount can be constant-propagated all the way inward. E.g., for a function foo:

julia> foo(a) = circshift(a, length(a)÷2)
foo (generic function with 1 method)

julia> foo(SA[1,2,3,4,5])
5-element SArray{Tuple{5},Int64,1,5} with indices SOneTo(5):
 4
 5
 1
 2
 3

julia> @code_native foo(SA[1,2,3,4,5])
	.text
; ┌ @ REPL[11]:1 within `foo'
	movq	%rdi, %rax
; │┌ @ REPL[9]:1 within `circshift'
; ││┌ @ REPL[3]:2 within `_circshift'
; │││┌ @ REPL[3]:5 within `macro expansion'
	vmovups	24(%rsi), %xmm0
	vmovups	(%rsi), %xmm1
	movq	16(%rsi), %rcx
; │└└└
	vmovups	%xmm0, (%rdi)
	vmovups	%xmm1, 16(%rdi)
	movq	%rcx, 32(%rdi)
	retq
	nop
; └

So, given the forced marriage of stack allocation and immutability in Julia

Luckily this isn't quite true :-) The compiler will commonly allocate MVector on the stack as well, but only if it can prove that it's a temporary which doesn't escape the stack frame. So one viable alternative to the generated function might be to use a temporary MVector, but re-wrap it in the type of the input vector before returning. This will convert back to SVector if an SVector was passed in.
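
A quick way to check that claim (hedged: exact results depend on the Julia and StaticArrays versions; the allocation should be elided once the temporary provably does not escape):

using StaticArrays, BenchmarkTools

# Toy example of the pattern: mutate an MVector temporary, re-wrap as the immutable type.
plus_one(v::SVector) = (m = similar(v); m .= v .+ 1; SVector(m))

@btime plus_one($(SVector(1, 2, 3)))  # expect 0 allocations when the MVector stays on the stack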

@Micket

Micket commented Apr 13, 2020

Yeah, I didn't manage to get it to inline properly, so I saw very poor performance.

The tricky case, though, is that performance is somewhat poor (as arrays get a bit larger) when the calls to mod can't be optimized away.

Experimenting a bit:

@inline function circshift(v::SVector{L}, shift::Integer) where L
    out = similar(v)
    shift = mod(shift, L)
    cut = L-shift
    @inbounds for i in 1:(L-cut)
        out[i] = v[i+cut]
    end
    @inbounds for i in 1:cut
        out[i+shift] = v[i]
    end
    return SVector(out)
end

So, this version seems to be sufficiently optimizer friendly, with the added benefit of not having to mess about with generated functions (a plus in I.M.H.O.) and the fact that it's fast in cases where shift isn't known at compile time.

@c42f
Member

c42f commented Apr 14, 2020

not having to mess about with generated functions (a plus in I.M.H.O.)

Yes it's always nice to avoid @generated!

function circshift(v::SVector{L}, shift::Integer) where L

You're not using L in dispatch here, so you can simplify the signature and just use length internally, which should be cleaner and is easily optimized since the length is always known from typeof(v):

@inline function circshift(v::StaticVector, shift::Integer)
    w = similar(v)
    L = length(v)
    shift = mod(shift, L)
    cut = L-shift
    @inbounds for i in 1:(L-cut)
        w[i] = v[i+cut]
    end
    @inbounds for i in 1:cut
        w[i+shift] = v[i]
    end
    return similar_type(v)(w)
end

@andyferris
Member

andyferris commented Apr 15, 2020

Really nice - so does that end up with good code these days? Do we end up with a bunch of extra move instructions on the last line, or is it elided entirely?

What happens for big numbers and other mutable things that might not work well inside an MArray?

@c42f seeing this I feel like we are edging closer to making this package simpler or even redundant - I mean that function is basically good for AbstractArray! I'd simply replace the last line with return freeze(w) (a la Keno's suggestion). To fix MArray we could really use a "mutable tuple" which freezes to a Tuple... and then maybe constant propagation is the only "magic" we need here.
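
A hedged sketch of what that could look like, assuming a hypothetical freeze function (not in Base or StaticArrays today) that turns a mutable temporary into its immutable counterpart:

# circshift_frozen is an illustrative name; freeze is hypothetical.
function circshift_frozen(v::AbstractVector, shift::Integer)
    w = similar(v)
    L = length(v)
    shift = mod(shift, L)
    cut = L - shift
    @inbounds for i in 1:(L - cut)
        w[i] = v[i + cut]
    end
    @inbounds for i in 1:cut
        w[i + shift] = v[i]
    end
    return freeze(w)  # hypothetical; today this is spelled similar_type(v)(w) for static arrays
end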

@c42f
Member

c42f commented Apr 15, 2020

What happens for big numbers and other mutable things that might not work well inside an MArray?

There's a WIP PR to Base (JuliaLang/julia#34126) which puts pointers inline, so that could maybe help fix MArray with BigFloat etc (though we'd still need extra care to preserve GC invariants). In fact that suggests #749 might be reasonable to merge.

seeing this I feel like we are edging closer to making this package simpler or even redundant - I mean that function is basically good for AbstractArray! I'd simply replace the last line with return freeze(w) (a la Keno's suggestion).

Yes agreed things are getting simpler! I think the larger way forward is to start using freeze (or some freeze-like thing) internally in StaticArrays and see whether we can get good performance plus code which would in principle work for AbstractArray. Then try to merge some of that back into Base once we know things work.

@andyferris
Member

There's a WIP PR to Base (JuliaLang/julia#34126) which puts pointers inline, so that could maybe help fix MArray with BigFloat etc (though we'd still need extra care to preserve GC invariants).

Unfortunately this doesn't affect all objects (structs with union types and unallocated fields may be treated differently).

Yes agreed things are getting simpler! I think the larger way forward is to start using freeze (or some freeze-like thing) internally in StaticArrays and see whether we can get good performance plus code which would in principle work for AbstractArray. Then try to merge some of that back into Base once we know things work.

Yeah! Let's do it :) I'd still be tempted to create a separate package with freeze and thaw and overload that, so everyone can explore.

@tkf
Member

tkf commented Apr 15, 2020

I'd still be tempted to create a separate package with freeze and thaw and overload that, so everyone can explore.

I've been thinking about doing this for structs and tuples for a while. It's super easy to implement with Setfield.jl: jw3126/Setfield.jl#56 (comment)

@andyferris
Member

Yes agreed things are getting simpler! I think the larger way forward is to start using freeze (or some freeze-like thing) internally in StaticArrays and see whether we can get good performance plus code which would in principle work for AbstractArray. Then try to merge some of that back into Base once we know things work.

Yeah! Let's do it :) I'd still be tempted to create a separate package with freeze and thaw and overload that, so everyone can explore.

So... I got overexcited. I think I need this for Dictionaries.jl anyways.

https://github.com/andyferris/Freeze.jl

@tkf
Member

tkf commented Apr 15, 2020

You beat me to it! I hope I'm not slowing down your excitement, but I think issettable is too coarse-grained. It might be better to be able to freeze/thaw different aspects of a container separately (e.g., shape-frozen vs value-frozen). For example, you'd want shape-frozen, value-mutable arrays for the columns of tables. Also, I've been thinking that it'd be nice to have an ownership-as-convention kind of API. That way, you can get a back from thaw(move(freeze(a))) "safely."
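
Purely as an illustration of that ownership-as-convention idea (freeze/thaw/move don't exist anywhere yet; names and semantics here are hypothetical):

# a  = [1, 2, 3]
# fa = freeze(a)    # immutable counterpart of a; by convention, a is no longer mutated directly
# ma = move(fa)     # asserts this is the last live reference to fa, so its storage may be reused
# b  = thaw(ma)     # mutable again, ideally reusing a's memory, so no copy was ever made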

@andyferris
Member

Haha!

Yes, we definitely need "shape frozen" vs "value frozen" semantics, and I meant that issettable is only for mutating values, not the shape. My experience with Dictionaries.jl is that containers whose keys change are significantly more challenging to design shared generic code for, and also all static arrays are shape-frozen anyway.

If you are going to think about these things, remember that append-only datasets are a thing and that we might want e.g. a vector that supports push! but not setindex!. Thinking of it this way, we can keep the two concepts simpler and orthogonal rather than complecting them. (That's also why setindex! doesn't insert new keys in Dictionaries.jl.)

For dictionaries I was thinking isinsertable, freezekeys and thawkeys could be the (orthogonal) “shape” interface. What do you think?
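
Roughly, the split being proposed (hedged sketch: isinsertable/issettable come from the comments above and Dictionaries.jl; freezekeys/thawkeys are only proposed names here):

# issettable(d)    -> can existing values be overwritten (setindex! on existing keys)?
# isinsertable(d)  -> can keys be added or removed (insert!, delete!)?
# freezekeys(d)    -> same values, fixed key set (shape-frozen, value-mutable)
# thawkeys(d)      -> the inverse: key insertion/deletion enabled again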

@KristofferC
Contributor

Note that it is possible to write something like

function matmul(a::SMatrix{I, J}, b::SMatrix{J, K}) where {I, J, K}
    c = zero(MMatrix{I, K})
    @inbounds for k in 1:K, j in 1:J
        b_jk = b[j, k]
        @simd for i in 1:I
            c[i, k] = muladd(a[i, j], b_jk, c[i, k])
        end
    end
    return SMatrix(c)
end

now, without causing allocations. It seems LLVM only unrolls the innermost loop now for some reason though.

@tkf
Member

tkf commented Apr 15, 2020

(I opened andyferris/Freeze.jl#1 to avoid derailing this issue too much.)

@andyferris
Member

So I started experimenting further: https://github.com/andyferris/StaticArraysLite.jl

So far the basic functionality is quite competitive with StaticArrays for SVector{3, Float64}. It would be very interesting to see how this goes for matrices and slightly larger arrays.

@wsshin
Contributor Author

wsshin commented Feb 22, 2022

I need to use circshift() on SVector, and I discovered someone submitted this issue a while ago. It turned out that it was actually myself 😁.

Back then I just reported the lack of this functionality and didn't think about a solution. Base.circshift() supports multi-dimensional arrays, but if we focus on vectors, would it be possible to add the following simple solutions?

Base.circshift(v::SVector{N,T}, shift::Integer) where {N,T} = SVector(ntuple(k->v[mod1(k-shift,N)], Val(N)))
Base.circshift(v::SVector{N,T}, ::Val{S}) where {N,T,S} = SVector(ntuple(k->v[mod1(k-S,N)], Val(N)))

Here are the benchmark results:

julia> VERSION
v"1.7.2-pre.0"

julia> v = SVector(ntuple(identity, 10))
10-element SVector{10, Int64} with indices SOneTo(10):
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10

julia> @btime circshift($v, 3);
  31.051 ns (0 allocations: 0 bytes)

julia> @btime circshift($v, Val(3));
  2.641 ns (0 allocations: 0 bytes)
