circshift(SArray) returns MArray #745
Comments
Agreed. The reason for this is that it's dispatching to Base and |
I looked into this as I wanted to learn a bit more about Julia and this library, but, oh boy, this was (to a novice like me) a surprisingly difficult problem. So, unless one goes for an approach with a mutable temporary, you'd have to end up with something along the lines of:
circshift(a::StaticArray, shift::Int) = _circshift(Length(a), a, shift)
@generated function _circshift(::Length{L}, a::StaticArray{<:Tuple,T}, shift::Int) where {L, T}
    exprs = [:(a[1+mod($i-shift, L)]) for i = 0:(L-1)]
    return quote
        @inbounds return typeof(a)(tuple($(exprs...)))
    end
end
So, given the forced marriage of stack allocation and immutability in Julia, I have started to think of this more philosophically. Basically, the need for this a
function circshift(a::StaticArray{Tuple{L}}, ::Length{N}) where {L, N}
    vcat(_slice(Length(L-N+1), Length(L), a), _slice(Length(1), Length(L-N), a))
end
@generated function _slice(::Length{L}, ::Length{U}, a::StaticArray{<:Tuple,T}) where {L, U, T}
    exprs = [:(a[$i]) for i = L:U]
    return quote
        @inbounds return SArray{Tuple{1+U-L},T,1,1+U-L}(tuple($(exprs...)))
    end
end
which, of course, is super-fast if one specifies a static shift |
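For anyone following along, here is a minimal sketch of the same "two slices, concatenated" idea on plain tuples, so it runs without any package. The names `tuple_slice` and `split_circshift` are made up for illustration and are not part of StaticArrays. Unlike the snippet above, this sketch reduces the shift with `mod`, so shifts outside `0:L` are also accepted (empty tuples are not handled). It is meant to show the indexing only, not to be fast.

```julia
# Hypothetical helper: elements lo..hi of a tuple, returned as a new tuple.
tuple_slice(t::Tuple, lo::Int, hi::Int) = ntuple(i -> t[lo + i - 1], hi - lo + 1)

# Hypothetical sketch: rotate right by a shift carried in the type domain.
function split_circshift(t::NTuple{L,Any}, ::Val{N}) where {L,N}
    s = mod(N, L)                         # normalise the shift into 0:(L-1)
    tail = tuple_slice(t, L - s + 1, L)   # last s elements move to the front
    head = tuple_slice(t, 1, L - s)       # first L-s elements follow them
    return (tail..., head...)
end

t = (1, 2, 3, 4, 5)
split_circshift(t, Val(2))    # (4, 5, 1, 2, 3)
split_circshift(t, Val(-1))   # (2, 3, 4, 5, 1)
```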
Haha! Well, thanks for jumping right in. I think you're on the right track here :-)
Yes I think that's basically the right way to implement it. However it might not be so bad as you think: the compiler is quite good at constant propagation, so in the cases where
@inline Base.circshift(a::StaticArray, shift::Int) = _circshift(Length(a), a, shift)
@generated function _circshift(::Length{L}, a::StaticArray{<:Tuple,T}, shift::Int) where {L, T}
    exprs = [:(a[1+mod($i-shift, L)]) for i = 0:(L-1)]
    return quote
        $(Expr(:meta, :inline))
        @inbounds return typeof(a)(tuple($(exprs...)))
    end
end
Note the appearance of
Thus we get extremely simple native code when the
julia> foo(a) = circshift(a, length(a)÷2)
foo (generic function with 1 method)
julia> foo(SA[1,2,3,4,5])
5-element SArray{Tuple{5},Int64,1,5} with indices SOneTo(5):
4
5
1
2
3
julia> @code_native foo(SA[1,2,3,4,5])
.text
; ┌ @ REPL[11]:1 within `foo'
movq %rdi, %rax
; │┌ @ REPL[9]:1 within `circshift'
; ││┌ @ REPL[3]:2 within `_circshift'
; │││┌ @ REPL[3]:5 within `macro expansion'
vmovups 24(%rsi), %xmm0
vmovups (%rsi), %xmm1
movq 16(%rsi), %rcx
; │└└└
vmovups %xmm0, (%rdi)
vmovups %xmm1, 16(%rdi)
movq %rcx, 32(%rdi)
retq
nop
; └
Luckily this isn't quite true :-) The compiler will commonly allocate |
Yeah, I didn't manage to get it to inline properly, so I saw very poor performance. Though, the tricky case is that performance is somewhat poor (as arrays get a bit larger) when the calls to
Experimenting a bit
@inline function circshift(v::SVector{L}, shift::Integer) where L
    out = similar(v)
    shift = mod(shift, L)
    cut = L-shift
    @inbounds for i in 1:(L-cut)
        out[i] = v[i+cut]
    end
    @inbounds for i in 1:cut
        out[i+shift] = v[i]
    end
    return SVector(out)
end
So, this version seems to be sufficiently optimizer-friendly, with the added benefit of not having to mess about with generated functions (a plus, IMHO) and the fact that it's fast in cases where shift isn't known at compile time. |
Yes it's always nice to avoid
You're not using
@inline function circshift(v::StaticVector, shift::Integer)
    w = similar(v)
    L = length(v)
    shift = mod(shift, L)
    cut = L-shift
    @inbounds for i in 1:(L-cut)
        w[i] = v[i+cut]
    end
    @inbounds for i in 1:cut
        w[i+shift] = v[i]
    end
    return similar_type(v)(w)
end |
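To make the index arithmetic of the two loops easy to check, here is a rough sketch of the same copy pattern on a plain `Vector`, compared against `Base.circshift`. The name `loop_circshift` is hypothetical, the empty-vector guard is my own addition, and `@inbounds`/`@inline` are left out on purpose; this only illustrates which element lands where, not the StaticArrays method itself.

```julia
# Hypothetical helper on a plain Vector; not the StaticArrays method.
function loop_circshift(v::Vector, shift::Integer)
    L = length(v)
    L == 0 && return copy(v)   # nothing to rotate; also avoids mod(shift, 0)
    w = similar(v)
    shift = mod(shift, L)      # normalise, so negative shifts work too
    cut = L - shift            # index of the last element that stays "in front"
    for i in 1:shift           # the last `shift` elements move to the front
        w[i] = v[i + cut]
    end
    for i in 1:cut             # the first `cut` elements slide back by `shift`
        w[i + shift] = v[i]
    end
    return w
end

v = [1, 2, 3, 4, 5]
loop_circshift(v, 2)    # [4, 5, 1, 2, 3]
loop_circshift(v, -1)   # [2, 3, 4, 5, 1]
```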
Really nice - so does that end up with good code these days? Do we end up with a bunch of extra move instructions on the last line or is it elided entirely? What happens for big numbers and other mutable things that might not work well inside an @c42f seeing this I feel like we are edging closer to making this package simpler or even redundant - I mean that function is basically good for |
There's a WIP PR to Base (JuliaLang/julia#34126) which puts pointers inline, so that could maybe help fix
Yes agreed things are getting simpler! I think the larger way forward is to start using |
Unfortunately this doesn't affect all objects (structs with union types and unallocated fields may be treated differently).
Yeah! Let's do it :) I'd still be tempted to create a separate package with |
I've been thinking of doing this for a while for structs and tuples. It's super easy to implement this with Setfield.jl: jw3126/Setfield.jl#56 (comment) |
So... I got overexcited. I think I need this for Dictionaries.jl anyways. |
You beat me to it! I hope I'm not slowing down your excitement but I think |
Haha! Yes we definitely need “shape frozen” vs “value frozen” semantics and I meant that If you are going to think about these things remember that append-only datasets are a thing and that we might want eg a vector that supports For dictionaries I was thinking |
Note that it is possible to write something like
function matmul(a::SMatrix{I, J}, b::SMatrix{J, K}) where {I, J, K}
    c = zero(MMatrix{I, K})
    @inbounds for k in 1:K, j in 1:J
        b_jk = b[j, k]
        @simd for i in 1:I
            c[i, k] = muladd(a[i, j], b_jk, c[i, k])
        end
    end
    return SMatrix(c)
end
now, without causing allocations. It seems LLVM only unrolls the innermost loop now for some reason though. |
(I opened andyferris/Freeze.jl#1 to avoid derailing this issue too much.) |
So I started experimenting further: https://github.com/andyferris/StaticArraysLite.jl So far the basic functionality is quite competitive with StaticArrays for |
I need to use Back then I just reported the lack of this functionality and didn't think about the solution.
Base.circshift(v::SVector{N,T}, shift::Integer) where {N,T} = SVector(ntuple(k->v[mod1(k-shift,N)], Val(N)))
Base.circshift(v::SVector{N,T}, ::Val{S}) where {N,T,S} = SVector(ntuple(k->v[mod1(k-S,N)], Val(N)))
Here are the benchmark results:
julia> VERSION
v"1.7.2-pre.0"
julia> v = SVector(ntuple(identity, 10))
10-element SVector{10, Int64} with indices SOneTo(10):
1
2
3
4
5
6
7
8
9
10
julia> @btime circshift($v, 3);
31.051 ns (0 allocations: 0 bytes)
julia> @btime circshift($v, Val(3));
2.641 ns (0 allocations: 0 bytes) |
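As a sanity check of the `mod1` indexing used in the two methods above, here is a small sketch of the same pattern on plain tuples, so it can be run without StaticArrays. The name `tuple_circshift` is hypothetical; the `SVector` wrapping is dropped and only the `ntuple`/`mod1` logic is kept. It says nothing about the timings, which depend on the Julia version and on whether the shift is known at compile time.

```julia
# Hypothetical names; plain tuples stand in for SVector here.
# Runtime shift: element k of the result is element mod1(k - shift, N) of the input.
tuple_circshift(t::NTuple{N,Any}, shift::Integer) where {N} =
    ntuple(k -> t[mod1(k - shift, N)], Val(N))

# Shift carried in the type domain via Val, mirroring the second method above.
tuple_circshift(t::NTuple{N,Any}, ::Val{S}) where {N,S} =
    ntuple(k -> t[mod1(k - S, N)], Val(N))

t = (1, 2, 3, 4, 5)
tuple_circshift(t, 2)         # (4, 5, 1, 2, 3)
tuple_circshift(t, Val(2))    # (4, 5, 1, 2, 3)
```

The two methods agree by construction; the `Val` form only moves the shift into a type parameter so the compiler can specialise on it.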
Example:
I think the result should be SArray.