RFC: Generalize CartesianIndex and support ranges with static inner/outer size #37290
Conversation
I've not had a chance to consider this deeply, but is this not just a symptom of a higher-level design choice? Why aren't these arrays actually N+1-dimensional? That's not a point against this (not in the least), but I've often felt that `ReinterpretArray`s should add a dimension in cases like this. If that's the case, do we need this to be such a general mechanism?
Certainly in my application I'd prefer it to add a dimension, and indeed in ImageCore we define … CC @Keno for additional perspective.
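For concreteness, a small illustration (mine, not from this PR) of the two shapes being discussed; note that newer Julia versions provide `reinterpret(reshape, ...)`, shown here only to make the "extra dimension" idea tangible:

```julia
# A vector of 3-tuples, reinterpreted two ways.
A = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]   # Vector{NTuple{3,Float64}}

flat = reinterpret(Float64, A)            # 6-element vector: indexing it needs the
                                          # inner/outer (divrem) split internally

mat = reinterpret(reshape, Float64, A)    # 3×2 matrix: the inner size becomes an
                                          # explicit leading dimension instead
```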
The more I think about it, the more I like redefining …
Couldn't we add more special cases to the `getindex` that features the `divrem`? If some conditions are met, such as all of the following: …

Then we could use an optimized implementation based on

```julia
GC.@preserve A begin
    unsafe_load(Base.unsafe_convert(Ptr{T}, A), 1 + sum(map(*, strides(A), map(-, inds, 1))))
end
```

This would rely on constant propagation of one of the strides. Your solution here is more general, as not all arrays that could benefit from SIMD will meet those three conditions.
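A hedged sketch of what such a special-cased fast path could look like, assuming the gating conditions above hold; the name `fast_getindex` is invented here, and it presumes `strides(A)` and `Base.unsafe_convert(Ptr{T}, A)` are defined for the array type, as they are for `Array`:

```julia
# Illustrative only: a divrem-free element access via pointer arithmetic over
# the strides. Not part of this PR; a sketch of the comment above.
function fast_getindex(A::AbstractArray{T,N}, inds::Vararg{Int,N}) where {T,N}
    @boundscheck checkbounds(A, inds...)
    # Linear offset in units of T, computed from the (assumed available) strides.
    offset = sum(map(*, strides(A), map(-, inds, 1)))
    GC.@preserve A begin
        p = Base.unsafe_convert(Ptr{T}, A)   # assumed valid for this array type
        unsafe_load(p, 1 + offset)
    end
end
```

In practice this path would be gated on the elided conditions above (presumably things like `isbitstype(T)`), falling back to the generic `divrem`-based `getindex` otherwise.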
I've recently taken to liking Static number types as a means to make a struct optionally dynamic or static. Supporting arithmetic means a lot of code can be written generically, and you could still dispatch on …
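To make the "optionally dynamic or static" idea concrete, here is a minimal sketch with invented names (`StaticInt`, `InnerSize`); packages such as StaticNumbers.jl or Static.jl flesh this out properly:

```julia
# Sketch only: a statically-known integer and a struct that is generic over
# whether its size is static or dynamic. Names are illustrative, not from this PR.
struct StaticInt{N} <: Integer end
StaticInt(N::Int) = StaticInt{N}()

Base.Int(::StaticInt{N}) where {N} = N
Base.promote_rule(::Type{StaticInt{N}}, ::Type{Int}) where {N} = Int
Base.:*(::StaticInt{M}, ::StaticInt{N}) where {M,N} = StaticInt{M * N}()

# A struct becomes "optionally static" simply by being generic in the integer type:
struct InnerSize{T<:Integer}
    k::T
end

# Generic code treats both cases the same, while dispatch can still single out
# the statically-sized case when it matters (e.g. for unrolling).
isstatic(::InnerSize{<:StaticInt}) = true
isstatic(::InnerSize{Int}) = false
```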
Interesting questions! … With regards to static numbers, yes, that's an interesting solution to making something that can be either dynamic or static. I like it! If we have a killer application that other bits of Base would take advantage of, that does seem like something we could move in. (But you'd really want there to be a good case that it's useful for things that need to be, or already are, in Base.) If it's useful for LV to move into the compiler, that could be the only application needed.
Hopefully not too off-topic, but a place where static numbers seem like a natural fit is in …
Aside from the … In effect, this gives systematic, as opposed to bespoke, names to these types. @Tokazama, author of https://github.com/Tokazama/StaticRanges.jl, might have more to add here. There's also a bit of discussion here: https://julialang.zulipchat.com/#narrow/stream/225583-appreciation/topic/run-time.20dispatch/near/195130969
The reference to …
BTW, I've already started moving the more critical pieces of StaticRanges that support arrays into ArrayInterface, so wherever this ends up I'd be happy to help develop the corresponding range stuff. I should also mention that mixing static numbers with ordinary numbers doesn't always work. For example, if we called …
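As a concrete (and invented) instance of that caveat, continuing the illustrative `StaticInt` sketch above: arithmetic with an ordinary `Int` promotes the static value away, so the result silently stops being static:

```julia
StaticInt(2) * StaticInt(3)   # StaticInt{6}(): the product is still static
StaticInt(2) * 3              # 6, a plain Int: the "staticness" is lost
```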
Some array types, notably certain types of `ReinterpretArray`, can be thought of as having a first dimension that has an "inner/outer" pattern of iteration due to calling `divrem` on the first index. For efficient iteration and especially vectorization, it is helpful to avoid the `divrem`. You can see this in the new benchmarks in #37277, where most cases are reasonably well fixed (to within a factor of 2 of their corresponding `Array`s), but certain other array types still exhibit performance penalties of nearly 20x. On closer inspection, the ones that are "fixed" are conceptually like `reinterpret(NTuple{K,T}, ::Array{T})` and those that are not are the opposite, `reinterpret(T, ::Array{NTuple{K,T}})`. For the former, LLVM elides the `divrem` (it figures out that `sidx` is always zero and optimizes accordingly), but for the latter it cannot, and it seems that failure to eliminate the `divrem` kills vectorization. Conceptually, what you'd really like LLVM to do is a `K`-fold unrolling of the loop, but it doesn't seem to discover that it should do that.

This is a fairly scary PR that aims to fix that by eliminating the need for the `divrem`. The idea is to create a new `Integer` subtype that stores an `Int` as separate `div` and `rem` pieces. For it to work with vectorization, the "inner" size (the denominator in the `divrem`) must be known to the type system, so this is now a parametrized subtype of `Integer` which I'm calling `SDivRemInt{K}`. (The `S` is for "static," presuming that someday someone might want a dynamic version that encodes `K` as a field rather than a type parameter.) When looping, it uses a `CartesianIndex{2}`-like iteration to increment the `SDivRemInt`. To make this work smoothly, you need an `AbstractUnitRange` that supports this type of integer (including storing the `K` as a type parameter), and then you need to generalize `CartesianIndices` to work with such objects. Altogether this is a fair amount of infrastructure just to change the inner iteration pattern, but I'm not sure I see an alternative that is clean and generic. Note that in the current state of the PR this is not yet "wired in" to `ReinterpretArray`, but I thought it might be important to put out the fundamental infrastructure first, to see if this design makes sense or whether there are modifications that might increase applicability to other circumstances.

I'm aware of a couple of test failures, but none of them look as scary as the simple fact of changing the definition of the `CartesianIndices` type, which has the chance to be fairly breaking. Consequently, I wanted to put this out there for comment before going any further with this. Would love feedback from @mbauman and/or @chriselrod.
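To make the proposal easier to picture, here is a rough, non-authoritative sketch of the kind of type being described; the actual definitions in the PR (the range type, promotion rules, and the `CartesianIndices` generalization) are omitted and will differ in detail:

```julia
# Illustrative sketch only (not the PR's code): an integer that carries the
# divrem decomposition of a linear index, with the inner size K in the type.
struct SDivRemInt{K} <: Integer
    d::Int   # "outer" part, roughly div(i - 1, K)
    r::Int   # "inner" part, roughly rem(i - 1, K) + 1 (1-based)
end

# Recover the ordinary linear index: i == d*K + r.
Base.Int(i::SDivRemInt{K}) where {K} = i.d * K + i.r

# CartesianIndex{2}-style increment: bump the inner part and carry into the
# outer part, so the hot loop never performs an integer division.
next(i::SDivRemInt{K}) where {K} =
    i.r < K ? SDivRemInt{K}(i.d, i.r + 1) : SDivRemInt{K}(i.d + 1, 1)

# A getindex for something like reinterpret(T, ::Array{NTuple{K,T}}) can then
# read i.d and i.r directly (outer element, inner field) instead of calling divrem.
```

For `K = 4`, iterating the indices 1:12 this way visits `(d, r) = (0, 1)` through `(2, 4)`, which is exactly the `K`-fold-unrolled structure the description says LLVM fails to discover on its own.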