Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Generalize CartesianIndex and support ranges with static inner/outer size #37290

Closed
wants to merge 1 commit into from

Conversation

timholy
Copy link
Member

@timholy timholy commented Aug 30, 2020

Some array types, notably certain types of ReinterpretArray, can be thought of as having a first dimension that has an "inner/outer" pattern of iteration due to calling divrem on the first index. For efficient iteration and especially vectorization, it is helpful to avoid the divrem. You can see this in the new benchmarks in #37277, where most cases are reasonably well fixed (to within a factor of 2 of their corresponding Arrays) but certain other array types still exhibit performance penalites of nearly 20x. On closer inspection, the ones that are "fixed" are conceptually like reinterpret(NTuple{K,T}, ::Array{T}) and those that are not are the opposite, reinterpret(T, ::Array{NTuple{K,T}}). For the former, LLVM elides the divrem (it figures out that sidx is always zero and optimizes accordingly), but for the latter it cannot, and it seems that failure to eliminate the divrem kills vectorization. Conceptually, what you'd really like LLVM to do is a K-fold unrolling of the loop, but it doesn't seem to discover that it should do that.

This is a fairly scary PR that aims to fix that by eliminating the need for the divrem. The idea is to create a new Integer subtype that stores an Int as separate div and rem pieces. For it to work with vectorization, the "inner" size (the denominator in the divrem) must be known to the type-system, so this is now a parametrized subtype of Integer which I'm calling SDivRemInt{K}. (The S is for "static," presuming that someday someone might want a dynamic version that encodes K as a field rather than a type parameter.) When looping, it uses a CartesianIndex{2}-like iteration to increment the SDivRemInt. To make this work smoothly, you need an AbstractUnitRange that supports this type of integer (including storing the K as a type parameter), and then you need to generalize CartesianIndices to work with such objects. Altogether this is a fair amount of infrastructure just to change the inner iteration pattern, but I'm not sure I see an alternative that is clean and generic. Note that in the current state of the PR this is not yet "wired in" to ReinterpretArray, but I thought it might be important to put out the fundamental infrastructure first to see if this design makes sense or whether there are modifications that might increase applicability to other circumstances.

I'm aware of a couple of test failures, but none of them look as scary as the simple fact of changing the definition of the CartesianIndices type, which has the chance to be fairly breaking. Consequently, I wanted to put this out there for comment before going any further with this. Would love feedback from @mbauman and/or @chriselrod.

…size

Some array types, notably certain types of ReinterpretArray,
can be thought of as having a first dimension that has an "inner/outer"
pattern of iteration due to calling `divrem` on the first index.
For efficient iteration and especially vectorization, it is helpful
to avoid the `divrem`.
@mbauman
Copy link
Member

mbauman commented Aug 31, 2020

I've not had a chance to consider this deeply, but is this not just a symptom of a higher design choice? Why aren't these arrays actually N+1-dimensional? That's not a point against this — not in the least — but I've often felt that ReinterpretArrays should add a dimension in cases like this.

If that's the case, do we need this to be such a general mechanism?

@timholy
Copy link
Member Author

timholy commented Aug 31, 2020

Certainly in my application I'd prefer it to add a dimension, and indeed in ImageCore we define reinterpetc that does precisely this. If that were a non-optional feature of ReinterpretArray then it would be known statically to the type system and LLVM would presumably unroll. I don't think we can change this now about ReinterpretArray, though. It's written in such a way as to even support cases where T and S do not differ in size by an integer multiple, which is quite cool but which obviously would nix the idea of an extra dimension for the smaller-eltype variant.

CC @Keno for additional perspective.

@timholy
Copy link
Member Author

timholy commented Sep 1, 2020

The more I think about it, the more I like redefining ReinterpretArray to include a final IsReshaped parameter that is Bool-valued. We could fix the performance for those arrays and not worry too much about the more general case. Maybe we could define reinterpret(T, A, reshape) to generate the reshaped version?

@chriselrod
Copy link
Contributor

Couldn't we add more special cases to the getindex that features the divrem?

If some conditions are met, such as all of the following:

  1. Base.allocatedinline(S).
  2. The reinterpreted parent array is a StridedArray.
  3. There is some integer K such that K*sizeof(T) == sizeof(S)

Then we could use an optimized implementation based on unsafe_load and Base.unsafe_convertin
a branch that looks specifically for cases like , where
In this case, we can can just do something like:

GC.@preserve A begin
    unsafe_load(Base.unsafe_convert(Ptr{T}, A), 1 + sum(map(*, strides(A), map(-, inds, 1))))
end

This would rely on constant propagation that one of the strides is 1 to get SIMD.

You solution here is more general, as not all arrays that could benefit from SIMD will meet those three conditions.
Is LLVM expected to (or have you confirmed) whether it can SIMD beyond the SIMD inner length? Ideally, it would be able to figure out that accesses where ind_start increments are still contiguous.

SDivRemInt{K}. (The S is for "static," presuming that someday someone might want a dynamic version that encodes K as a field rather than a type parameter.)

I've recently taken to liking Static number types as a means to make a struct optionally dynamic or static. Supporting arithmetic means a lot of code can be written generically, and you could still dispatch on DivRemInt{Int} vs DivRemInt{Static{K}} when necessary.
If this could be more broadly useful (e.g., if DivRemInt could benefit), would there be potential interest for something such as this in Base?

@timholy
Copy link
Member Author

timholy commented Sep 4, 2020

Interesting questions! The unsafe_load idea is a good one, although I suspect the solution in #37277 may compile down the same thing when the unsafe_load is applicable.

With regards to static numbers, yes, that's an interesting solution to making something that can be either dynamic or static. I like it! If we have a killer application that other bits of Base would take advantage of, that does seem like something we could move in. (But you'd really want there to be a good case that it's useful for things that need to be, or already are, in Base.) If it's useful for LV to move into the compiler, that could be the only application needed.

@goretkin
Copy link
Contributor

goretkin commented Sep 9, 2020

If we have a killer application that other bits of Base would take advantage of, that does seem like something we could move in. (But you'd really want there to be a good case that it's useful for things that need to be, or already are, in Base.)

Hopefully not too off-topic, but a place where static numbers seems like a natural fit in Base is for unifying some of the <:AbstractRange types.

d is for dynamic and s is for static:

                            # start, step, stop
Base.StepRange(20, 1, 25)   # d, d, d
Base.UnitRange(20, 25)      # d, s=1, d
Base.OneTo(5)               # s=1, s=1, d

Aside from the start and stop fields both being ::T in StepRange(::T, ::S, ::T) where {T, S}, those three types (and 5 other combinations that are probably not common) could all be StepRange, with some fields being static numbers.

In effect, this gives systematic, as opposed to bespoke, names to these types. Base.OneTo{T} could be an alias for StepRange{StaticInt{1}, StaticInt{1}, T} (this is just a sketch).

@Tokazama, author of https://github.com/Tokazama/StaticRanges.jl , might have more to add here. There's also a bit of discussion here: https://julialang.zulipchat.com/#narrow/stream/225583-appreciation/topic/run-time.20dispatch/near/195130969

@Tokazama
Copy link
Contributor

Tokazama commented Sep 9, 2020

The reference to Static comes from this PR, in case you want some context. Using something like Static makes writing generic code in situations like this more a lot more concise and straight forward. I could see it becoming an involved PR, so it might make more sense to continue developing it within ArrayInterface. The only reasons I think it should be added to base now are:

  1. If it will be widely used internally so just rip of the band-aid
  2. If there's some compiler support that could smooth over some aspects of it

BTW. I've already started moving the more critical pieces of StaticRanges that support arrays into ArrayInterface, so wherever this ends up I'd be happy to help develop the corresponding range stuff.

I should also mention that mixing static numbers with ordinary numbers doesn't always work. For example, if we called StepRange(::Static, ::Int, ::Static) then the final value couldn't remain static because it could change based on the step size (e.g., 1:2:10 == 1:2:9 because the last number is changed based on the step). This doesn't mean it's not worth exploring, but it does mean sometimes users will be surprised by the results at times.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants