RFC: Generalize CartesianIndex and support ranges with static inner/outer size #37290
Conversation
I've not had a chance to consider this deeply, but is this not just a symptom of a higher-level design choice? Why aren't these arrays actually N+1-dimensional? That's not a point against this (not in the least), but I've often felt that `ReinterpretArray`s should add a dimension in cases like this. If that's the case, do we need this to be such a general mechanism?
Certainly in my application I'd prefer it to add a dimension, and indeed in ImageCore we define … CC @Keno for additional perspective.
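For concreteness, a small illustration (mine, not from this PR) of the two shapes being discussed; note that newer Julia versions provide `reinterpret(reshape, ...)`, shown here only to make the "extra dimension" idea tangible:

```julia
# A vector of 3-tuples, reinterpreted two ways.
A = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]   # Vector{NTuple{3,Float64}}

flat = reinterpret(Float64, A)            # 6-element vector: indexing it needs the
                                          # inner/outer (divrem) split internally

mat = reinterpret(reshape, Float64, A)    # 3×2 matrix: the inner size becomes an
                                          # explicit leading dimension instead
```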
The more I think about it, the more I like redefining …
Couldn't we add more special cases to the `getindex` that features the `divrem`? If some conditions are met, such as all of the following: …

Then we could use an optimized implementation based on

```julia
GC.@preserve A begin
    unsafe_load(Base.unsafe_convert(Ptr{T}, A), 1 + sum(map(*, strides(A), map(-, inds, 1))))
end
```

This would rely on constant propagation of one of the strides. Your solution here is more general, as not all arrays that could benefit from SIMD will meet those three conditions.
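A hedged sketch of what such a special-cased fast path could look like, assuming the gating conditions above hold; the name `fast_getindex` is invented here, and it presumes `strides(A)` and `Base.unsafe_convert(Ptr{T}, A)` are defined for the array type, as they are for `Array`:

```julia
# Illustrative only: a divrem-free element access via pointer arithmetic over
# the strides. Not part of this PR; a sketch of the comment above.
function fast_getindex(A::AbstractArray{T,N}, inds::Vararg{Int,N}) where {T,N}
    @boundscheck checkbounds(A, inds...)
    # Linear offset in units of T, computed from the (assumed available) strides.
    offset = sum(map(*, strides(A), map(-, inds, 1)))
    GC.@preserve A begin
        p = Base.unsafe_convert(Ptr{T}, A)   # assumed valid for this array type
        unsafe_load(p, 1 + offset)
    end
end
```

In practice this path would be gated on the elided conditions above (presumably things like `isbitstype(T)`), falling back to the generic `divrem`-based `getindex` otherwise.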
I've recently taken to liking Static number types as a means to make a struct optionally dynamic or static. Supporting arithmetic means a lot of code can be written generically, and you could still dispatch on …
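To make the "optionally dynamic or static" idea concrete, here is a minimal sketch with invented names (`StaticInt`, `InnerSize`); packages such as StaticNumbers.jl or Static.jl flesh this out properly:

```julia
# Sketch only: a statically-known integer and a struct that is generic over
# whether its size is static or dynamic. Names are illustrative, not from this PR.
struct StaticInt{N} <: Integer end
StaticInt(N::Int) = StaticInt{N}()

Base.Int(::StaticInt{N}) where {N} = N
Base.promote_rule(::Type{StaticInt{N}}, ::Type{Int}) where {N} = Int
Base.:*(::StaticInt{M}, ::StaticInt{N}) where {M,N} = StaticInt{M * N}()

# A struct becomes "optionally static" simply by being generic in the integer type:
struct InnerSize{T<:Integer}
    k::T
end

# Generic code treats both cases the same, while dispatch can still single out
# the statically-sized case when it matters (e.g. for unrolling).
isstatic(::InnerSize{<:StaticInt}) = true
isstatic(::InnerSize{Int}) = false
```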
Interesting questions! … With regards to static numbers, yes, that's an interesting solution to making something that can be either dynamic or static. I like it! If we have a killer application that other bits of Base would take advantage of, that does seem like something we could move in. (But you'd really want there to be a good case that it's useful for things that need to be, or already are, in Base.) If it's useful for LV to move into the compiler, that could be the only application needed.
Hopefully not too off-topic, but a place where static numbers seem like a natural fit is in …
Aside from the … In effect, this gives systematic, as opposed to bespoke, names to these types. @Tokazama, author of https://github.com/Tokazama/StaticRanges.jl, might have more to add here. There's also a bit of discussion here: https://julialang.zulipchat.com/#narrow/stream/225583-appreciation/topic/run-time.20dispatch/near/195130969
The reference to …
BTW, I've already started moving the more critical pieces of StaticRanges that support arrays into ArrayInterface, so wherever this ends up I'd be happy to help develop the corresponding range stuff. I should also mention that mixing static numbers with ordinary numbers doesn't always work. For example, if we called …
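As a concrete (and invented) instance of that caveat, continuing the illustrative `StaticInt` sketch above: arithmetic with an ordinary `Int` promotes the static value away, so the result silently stops being static:

```julia
StaticInt(2) * StaticInt(3)   # StaticInt{6}(): the product is still static
StaticInt(2) * 3              # 6, a plain Int: the "staticness" is lost
```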
Some array types, notably certain types of `ReinterpretArray`, can be thought of as having a first dimension that has an "inner/outer" pattern of iteration due to calling `divrem` on the first index. For efficient iteration and especially vectorization, it is helpful to avoid the `divrem`. You can see this in the new benchmarks in #37277, where most cases are reasonably well fixed (to within a factor of 2 of their corresponding `Array`s), but certain other array types still exhibit performance penalties of nearly 20x. On closer inspection, the ones that are "fixed" are conceptually like `reinterpret(NTuple{K,T}, ::Array{T})` and those that are not are the opposite, `reinterpret(T, ::Array{NTuple{K,T}})`. For the former, LLVM elides the `divrem` (it figures out that `sidx` is always zero and optimizes accordingly), but for the latter it cannot, and it seems that failure to eliminate the `divrem` kills vectorization. Conceptually, what you'd really like LLVM to do is a `K`-fold unrolling of the loop, but it doesn't seem to discover that it should do that.

This is a fairly scary PR that aims to fix that by eliminating the need for the `divrem`. The idea is to create a new `Integer` subtype that stores an `Int` as separate `div` and `rem` pieces. For it to work with vectorization, the "inner" size (the denominator in the `divrem`) must be known to the type system, so this is now a parametrized subtype of `Integer` which I'm calling `SDivRemInt{K}`. (The `S` is for "static," presuming that someday someone might want a dynamic version that encodes `K` as a field rather than a type parameter.) When looping, it uses a `CartesianIndex{2}`-like iteration to increment the `SDivRemInt`. To make this work smoothly, you need an `AbstractUnitRange` that supports this type of integer (including storing the `K` as a type parameter), and then you need to generalize `CartesianIndices` to work with such objects. Altogether this is a fair amount of infrastructure just to change the inner iteration pattern, but I'm not sure I see an alternative that is clean and generic. Note that in the current state of the PR this is not yet "wired in" to `ReinterpretArray`, but I thought it might be important to put out the fundamental infrastructure first, to see if this design makes sense or whether there are modifications that might increase applicability to other circumstances.

I'm aware of a couple of test failures, but none of them look as scary as the simple fact of changing the definition of the `CartesianIndices` type, which has the chance to be fairly breaking. Consequently, I wanted to put this out there for comment before going any further with this. Would love feedback from @mbauman and/or @chriselrod.
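To make the proposal easier to picture, here is a rough, non-authoritative sketch of the kind of type being described; the actual definitions in the PR (the range type, promotion rules, and the `CartesianIndices` generalization) are omitted and will differ in detail:

```julia
# Illustrative sketch only (not the PR's code): an integer that carries the
# divrem decomposition of a linear index, with the inner size K in the type.
struct SDivRemInt{K} <: Integer
    d::Int   # "outer" part, roughly div(i - 1, K)
    r::Int   # "inner" part, roughly rem(i - 1, K) + 1 (1-based)
end

# Recover the ordinary linear index: i == d*K + r.
Base.Int(i::SDivRemInt{K}) where {K} = i.d * K + i.r

# CartesianIndex{2}-style increment: bump the inner part and carry into the
# outer part, so the hot loop never performs an integer division.
next(i::SDivRemInt{K}) where {K} =
    i.r < K ? SDivRemInt{K}(i.d, i.r + 1) : SDivRemInt{K}(i.d + 1, 1)

# A getindex for something like reinterpret(T, ::Array{NTuple{K,T}}) can then
# read i.d and i.r directly (outer element, inner field) instead of calling divrem.
```

For `K = 4`, iterating the indices 1:12 this way visits `(d, r) = (0, 1)` through `(2, 4)`, which is exactly the `K`-fold-unrolled structure the description says LLVM fails to discover on its own.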