RFC: some modifications in cartesian indexing and iterating #9688
gen_cartesian(1) # to make sure the next two lines are valid
next(R::StepRange, state::(Bool, CartesianIndex{1})) = R[state[2].I_1], (state[2].I_1==length(R), CartesianIndex_1(state[2].I_1+1))
next{T}(R::UnitRange{T}, state::(Bool, CartesianIndex{1})) = R[state[2].I_1], (state[2].I_1==length(R), CartesianIndex_1(state[2].I_1+1))
next(R::StepRange, state::(Bool, CartesianIndex{1})) = (index=state[2]; return R[index], (index[1]==length(R), CartesianIndex{1}(index[1]+1)))
You've checked that the concrete type can be inferred properly?
If you do `code_typed` with a `CartesianIndex{N}` type specification, then it cannot infer that `getfield(index,k)` will return an `Int`, which in turn prevents the return type from being inferred exactly. However, if you first generate a specific `CartesianIndex_N`, and then use `CartesianIndex_N` as the type in `code_typed`, everything is inferred correctly. I can even make it infer correctly in the first case by adding a type assert `::Int` in the `getindex(index::CartesianIndex,i::Integer)` method, and then replacing all explicit `getfield` calls with a `getindex` call. I will push another commit.
That's also where I wonder if the original implementation of these lines (including the call `gen_cartesian(1)`) is better. After all, we're only worried about the 1d case here.
I guess I took the objective of expunging `gen_cartesian` calls to the extreme. However, both `code_typed` and actually calling this and timing it show that the current version works as intended, i.e. no allocation overhead. More importantly, these methods are only there to prevent warnings; they will never be called, since ranges always have efficient linear indexing and will thus not be iterated over in a `::LinearSlow` way.
The benefit of the current approach is that, if we ever have a different implementation of `CartesianIndex`, e.g. because staged types become available, or because it will one day be possible to properly align `NTuple{N,Int}` in an immutable, or there will be a built-in FixedVector type, there will be no need to change any of the iterator code. I guess this was my design goal.
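The claim that ranges always take the linear-indexing path can be checked in present-day Julia, where the `LinearFast`/`LinearSlow` traits discussed here are spelled `IndexLinear`/`IndexCartesian` and queried with `IndexStyle`. The sketch below uses only that modern API, which is an assumption relative to the code in this PR; the strided view is just an illustrative example of an array that does not have fast linear indexing.

```julia
r = 1:2:9                          # a StepRange
V = view(zeros(4, 4), 1:2:3, :)    # a strided, non-contiguous view

println(IndexStyle(r))             # ranges report linear indexing
println(IndexStyle(V))             # this view reports cartesian indexing

# The same generic loop serves both; eachindex chooses the index kind.
function mysum(A)
    s = zero(eltype(A))
    for i in eachindex(A)
        s += A[i]
    end
    return s
end

println(mysum(r))
println(mysum(V))
```

Because the loop body only ever calls `A[i]`, it does not depend on how the cartesian index type is implemented, which is the decoupling described above.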
Sounds good.
This is awesome. Thinking about it as a […] As far as checking performance goes, an easy start is to try the examples in #9080, both against hand-written indexing and (more importantly) against the implementation of cartesian indexing currently in master.
I will check these examples tonight.
AppVeyor seems to have not triggered on your latest commit, though the previous one timed out on win64. cc @FeodorFitsner this is the second PR in a few days where this has happened.
Could you check "Recent deliveries" for GitHub repo webhook please to see whether request was sent on that commit and what was the response from AppVeyor? -Feodor On Fri, Jan 9, 2015 at 6:16 AM, Tony Kelman notifications@github.com
@FeodorFitsner see https://gist.github.com/tkelman/54064dab1df54d591e62 - the response body was empty
Thanks, will take a look. -Feodor On Fri, Jan 9, 2015 at 8:53 AM, Tony Kelman notifications@github.com
It was triggered, but the status was not reported back to GitHub. Will look into that.
Ok, with the following test code

function mysum1(A)
s=zero(eltype(A))
for i=1:length(A)
s+=A[i]
end
return s
end
function mysum2(A)
s=zero(eltype(A))
for i in eachindex(A)
s+=A[i]
end
return s
end
function test(A,N)
@time for i=1:N;mysum1(A);end
@time for i=1:N;mysum2(A);end
println("-------------")
end

I obtain on current master (with a precompilation step)

begin
test(randn(ntuple(2,n->256)),100)
test(randn(ntuple(4,n->16)),100)
test(randn(ntuple(8,n->4)),100)
test(randn(ntuple(16,n->2)),100)
end
elapsed time: 0.00723272 seconds (0 bytes allocated)
elapsed time: 0.010180632 seconds (0 bytes allocated)
-------------
elapsed time: 0.005622163 seconds (0 bytes allocated)
elapsed time: 0.02126161 seconds (4800 bytes allocated)
-------------
elapsed time: 0.005743825 seconds (0 bytes allocated)
elapsed time: 0.087427694 seconds (8000 bytes allocated)
-------------
elapsed time: 0.005648919 seconds (0 bytes allocated)
elapsed time: 0.23577718 seconds (14400 bytes allocated)
-------------

and with the current PR

elapsed time: 0.007284134 seconds (0 bytes allocated)
elapsed time: 0.010137212 seconds (0 bytes allocated)
-------------
elapsed time: 0.00566263 seconds (0 bytes allocated)
elapsed time: 0.021023731 seconds (0 bytes allocated)
-------------
elapsed time: 0.007784476 seconds (0 bytes allocated)
elapsed time: 0.989561927 seconds (0 bytes allocated)
-------------
elapsed time: 0.005665636 seconds (0 bytes allocated)
elapsed time: 1.284784037 seconds (0 bytes allocated)
-------------

So while there is actually less allocation, there does seem to be some serious performance degradation for the […]
So it seems to be that, even though the […]
which seems to be at least as good as current master.
@FeodorFitsner I can't seem to find a recent delivery for that commit. There are 3 recent hook deliveries that when I try to expand the info, GitHub says "Sorry, something went wrong and we weren't able to fetch this delivery’s details." So I suspect this might be GitHub's fault.
Hm, that's weird indeed. Though Travis uses their own "service" instead of generic webhooks - that explains why it checked that commit, but AV not. Anyway, there should have been webhook delivery for "pull_request" sync event with 3b3ad40 "head" commit. :)
Looks great to me, @Jutho. Merge at will.
RFC: some modifications in cartesian indexing and iterating
The goal of this PR is to separate as much as possible the `CartesianIndex` type, and the corresponding multidimensional iterators, in such a way that the iterators have minimal dependence on the specific implementation of `CartesianIndex`. The underlying motivation is to easily allow the implementation of additional multidimensional iterators, either in Base or in packages.

- `CartesianIndex` is an abstract type, with concrete implementations `CartesianIndex_N` being dynamically generated with the `eval` in `gen_cartesian`.
- `IndexIterator` is now a concrete type that depends parametrically on the type of `CartesianIndex` used. This avoids the need to also generate it dynamically. I have also generalized the iterator, so that it can have an arbitrary starting point instead of (1,1,1,1,...). As a consequence, I also renamed the iterator type to `CartesianRange`, since this felt more appropriate. It really acts as a kind of range visiting all points in the multidimensional integer cuboid specified by the two extremal corners `start` and `stop`. I did not go as far as implementing a `colon` method for this, but could easily do so.
- […] `CartesianIndex` with an integer, getting its length, or getting the length of the `CartesianRange` iterator.
- […] `Real` arguments, which are passed to the `to_index` function, as these seem to be common for indexing in Base.
- […] `CartesianIndex` objects without allocation overhead and without having to call `gen_cartesian` to get the name/type of the actual implementation `CartesianIndex_N`. With `call` overloading, it should in principle be possible to build a constructor for the abstract type `CartesianIndex` without ever having to know about the concrete type `CartesianIndex_N`. Unfortunately, naive approaches currently fail because e.g. `stagedfunction call(::Type{CartesianIndex},index::Real...)` stops specialising on the length of `index` beyond `N>7` (due to "stagedfunction-related compilation error when type inference fails", #8504).

  In fact, in several of the staged functions, such as the CartesianIndex arithmetic, the concrete type of the actual implementation can easily be obtained, as the input arguments are already of that type, and the staged function gets fed the types of the variables instead of the values. The only two functions where this or a similar trick would not work are `eachindex{T,N}(A::AbstractArray{T,N})` and `_start{T,N}(A::AbstractArray{T,N},::LinearSlow)`, as these don't already have a `CartesianIndex` argument. I tried this approach and it works, and it would allow getting rid of `gen_cartesian` in all methods except for these two.
- […] `stagedfunction call{N}(::Type{CartesianIndex},index::NTuple{N,Real})` which generates the type, there is also a constructor `call{N}(::Type{CartesianIndex{N}},index::Real...)` where you specify `N` explicitly in the type, but don't have to input the arguments via a tuple. When this is first called, it dispatches to `call(::Type{CartesianIndex},index::NTuple{N,Real})` (thus verifying that the number of arguments is correct). Aside from generating the concrete type, however, the `stagedfunction` now also generates an additional constructor `call(::Type{CartesianIndex{N}},i_1::Real,...,i_N::Real)` for that specific `N`, i.e. without function parameters or varargs, which just calls the corresponding constructor of the concrete type. The hope is that this would be the one that is then called on all subsequent calls of the form `CartesianIndex{N}(i_1,...,i_N)`, as it is more specific than the general definition `call{N}(::Type{CartesianIndex{N}},index::Real...)`, which has a function parameter and a varargs. I am not sure what the current status is of recompilation of dependent functions and how this relates to it. Thus it's a leap in the dark whether this would work, but my tests seem to indicate that it does, or at least that it allows constructing `CartesianIndex` objects without allocation overhead, even though I am not entirely sure about the code path that is taken.

In conclusion, with this PR, `CartesianIndex` is defined completely independently from any of the iterator constructors (maybe they could be in their own module). To the outside world, they just act as a normal immutable living on the stack and can be constructed as `CartesianIndex{N}(i_1,...,i_N)` or `CartesianIndex((i_1,...,i_N))`, where only the former guarantees no heap allocation.

However, it would be good if some additional performance testing could be done, as my tests were rather minimal and my tricks are somewhat based on guesswork.
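The cuboid-between-two-corners idea can be illustrated with present-day Julia, where `CartesianRange` has since been replaced by `CartesianIndices` and a colon method on `CartesianIndex` exists (Julia 1.1 and later). This is a sketch against that modern API, not the implementation proposed in this PR; the corner values are arbitrary.

```julia
start = CartesianIndex(2, 5)
stop  = CartesianIndex(3, 7)

# Colon on two CartesianIndex values yields a CartesianIndices block,
# the modern counterpart of the CartesianRange described here.
R = start:stop

# Visit every point of the 2x3 integer cuboid; the first dimension varies fastest.
for I in R
    println(Tuple(I))
end

# Both construction forms mentioned in the description exist today.
a = CartesianIndex(2, 5)
b = CartesianIndex((2, 5))
println(a == b)
```

Any iterator written only against indexing and length of the index object, as in the loop above, is unaffected by how the index type itself is implemented.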
cc @timholy