Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: some modifications in cartesian indexing and iterating #9688

Merged
merged 3 commits into from
Jan 10, 2015

Conversation

Jutho
Copy link
Contributor

@Jutho Jutho commented Jan 9, 2015

The goal of this PR is to separate as much as possible the CartesianIndex type, and the corresponding multidimensional iterators, in such a way that the iterators have minimal dependence on the specific implementation of CartesianIndex. The underlying motivation is to easily allow the implementation of additional multidimensional iterators, either in Base or in packages.

  • In this PR, only CartesianIndex is abstract type, with concrete implementations CartesianIndex_N being dynamically generated with the eval in gen_cartesian.
  • The implementation of the old IndexIterator is now a concrete type that depends parametrically at the type of CartesianIndex used. This avoids the need to also generate it dynamically. I have also generalized the iterator, so that it can have an arbitrary starting point instead of (1,1,1,1,...). As a consequence, I also renamed the iterator type to CartesianRange, since this felt more appropriate. It really acts as a kind of range visiting all points in the multidimensional integer cuboid specified by the two corners extremal corners start and stop. I did not go as far as implement a colon method for this, but could easily do so.
  • There are some new functions, such as multiplying the CartesianIndex with an integer, getting its length, or getting the length of the CartesianRange iterator.
  • CartesianIndex now accepts Real arguments, which are passed to the to_index function, as these seems to be common for indexing in Base.
  • Ideally, there should be a simple way to create new CartesianIndex objects without allocation overhead and without having to call gen_cartesian to get the name/type of the actual implementation CartesianIndex_N. With call overloading, it should in principle be possible to built a constructor for the abstract type CartesianIndex without ever having to know about the concrete type CartesianIndex_N. Unfortunately, naive approaches currently fail because e.g. stagedfunction call(::Type{CartesianIndex},index::Real...) stops specialising on the length of index beyond N>7 (due to stagedfunction-related compilation error when type inference fails #8504).
    In fact, in several of the staged functions, such as the CartesianIndex arithmetic, the concrete type of the actual implementation can easily be obtained, as the input arguments are already of that type, and the staged function gets fed the types of the variables instead of the values. The only two functions where this or a similar trick would not work is eachindex{T,N}(A::AbstractArray{T,N}) and _start{T,N}(A::AbstractArray{T,N},::LinearSlow), as these don't already have a CartesianIndex argument. I tried this approach and it works and would allow to get rid of gen_cartesian in all methods except for these two.
  • However, I ended up trying a different approach all together that does not rely on calling the concrete type. Aside from the original constructor stagedfunction call{N}(::Type{CartesianIndex},index::NTuple{N,Real}) which generates the type, there is also a constructor call{N}(::Type{CartesianIndex{N}},index::Real...) where you specify N explicitly in the type, but don't have to input the arguments via a tuple. When this is first called, it dispatches to call(::Type{CartesianIndex},index::NTuple{N,Real}) (thus verifying if the number of arguments is correct). Aside from only generating the concrete type, however, the stagedfunction now also generates an additional constructor call(::Type{CartesianIndex{N}},i_1::Real,...,i_N::Real), for that specific N, i.e. without function parameters or varargs, and which just calls the corresponding constructor of the concrete type. The hope is that this would be the one that is then called instead on all subsequent calls of the form CartesianIndex{N}(i_1,...,i_N) as it is more specific as the general definition call{N}(::Type{CartesianIndex{N}},index::Real...) which has a function parameter and a varargs. I am not sure what the current status is of recompilation of dependent functions and how this relates to it. Thus it's a leap in the dark whether this would work, but my tests seem to indicate that this indeed works, or at least, that this allows to construct CartesianIndex objects without allocation overhead, even though I am not entirely sure about the code path that is taken.

In conclusion, with this PR, CartesianIndex is defined completely independently from any of the iterator constructors (maybe they could be in there own module). To the outside world, they just act as a normal immutable living on the stack and can be constructed as CartesianIndex{N}(i_1,...,i_N) or CartesianIndex((i_1,...,i_N)), where only the former guarantees no heap allocation.

However, it would be good if some additional performance testing could be done, as my tests were rather minimal and my tricks are somewhat based on guess work.
cc @timholy

gen_cartesian(1) # to make sure the next two lines are valid
next(R::StepRange, state::(Bool, CartesianIndex{1})) = R[state[2].I_1], (state[2].I_1==length(R), CartesianIndex_1(state[2].I_1+1))
next{T}(R::UnitRange{T}, state::(Bool, CartesianIndex{1})) = R[state[2].I_1], (state[2].I_1==length(R), CartesianIndex_1(state[2].I_1+1))
next(R::StepRange, state::(Bool, CartesianIndex{1})) = (index=state[2]; return R[index], (index[1]==length(R), CartesianIndex{1}(index[1]+1)))
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've checked that the concrete type can be inferred properly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do code_typed with a CartesianIndex{N} type specification, then it cannot infer that getfield(index,k) will return an Int, which then also does not allow to infer the return type exactly. However, if you first generate a specific CartesianIndex_N, and then use CartesianIndex_N as the type in code_typed, everything is inferred correctly. I can even make it infer correctly in the first case by adding a type assert ::Int in the getindex(index::CartesianIndex,i::Integer) method, and then replacing all explicit getfield calls with a getindex call. I will push another commit.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's also where I wonder if the original implementation of these lines (including the call gen_cartesian(1)) is better. After all, we're only worried about the 1d case here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I took the objective of expunging gen_cartesian calls to the extreme. However, both code_typed and actually calling this and timing it shows that the current version works as intended, i.e. no allocation overhead. However, it is even more important to keep in mind that these methods are only there to prevent warnings, they will never be called since ranges always have efficient linear indexing and will thus not be iterated over in a ::LinearSlow way.

The benefit of the current approach is that, if we ever have a different implementation of CartesianIndex, e.g. because staged types become available, or because it will one day be possible to properly align NTuple{N,Int} in an immutable, or there will be a built-in FixedVector type, there will be no need to change any of the iterator code. I guess this was my design goal.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

@timholy
Copy link
Sponsor Member

timholy commented Jan 9, 2015

This is awesome. Thinking about it as a CartesianRange was also behind my initial name choice, but I like this better. Generalizing it to arbitrary bounds was on my list, too. Your implementation is really nice.

As far as checking performance goes, an easy start is to try the examples in #9080, both against hand-written indexing and (more importantly) against the implementation of cartesian indexing currently in master.

@Jutho
Copy link
Contributor Author

Jutho commented Jan 9, 2015

I will check these examples tonight.

@tkelman
Copy link
Contributor

tkelman commented Jan 9, 2015

AppVeyor seems to have not triggered on your latest commit, though the previous one timed out on win64. cc @FeodorFitsner this is the second PR in a few days where this has happened.

@FeodorFitsner
Copy link

Could you check "Recent deliveries" for GitHub repo webhook please to see whether request was sent on that commit and what was the response from AppVeyor?

-Feodor

On Fri, Jan 9, 2015 at 6:16 AM, Tony Kelman notifications@github.com
wrote:

AppVeyor seems to have not triggered on your latest commit, though the previous one timed out on win64. cc @FeodorFitsner this is the second PR in a few days where this has happened.

Reply to this email directly or view it on GitHub:
#9688 (comment)

@tkelman
Copy link
Contributor

tkelman commented Jan 9, 2015

@FeodorFitsner see https://gist.github.com/tkelman/54064dab1df54d591e62 - the response body was empty

@FeodorFitsner
Copy link

Thanks, will take a look.

-Feodor

On Fri, Jan 9, 2015 at 8:53 AM, Tony Kelman notifications@github.com
wrote:

@FeodorFitsner see https://gist.github.com/tkelman/54064dab1df54d591e62 - the response body was empty

Reply to this email directly or view it on GitHub:
#9688 (comment)

@FeodorFitsner
Copy link

It was triggered, but not reported the status back to GitHub:
https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.1412

Will look into that.

@FeodorFitsner
Copy link

@tkelman PR sync event in that gist is referring commit 5827fbe (some modifications in cartesian indexing and iterating). Could you please paste webhook event corresponding to the second commit 3b3ad40 (add typeassert and use getindex consistently) (and what was response from AppVeyor as well)?

@Jutho
Copy link
Contributor Author

Jutho commented Jan 10, 2015

Ok, with the following test code

function mysum1(A)
   s=zero(eltype(A))
   for i=1:length(A)
       s+=A[i]
   end
   return s
end
function mysum2(A)
   s=zero(eltype(A))
   for i in eachindex(A)
       s+=A[i]
   end
   return s
end
function test(A,N)
    @time for i=1:N;mysum1(A);end
    @time for i=1:N;mysum2(A);end
    println("-------------")
end

I obtain on current master (with a precompilation step)

begin
    test(randn(ntuple(2,n->256)),100)
    test(randn(ntuple(4,n->16)),100)
    test(randn(ntuple(8,n->4)),100)
    test(randn(ntuple(16,n->2)),100)
end
elapsed time: 0.00723272 seconds (0 bytes allocated)
elapsed time: 0.010180632 seconds (0 bytes allocated)
-------------
elapsed time: 0.005622163 seconds (0 bytes allocated)
elapsed time: 0.02126161 seconds (4800 bytes allocated)
-------------
elapsed time: 0.005743825 seconds (0 bytes allocated)
elapsed time: 0.087427694 seconds (8000 bytes allocated)
-------------
elapsed time: 0.005648919 seconds (0 bytes allocated)
elapsed time: 0.23577718 seconds (14400 bytes allocated)
-------------

and with the current PR

elapsed time: 0.007284134 seconds (0 bytes allocated)
elapsed time: 0.010137212 seconds (0 bytes allocated)
-------------
elapsed time: 0.00566263 seconds (0 bytes allocated)
elapsed time: 0.021023731 seconds (0 bytes allocated)
-------------
elapsed time: 0.007784476 seconds (0 bytes allocated)
elapsed time: 0.989561927 seconds (0 bytes allocated)
-------------
elapsed time: 0.005665636 seconds (0 bytes allocated)
elapsed time: 1.284784037 seconds (0 bytes allocated)
-------------

So while there is actually less allocation, there does seem to be some serious performance degradation for the N=8 and N=16 case. I will look into this.

@Jutho
Copy link
Contributor Author

Jutho commented Jan 10, 2015

So it seems to be that, even though the CartesianIndex{N} constructor can construct objects without allocation overhead, it does not get inlined, hence resulting in some performance degradation. The latest commit replaces the CartesianIndex{N} constructor at some critical places with a call to the direct constructors of the concrete types, which are always bound to the variable name I. This results in

elapsed time: 0.009983739 seconds (0 bytes allocated)
-------------
elapsed time: 0.005590681 seconds (0 bytes allocated)
elapsed time: 0.018013223 seconds (0 bytes allocated)
-------------
elapsed time: 0.005572602 seconds (0 bytes allocated)
elapsed time: 0.086959815 seconds (0 bytes allocated)
-------------
elapsed time: 0.00557165 seconds (0 bytes allocated)
elapsed time: 0.216865409 seconds (0 bytes allocated)
-------------

which seems to be at least as good as current master.

@tkelman
Copy link
Contributor

tkelman commented Jan 10, 2015

@FeodorFitsner I can't seem to find a recent delivery for that commit. There are 3 recent hook deliveries that when I try to expand the info, GitHub says "Sorry, something went wrong and we weren't able to fetch this delivery’s details." So I suspect this might be GitHub's fault.

@FeodorFitsner
Copy link

Hm, that's weird indeed. Though Travis uses their own "service" instead of generic webhooks - that explains why it checked that commit, but AV not.

Anyway, there should have been webhook delivery for "pull_request" sync event with 3b3ad40 "head" commit. :)

@timholy
Copy link
Sponsor Member

timholy commented Jan 10, 2015

Looks great to me, @Jutho. Merge at will.

Jutho added a commit that referenced this pull request Jan 10, 2015
RFC: some modifications in cartesian indexing and iterating
@Jutho Jutho merged commit df482d2 into JuliaLang:master Jan 10, 2015
@Jutho Jutho deleted the jh/cartesianrange branch January 10, 2015 13:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants