
MAX_TUPLETYPE_LEN and cartesian arrayref performance (DON'T MERGE) #5393

Closed · wants to merge 1 commit
Conversation

@timholy (Member) commented Jan 14, 2014

(This is not really a PR, it's a bug/issue report disguised as a PR to make it easier to talk about and for others to do their own testing.)

Thanks to some great work by Jeff, Julia's built-in arrayref, which implements both linear and cartesian indexing of arrays, now has amazing performance: in most cases cartesian indexing is indistinguishable from linear indexing, even when the linear index can be computed efficiently because the access order follows a pattern. To illustrate, I ran test/arrayperf in two configurations: one against current master, which uses linear indexing to implement getindex for arrays, and one against this PR, in which the linear-indexing code is deleted and getindex falls back to the AbstractArray implementation (which uses cartesian indexing). The complete getindex results are posted in this gist.

There are a few oddities (noise due to garbage-collection?), but to me the overall pattern suggests we don't need linear indexing. But there's one consistent exception, illustrated for this example but common to all of the tests:

Slicing with contiguous blocks (cartesian case):
Small arrays:
1 dimensions (2500000 repeats, 10000000 operations): elapsed time: 0.625764791 seconds (299991840 bytes allocated)
2 dimensions (2500000 repeats, 10000000 operations): elapsed time: 0.97220765 seconds (420500120 bytes allocated)
3 dimensions (625000 repeats, 10000000 operations): elapsed time: 1.096706901 seconds (419835400 bytes allocated)
4 dimensions (625000 repeats, 10000000 operations): elapsed time: 1.048247011 seconds (431500132 bytes allocated)
5 dimensions (156250 repeats, 10000000 operations): elapsed time: 0.467072104 seconds (207361056 bytes allocated)
6 dimensions (156250 repeats, 10000000 operations): elapsed time: 0.496604407 seconds (210810456 bytes allocated)
7 dimensions (39063 repeats, 10000128 operations): elapsed time: 0.3508091 seconds (121212592 bytes allocated)
8 dimensions (39063 repeats, 10000128 operations): elapsed time: 2.864820923 seconds (1322590144 bytes allocated)
9 dimensions (9766 repeats, 10000384 operations): elapsed time: 3.00026783 seconds (1377019176 bytes allocated)
10 dimensions (9766 repeats, 10000384 operations): elapsed time: 3.102930591 seconds (1457233652 bytes allocated)

Notice that the performance of arrayref falls off a cliff at 8 dimensions. Is this something that can be fixed? Or is this really hard?

@simonster (Member)

I think performance drops because MAX_TUPLETYPE_LEN in inference.jl is 8, which appears to prevent getindex from getting inlined when it is passed >8 arguments.
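One way to check this hypothesis is to compare the typed code for indexing with 8 versus 9 subscripts. This is a sketch, not from the thread: the probe functions `g8`/`g9` are hypothetical, and the 8-argument threshold (MAX_TUPLETYPE_LEN in 2014-era inference.jl) is version-dependent, so the exact output will vary with the Julia version.

```julia
# Hypothetical probes: identical indexing code, differing only in the
# number of subscripts passed to getindex.
A8 = rand(2, 2, 2, 2, 2, 2, 2, 2)       # 8-dimensional array
A9 = rand(2, 2, 2, 2, 2, 2, 2, 2, 2)    # 9-dimensional array

g8(A) = A[1, 1, 1, 1, 1, 1, 1, 1]        # 8 index arguments
g9(A) = A[1, 1, 1, 1, 1, 1, 1, 1, 1]     # 9 index arguments

# Inspect whether the getindex call survives as an explicit call
# (not inlined) in the 9-argument case but not the 8-argument case.
@code_typed g8(A8)
@code_typed g9(A9)
```

If the tuple-length limit is the culprit, the lowered code for `g9` should show an un-inlined `getindex` call where `g8` reduces to a direct memory access.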

@timholy (Member, Author) commented Jan 15, 2014

Good bet, @simonster, thanks.

@lindahua (Contributor)

Actually, 0.97 sec for 2 dimensions vs. 0.62 sec for 1 dimension is still a quite noticeable difference.

@lindahua (Contributor)

I did a benchmark specifically to test the performance of linear indexing vs cartesian indexing. The code is here on gist.

Here are the results I got (I ran it multiple times; the results are quite consistent):

Scanning matrix of size (8,8) for 10000000 times:
1D:  0.36637 sec
2D:  0.43078 sec
Scanning matrix of size (4,4,4) for 10000000 times:
1D:  0.36851 sec
3D:  0.48609 sec
Scanning matrix of size (4,4,2,2) for 10000000 times:
1D:  0.36736 sec
4D:  0.52755 sec

The difference is still clearly noticeable (though not very large).

I think a reasonable guideline here is:

  • When cartesian indexing leads to considerably more concise and generic code, it is definitely a reasonable choice. Except in the most performance-critical cases, manually unrolling cartesian indexing is not recommended.
  • When linear indexing is not difficult to implement (e.g., when working with contiguous arrays or array blocks), linear indexing is still preferable.
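The two styles in the guideline above can be sketched as follows. This is illustrative code, not from the thread; the function names are made up, and the cartesian version uses `CartesianIndices`, which did not exist in 2014-era Julia (the original code would have used generated loops via Base.Cartesian).

```julia
# Linear indexing: a single running index. Fast and simple for
# contiguous Arrays, but awkward for strided views or generic
# AbstractArrays.
function sum_linear(A::Array)
    s = zero(eltype(A))
    for i in 1:length(A)
        s += A[i]
    end
    return s
end

# Cartesian indexing: one subscript per dimension. Generic across
# dimensionalities and AbstractArray types, at the cost of carrying
# N loop counters instead of one.
function sum_cartesian(A::AbstractArray)
    s = zero(eltype(A))
    for I in CartesianIndices(A)
        s += A[I]
    end
    return s
end
```

Both traverse the array in memory order for a column-major `Array`; the benchmarks in this thread measure the overhead of the extra index bookkeeping in the cartesian case.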

@timholy (Member, Author) commented Jan 16, 2014

That's very interesting.

Actually, 0.97 sec for 2-dimension vs 0.62 sec for 1-dimension are still quite noticeable in some sense.

Well, the actual arrays were different, not just the indexing pattern. What I intended was to compare the two files in that gist against each other (not very convenient, I know, but I was being lazy and leveraging a script I wrote long ago and put in test/). In particular, that test also includes allocation of the output, not just traversal.

Your test is more focused on just traversal, which is of course extremely useful. While I get slightly different numbers on my laptop, they're mostly consistent with your results (except I don't really see a difference between 1D and 2D):

Scanning matrix of size (8,8) for 10000000 times:
1D:  0.51942 sec
2D:  0.52862 sec
Scanning matrix of size (4,4,4) for 10000000 times:
1D:  0.51715 sec
3D:  0.71273 sec
Scanning matrix of size (4,4,2,2) for 10000000 times:
1D:  0.51848 sec
4D:  0.82086 sec

If I make the array too big to fit in L1 cache, so that repeated traversal will generate cache misses, here's what I get:

Scanning matrix of size (512,512) for 2441 times:
1D:  0.44576 sec
2D:  0.45418 sec
Scanning matrix of size (64,64,64) for 2441 times:
1D:  0.44576 sec
3D:  0.53878 sec
Scanning matrix of size (16,32,16,32) for 2441 times:
1D:  0.44502 sec
4D:  0.50972 sec

Not very dramatic, but still noticeable.

So you are right: there is a difference, and it grows with dimensionality. In retrospect that makes total sense, which is why I was surprised not to have seen it in my original tests.

It seems very reasonable to have algorithms on arrays use linear indexing when it's easy, and use cartesian indexing for AbstractArray implementations. There are also some algorithms, like pairwise summation, that would probably be hard to implement using cartesian indexing.
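To make the pairwise-summation point concrete, here is a minimal sketch (illustrative names, not Base's actual implementation; the block-size cutoff of 128 is arbitrary). The algorithm recursively splits a linear index range in half, which has no natural cartesian analogue:

```julia
# Pairwise (cascade) summation over a linear index range. Splitting
# [lo, hi] at its midpoint relies on the indices forming a single
# ordered range -- exactly what linear indexing provides and
# cartesian indexing does not.
function pairwise_sum(A::Array, lo::Int=1, hi::Int=length(A))
    if hi - lo < 128
        # Small block: plain sequential loop.
        s = zero(eltype(A))
        for i in lo:hi
            s += A[i]
        end
        return s
    else
        # Recurse on the two halves of the index range; this bounds
        # the accumulated floating-point error at O(log n).
        mid = (lo + hi) >>> 1
        return pairwise_sum(A, lo, mid) + pairwise_sum(A, mid + 1, hi)
    end
end
```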

@jiahao jiahao force-pushed the master branch 3 times, most recently from 6c7c7e3 to 1a4c02f Compare October 11, 2014 22:06
@simonster (Member)

I believe this is related to @vtjnash's comment here. I managed to hit this in real code trying to broadcast a 7 dimensional array and a vector.

@JeffBezanson (Member)

Closing this in favor of other issues/PRs related to cartesian indexing.
