RFC: Fast, lispy views #15071

mbauman · 2016-02-14T05:49:38Z

This started as a small experiment to see how much overhead there is when SubArrays simply wrap the passed indices… and rapidly snowballed into a complete solution. It's absolutely amazing how many tools are available now that allow this sort of simplification.

* Use lispy definitions to re-index the parent indices instead of generated functions
* The last parameter is now simply a boolean that specifies if the SubArray supports fast linear indexing
* Merging indices (for linear indexing) will now create a vector of CartesianIndexes instead of a vector of Ints
* first_index and stride1 are now only computed for FastLinear SubArrays; SubArrays that wrap LinearSlow parents now do no extra work on construction.
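
As a rough illustration of the first point, here is a minimal sketch of what a "lispy" (recursive, tuple-peeling) re-indexing definition can look like. The name `reindex_sketch` is hypothetical and this is not the code from the PR; it is written in current Julia syntax and only handles `Int` and `AbstractVector` parent indices.

```julia
# Hypothetical sketch, not the PR's implementation.
# Translate indices into a view into indices into its parent by peeling
# one element off the tuple of stored parent indices per recursive call.

# Base case: both tuples are exhausted.
reindex_sketch(::Tuple{}, ::Tuple{}) = ()

# A scalar parent index drops a dimension, so it consumes no view index.
reindex_sketch(p::Tuple{Int, Vararg{Any}}, s::Tuple) =
    (p[1], reindex_sketch(Base.tail(p), s)...)

# A range or vector parent index is itself indexed by the next view index.
reindex_sketch(p::Tuple{AbstractVector, Vararg{Any}}, s::Tuple{Any, Vararg{Any}}) =
    (p[1][s[1]], reindex_sketch(Base.tail(p), Base.tail(s))...)

# Usage: a view built from (2:4, 3, 1:2), indexed at [2, 1].
parent_idx = reindex_sketch((2:4, 3, 1:2), (2, 1))
```

Because each method only inspects the head of the tuple and recurses on the tail, dispatch does the work that a generated function would otherwise do by building an expression.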

This should be minimally breaking; the most observable change is the new meaning of the last parameter. I temporarily modified the array tests to cover more permutations of SubArrays; here's the performance I saw:

`runbenchmarks("array", vs = "JuliaLang/julia:master")`

Also take the dimensionality of CartesianIndex into account when computing index lengths and shapes.

mbauman · 2016-02-14T05:51:45Z

cc @timholy

mbauman · 2016-02-14T05:52:55Z

test/perf/array/indexing.jl

    Bit = trues(sz)
-    (A, AF, AS, ASS, Asub, Bit,)
+    # (A, AF, AS, ASS, Asub, Bit,)
+    (Asub, Asub2, Asub3, Asub4, Asub5)


TODO: either restore this back to normal or fully incorporate these tests.

I've restored it back to the original for now. It adds quite a bit of time and complexity to these tests… we may eventually want to create a SubArray perf suite since there are so many different performance-sensitive permutations.

JeffBezanson · 2016-02-14T06:16:04Z

Awesome! Have not yet read in detail but looks like a significant simplification and net reduction in generated functions.

nanosoldier · 2016-02-14T07:29:47Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

timholy · 2016-02-14T17:22:30Z

base/multidimensional.jl

+(::Type{CartesianIndex})(index::Integer...) = CartesianIndex(index)
+(::Type{CartesianIndex{N}}){N}(index::Integer...) = CartesianIndex(index)
+# Allow passing tuples smaller than N
+@generated function (::Type{CartesianIndex{N}}){N,M}(index::NTuple{M,Integer})


This is busted:

julia> CartesianIndex{4}((2,2,2))
ERROR: MethodError: no method matching length(::Type{Tuple{Int64,Int64,Int64}})
 in (::Vararg{Any})() at ./multidimensional.jl:27
 in eval(::Module, ::Any) at ./boot.jl:267

Turns out we can get rid of the @generated function here too, see 1ed27d6 (added to #15030). Then this call would be

(::Type{CartesianIndex{N}}){N,M}(index::NTuple{M,Integer}) = CartesianIndex{N}(fill_to_length(index, 1, Val{N}))

This is also a more composable approach, since fill_to_length may be useful in a variety of contexts.
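
For readers without access to that commit, here is a minimal sketch of how a non-generated `fill_to_length` could be written recursively. This is an assumption, not the definition from 1ed27d6 (which is not shown in this thread): the name `fill_to_length_sketch` is hypothetical, and it uses current Julia syntax with a `Val(N)` instance rather than the `Val{N}` type used above.

```julia
# Hypothetical sketch, not the definition from commit 1ed27d6.
# Pad a tuple with `val` until it has exactly N elements.

# Base case: the tuple already has the requested length.
fill_to_length_sketch(t::NTuple{N,Any}, val, ::Val{N}) where {N} = t

# Recursive case: append one `val` and try again.
function fill_to_length_sketch(t::NTuple{M,Any}, val, ::Val{N}) where {M,N}
    M > N && throw(ArgumentError("input tuple has length $M, asked for $N"))
    return fill_to_length_sketch((t..., val), val, Val(N))
end

# Usage: pad a 3-tuple to length 5 with -1.
padded = fill_to_length_sketch((1, 2, 3), -1, Val(5))
```

The base-case method is more specific than the recursive one, so dispatch picks it as soon as the lengths match and the recursion stops.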

For reference:

julia> @code_llvm fill_to_length((1,2,3), -1, Val{5})

define void @julia_fill_to_length_23994([5 x i64]* sret, [3 x i64]*, i64, %jl_value_t*) #0 {
top:
  %4 = call %jl_value_t*** @jl_get_ptls_states()
  %5 = getelementptr inbounds [3 x i64], [3 x i64]* %1, i64 0, i64 0
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr inbounds [3 x i64], [3 x i64]* %1, i64 0, i64 1
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr inbounds [3 x i64], [3 x i64]* %1, i64 0, i64 2
  %10 = load i64, i64* %9, align 8
  %11 = insertvalue [5 x i64] undef, i64 %6, 0
  %12 = insertvalue [5 x i64] %11, i64 %8, 1
  %13 = insertvalue [5 x i64] %12, i64 %10, 2
  %14 = insertvalue [5 x i64] %13, i64 %2, 3
  %15 = insertvalue [5 x i64] %14, i64 %2, 4
  store [5 x i64] %15, [5 x i64]* %0, align 8
  ret void
}

compare against

julia> @generated function fill_to_length_gen{M,N}(t::NTuple{M}, val, ::Type{Val{N}})
           M > N && error("input tuple has length $M, asked for $N")
           args = [d <= M ? :(t[$d]) : :(val) for d = 1:N]
           :(tuple($(args...)))
       end

julia> @code_llvm fill_to_length_gen((1,2,3), -1, Val{5})

define void @julia_fill_to_length_gen_23846([5 x i64]* sret, [3 x i64]*, i64, %jl_value_t*) #0 {
top:
  %4 = call %jl_value_t*** @jl_get_ptls_states()
  %5 = getelementptr inbounds [3 x i64], [3 x i64]* %1, i64 0, i64 0
  %6 = load i64, i64* %5, align 8
  %7 = insertvalue [5 x i64] undef, i64 %6, 0
  %8 = getelementptr inbounds [3 x i64], [3 x i64]* %1, i64 0, i64 1
  %9 = load i64, i64* %8, align 8
  %10 = insertvalue [5 x i64] %7, i64 %9, 1
  %11 = getelementptr inbounds [3 x i64], [3 x i64]* %1, i64 0, i64 2
  %12 = load i64, i64* %11, align 8
  %13 = insertvalue [5 x i64] %10, i64 %12, 2
  %14 = insertvalue [5 x i64] %13, i64 %2, 3
  %15 = insertvalue [5 x i64] %14, i64 %2, 4
  store [5 x i64] %15, [5 x i64]* %0, align 8
  ret void
}

Though it's worse when you exceed 8 args 😦

Ah, good catch, thank you. I just made the immediate fix for now, but it really is remarkable how just a few generated "intrinsics" allow for more powerful generic code.

timholy · 2016-02-14T18:44:36Z

Especially if we can get rid of the @generated constructors altogether, I'm going to be really interested to compare the runtime of make test-subarray between master and this PR. (I'm betting you'll cut it by half at least.)

So very, very nice. Thanks for tackling this!

Make the stride1 and first index computations recursive

mbauman · 2016-02-15T13:04:24Z

(I'm betting you'll cut it by half at least)

There is a difference, but it's not quite that dramatic… about 40 seconds faster (5%) and 120MB less memory (13%).

timholy · 2016-02-15T13:23:50Z

Hmm, I was definitely hoping for more, but it's still very much in the right direction. Plus shorter, more maintainable, and (presumably) statically-compilable (should make @vtjnash happy).

LGTM. I'll leave it up to you to decide whether you want to wait for comments from others, but I am now fine with merging whenever you are.

tkelman · 2016-02-15T13:27:25Z

test/subarray.jl

-        end
-    end
-end
+# # Compare the linear indexing dimension of a SubArray


Are these going to be useful again later, or should they just be deleted? They're in the git history and can be brought back from there if needed.

Ah, I forgot about this chunk. Yes, it can be deleted — it's intrinsically tied to the meaning of the old LD parameter, which was pretty complicated.

No longer relevant after the meaning of SubArray's final type parameter has changed

tkelman · 2016-02-17T01:35:25Z

Merge? Any chance of this fixing the error that's being hit in #14991?

timholy · 2016-02-17T11:27:44Z

@tkelman, it should, because it incorporates this: #14529 (comment)

mbauman · 2016-02-17T14:18:48Z

I'm not quite as certain — your patch was incorporated into #14957, which had already been merged before that AppVeyor run. It's a non-deterministic failure in a test suite without randomness, right? Given that, I don't think this will fix it directly… but it might work around it.

Yes, this is good to merge. Let's give it a shot!

RFC: Fast, lispy views

mbauman added 2 commits February 14, 2016 00:39

Allow constructing CartesianIndex by flattening nested indexes

a12fba6

Also take the dimensionality of CartesianIndex into account when computing index lengths and shapes.

mbauman reviewed Feb 14, 2016

timholy reviewed Feb 14, 2016

mbauman changed the title ~~RFC: Fast, lispy views~~ WIP: Fast, lispy views Feb 14, 2016

Remove the last generated SubArray method

96aa518

Make the stride1 and first index computations recursive

mbauman force-pushed the mb/lispyviews branch from b522024 to 96aa518 Compare February 15, 2016 02:40

mbauman added 2 commits February 14, 2016 22:15

Fix and test CartesianIndex{N} constructors

24457e0

Restore array perf tests

5b817b9

mbauman changed the title ~~WIP: Fast, lispy views~~ RFC: Fast, lispy views Feb 15, 2016

tkelman reviewed Feb 15, 2016

Remove LD-specific test code

f64b802

No longer relevant after the meaning of SubArray's final type parameter has changed

mbauman added a commit that referenced this pull request Feb 17, 2016

Merge pull request #15071 from JuliaLang/mb/lispyviews

ed460e4

RFC: Fast, lispy views

mbauman merged commit ed460e4 into master Feb 17, 2016

mbauman deleted the mb/lispyviews branch February 17, 2016 14:19

mbauman mentioned this pull request Feb 17, 2016

Delete usage of inference in SubArray code #12409

Closed

This was referenced Apr 2, 2016

NullableArrays broken in v0.5 by 15071 JuliaStats/NullableArrays.jl#100

Closed

Fix issue #100, make work again with v0.5 JuliaStats/NullableArrays.jl#101

Merged

mbauman mentioned this pull request Apr 18, 2016

0.4.6pre Documentation for SubArray looks out of date #15931

Closed
