Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Fast, lispy views #15071

Merged
merged 6 commits into from
Feb 17, 2016
Merged

RFC: Fast, lispy views #15071

merged 6 commits into from
Feb 17, 2016

Conversation

mbauman
Copy link
Member

@mbauman mbauman commented Feb 14, 2016

This started as a small experiment to see how much overhead there is when SubArrays simply wrap the passed indices… and rapidly snowballed into a complete solution. It's absolutely amazing how many tools are available now that allow this sort of simplification.

  • Use lispy definitions to re-index the parent indices instead of generated functions
  • The last parameter is now simply a boolean that specifies if the SubArray supports fast linear indexing
  • Merging indices (for linear indexing) will now create a vector of CartesianIndexes instead of a vector of Ints
  • first_index and stride1 are now only computed for FastLinear SubArrays; SubArrays that wrap LinearSlow parents now do no extra work on construction.

This should be relatively minimally breaking; the most observable change is the change in meaning of the last parameter. I temporarily modified the array tests to test more permutations of SubArrays; here's the performance I saw:

min_relative

runbenchmarks("array", vs = "JuliaLang/julia:master")
`

Also take the dimensionality of CartesianIndex into account when computing index lengths and shapes.
* Use lispy definitions to re-index the parent indices instead of generated functions
* The last parameter is now simply a boolean that specifies if the SubArray supports fast linear indexing
* Merging indices (for linear indexing) will now create a vector of CartesianIndexes instead of a vector of Ints
* first_index and stride1 are now only computed for FastLinear SubArrays; SubArrays that wrap LinearSlow parents now do no extra work on construction.
@mbauman
Copy link
Member Author

mbauman commented Feb 14, 2016

cc @timholy

Bit = trues(sz)
(A, AF, AS, ASS, Asub, Bit,)
# (A, AF, AS, ASS, Asub, Bit,)
(Asub, Asub2, Asub3, Asub4, Asub5)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: either restore this back to normal or fully incorporate these tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've restored it back to the original for now. It adds quite a bit of time and complexity to these tests… we may eventually want to create a SubArray perf suite since there are so many different performance-sensitive permutations.

@JeffBezanson
Copy link
Member

Awesome! Have not yet read in detail but looks like a significant simplification and net reduction in generated functions.

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

(::Type{CartesianIndex})(index::Integer...) = CartesianIndex(index)
(::Type{CartesianIndex{N}}){N}(index::Integer...) = CartesianIndex(index)
# Allow passing tuples smaller than N
@generated function (::Type{CartesianIndex{N}}){N,M}(index::NTuple{M,Integer})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is busted:

julia> CartesianIndex{4}((2,2,2))
ERROR: MethodError: no method matching length(::Type{Tuple{Int64,Int64,Int64}})
 in (::Vararg{Any})() at ./multidimensional.jl:27
 in eval(::Module, ::Any) at ./boot.jl:267

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out we can get rid of the @generated function here too, see 1ed27d6 (added to #15030). Then this call would be

(::Type{CartesianIndex{N}}){N,M}(index::NTuple{M,Integer}) = CartesianIndex{N}(fill_to_length(index, 1, Val{N}))

This is also a more composable approach, since fill_to_length may be useful in a variety of contexts.

For reference:

julia> @code_llvm fill_to_length((1,2,3), -1, Val{5})

define void @julia_fill_to_length_23994([5 x i64]* sret, [3 x i64]*, i64, %jl_value_t*) #0 {
top:
  %4 = call %jl_value_t*** @jl_get_ptls_states()
  %5 = getelementptr inbounds [3 x i64], [3 x i64]* %1, i64 0, i64 0
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr inbounds [3 x i64], [3 x i64]* %1, i64 0, i64 1
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr inbounds [3 x i64], [3 x i64]* %1, i64 0, i64 2
  %10 = load i64, i64* %9, align 8
  %11 = insertvalue [5 x i64] undef, i64 %6, 0
  %12 = insertvalue [5 x i64] %11, i64 %8, 1
  %13 = insertvalue [5 x i64] %12, i64 %10, 2
  %14 = insertvalue [5 x i64] %13, i64 %2, 3
  %15 = insertvalue [5 x i64] %14, i64 %2, 4
  store [5 x i64] %15, [5 x i64]* %0, align 8
  ret void
}

compare against

julia> @generated function fill_to_length_gen{M,N}(t::NTuple{M}, val, ::Type{Val{N}})
           M > N && error("input tuple has length $M, asked for $N")
           args = [d <= M ? :(t[$d]) : :(val) for d = 1:N]
           :(tuple($(args...)))
       end

julia> @code_llvm fill_to_length_gen((1,2,3), -1, Val{5})

define void @julia_fill_to_length_gen_23846([5 x i64]* sret, [3 x i64]*, i64, %jl_value_t*) #0 {
top:
  %4 = call %jl_value_t*** @jl_get_ptls_states()
  %5 = getelementptr inbounds [3 x i64], [3 x i64]* %1, i64 0, i64 0
  %6 = load i64, i64* %5, align 8
  %7 = insertvalue [5 x i64] undef, i64 %6, 0
  %8 = getelementptr inbounds [3 x i64], [3 x i64]* %1, i64 0, i64 1
  %9 = load i64, i64* %8, align 8
  %10 = insertvalue [5 x i64] %7, i64 %9, 1
  %11 = getelementptr inbounds [3 x i64], [3 x i64]* %1, i64 0, i64 2
  %12 = load i64, i64* %11, align 8
  %13 = insertvalue [5 x i64] %10, i64 %12, 2
  %14 = insertvalue [5 x i64] %13, i64 %2, 3
  %15 = insertvalue [5 x i64] %14, i64 %2, 4
  store [5 x i64] %15, [5 x i64]* %0, align 8
  ret void
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though it's worse when you exceed 8 args 😦

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch, thank you. I just made the immediate fix for now, but it really is remarkable how just a few generated "intrinsics" allow for more powerful generic code.

@timholy
Copy link
Member

timholy commented Feb 14, 2016

Especially if we can get rid of the @generated constructors altogether, I'm going to be really interested to compare the runtime of make test-subarray between master and this PR. (I'm betting you'll cut it by half at least.)

So very, very nice. Thanks for tackling this!

@mbauman mbauman changed the title RFC: Fast, lispy views WIP: Fast, lispy views Feb 14, 2016
Make the stride1 and first index computations recursive
@mbauman mbauman changed the title WIP: Fast, lispy views RFC: Fast, lispy views Feb 15, 2016
@mbauman
Copy link
Member Author

mbauman commented Feb 15, 2016

(I'm betting you'll cut it by half at least)

There is a difference, but it's not quite that dramatic… about 40 seconds faster (5%) and 120MB less memory (13%).

@timholy
Copy link
Member

timholy commented Feb 15, 2016

Hmm, I was definitely hoping for more, but it's still very much in the right direction. Plus shorter, more maintainable, and (presumably) statically-compilable (should make @vtjnash happy).

LGTM. I'll leave it up to you to decide whether you want to wait for comments from others, but I am now fine with merging whenever you are.

end
end
end
# # Compare the linear indexing dimension of a SubArray
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these going to be useful again later, or should they just be deleted? it's in the git history and can be brought back from there if needed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I forgot about this chunk. Yes, it can be deleted — it's intrinsically tied to the meaning of the old LD parameter, which was pretty complicated.

No longer relevant after the meaning of SubArray's final type parameter has changed
@tkelman
Copy link
Contributor

tkelman commented Feb 17, 2016

Merge? Any chance of this fixing the error that's being hit in #14991?

@timholy
Copy link
Member

timholy commented Feb 17, 2016

@tkelman, it should, because it incorporates this: #14529 (comment)

@mbauman
Copy link
Member Author

mbauman commented Feb 17, 2016

I'm not quite as certain — your patch was incorporated into #14957, which had already been merged before that AppVeyor run. It's a non-deterministic failure in a test suite without randomness, right? Given that, I don't think this will fix it directly… but it might work around it.

Yes, this is good to merge. Let's give it a shot!

mbauman added a commit that referenced this pull request Feb 17, 2016
@mbauman mbauman merged commit ed460e4 into master Feb 17, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants