-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Fast, lispy views #15071
RFC: Fast, lispy views #15071
Conversation
Also take the dimensionality of CartesianIndex into account when computing index lengths and shapes.
* Use lispy definitions to re-index the parent indices instead of generated functions * The last parameter is now simply a boolean that specifies if the SubArray supports fast linear indexing * Merging indices (for linear indexing) will now create a vector of CartesianIndexes instead of a vector of Ints * first_index and stride1 are now only computed for FastLinear SubArrays; SubArrays that wrap LinearSlow parents now do no extra work on construction.
cc @timholy |
Bit = trues(sz) | ||
(A, AF, AS, ASS, Asub, Bit,) | ||
# (A, AF, AS, ASS, Asub, Bit,) | ||
(Asub, Asub2, Asub3, Asub4, Asub5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: either restore this back to normal or fully incorporate these tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've restored it back to the original for now. It adds quite a bit of time and complexity to these tests… we may eventually want to create a SubArray perf suite since there are so many different performance-sensitive permutations.
Awesome! Have not yet read in detail but looks like a significant simplification and net reduction in generated functions. |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
(::Type{CartesianIndex})(index::Integer...) = CartesianIndex(index) | ||
(::Type{CartesianIndex{N}}){N}(index::Integer...) = CartesianIndex(index) | ||
# Allow passing tuples smaller than N | ||
@generated function (::Type{CartesianIndex{N}}){N,M}(index::NTuple{M,Integer}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is busted:
julia> CartesianIndex{4}((2,2,2))
ERROR: MethodError: no method matching length(::Type{Tuple{Int64,Int64,Int64}})
in (::Vararg{Any})() at ./multidimensional.jl:27
in eval(::Module, ::Any) at ./boot.jl:267
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out we can get rid of the @generated
function here too, see 1ed27d6 (added to #15030). Then this call would be
(::Type{CartesianIndex{N}}){N,M}(index::NTuple{M,Integer}) = CartesianIndex{N}(fill_to_length(index, 1, Val{N}))
This is also a more composable approach, since fill_to_length
may be useful in a variety of contexts.
For reference:
julia> @code_llvm fill_to_length((1,2,3), -1, Val{5})
define void @julia_fill_to_length_23994([5 x i64]* sret, [3 x i64]*, i64, %jl_value_t*) #0 {
top:
%4 = call %jl_value_t*** @jl_get_ptls_states()
%5 = getelementptr inbounds [3 x i64], [3 x i64]* %1, i64 0, i64 0
%6 = load i64, i64* %5, align 8
%7 = getelementptr inbounds [3 x i64], [3 x i64]* %1, i64 0, i64 1
%8 = load i64, i64* %7, align 8
%9 = getelementptr inbounds [3 x i64], [3 x i64]* %1, i64 0, i64 2
%10 = load i64, i64* %9, align 8
%11 = insertvalue [5 x i64] undef, i64 %6, 0
%12 = insertvalue [5 x i64] %11, i64 %8, 1
%13 = insertvalue [5 x i64] %12, i64 %10, 2
%14 = insertvalue [5 x i64] %13, i64 %2, 3
%15 = insertvalue [5 x i64] %14, i64 %2, 4
store [5 x i64] %15, [5 x i64]* %0, align 8
ret void
}
compare against
julia> @generated function fill_to_length_gen{M,N}(t::NTuple{M}, val, ::Type{Val{N}})
M > N && error("input tuple has length $M, asked for $N")
args = [d <= M ? :(t[$d]) : :(val) for d = 1:N]
:(tuple($(args...)))
end
julia> @code_llvm fill_to_length_gen((1,2,3), -1, Val{5})
define void @julia_fill_to_length_gen_23846([5 x i64]* sret, [3 x i64]*, i64, %jl_value_t*) #0 {
top:
%4 = call %jl_value_t*** @jl_get_ptls_states()
%5 = getelementptr inbounds [3 x i64], [3 x i64]* %1, i64 0, i64 0
%6 = load i64, i64* %5, align 8
%7 = insertvalue [5 x i64] undef, i64 %6, 0
%8 = getelementptr inbounds [3 x i64], [3 x i64]* %1, i64 0, i64 1
%9 = load i64, i64* %8, align 8
%10 = insertvalue [5 x i64] %7, i64 %9, 1
%11 = getelementptr inbounds [3 x i64], [3 x i64]* %1, i64 0, i64 2
%12 = load i64, i64* %11, align 8
%13 = insertvalue [5 x i64] %10, i64 %12, 2
%14 = insertvalue [5 x i64] %13, i64 %2, 3
%15 = insertvalue [5 x i64] %14, i64 %2, 4
store [5 x i64] %15, [5 x i64]* %0, align 8
ret void
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though it's worse when you exceed 8 args 😦
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch, thank you. I just made the immediate fix for now, but it really is remarkable how just a few generated "intrinsics" allow for more powerful generic code.
Especially if we can get rid of the So very, very nice. Thanks for tackling this! |
Make the stride1 and first index computations recursive
b522024
to
96aa518
Compare
There is a difference, but it's not quite that dramatic… about 40 seconds faster (5%) and 120MB less memory (13%). |
Hmm, I was definitely hoping for more, but it's still very much in the right direction. Plus shorter, more maintainable, and (presumably) statically-compilable (should make @vtjnash happy). LGTM. I'll leave it up to you to decide whether you want to wait for comments from others, but I am now fine with merging whenever you are. |
end | ||
end | ||
end | ||
# # Compare the linear indexing dimension of a SubArray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these going to be useful again later, or should they just be deleted? it's in the git history and can be brought back from there if needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I forgot about this chunk. Yes, it can be deleted — it's intrinsically tied to the meaning of the old LD
parameter, which was pretty complicated.
No longer relevant after the meaning of SubArray's final type parameter has changed
Merge? Any chance of this fixing the error that's being hit in #14991? |
@tkelman, it should, because it incorporates this: #14529 (comment) |
I'm not quite as certain — your patch was incorporated into #14957, which had already been merged before that AppVeyor run. It's a non-deterministic failure in a test suite without randomness, right? Given that, I don't think this will fix it directly… but it might work around it. Yes, this is good to merge. Let's give it a shot! |
This started as a small experiment to see how much overhead there is when SubArrays simply wrap the passed indices… and rapidly snowballed into a complete solution. It's absolutely amazing how many tools are available now that allow this sort of simplification.
first_index
andstride1
are now only computed for FastLinear SubArrays; SubArrays that wrap LinearSlow parents now do no extra work on construction.This should be relatively minimally breaking; the most observable change is the change in meaning of the last parameter. I temporarily modified the array tests to test more permutations of SubArrays; here's the performance I saw:
runbenchmarks("array", vs = "JuliaLang/julia:master")
`