Inline start(::CartesianRange) #15775

mbauman · 2016-04-05T22:51:44Z

Or, alternatively: "Look ma! No CartesianRanges!"

This dramatically simplifies the generated code for iteration over CartesianRanges -- in fact, no references to CartesianRange appear in the LLVM IR in many cases with this commit. While it does simplify the code in #9080, it does not solve the performance problem there (I see no difference). It does, however, speed up `copy(::SubArray)` by 1.3-1.6x:

julia> A = sub(reshape(1:5^3,5,5,5), 1:2:5, :, 1:2:5);

julia> @benchmark copy!(similar(A), A) # current master
================ Benchmark Results ========================
     Time per evaluation: 232.69 ns [227.97 ns, 237.42 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 0.00 bytes
   Number of allocations: 0 allocations
       Number of samples: 4301
   Number of evaluations: 120601
         R² of OLS model: 0.953
 Time spent benchmarking: 5.53 s

julia> @benchmark copy!(similar(A), A) # this PR
================ Benchmark Results ========================
     Time per evaluation: 168.91 ns [165.67 ns, 172.14 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 0.00 bytes
   Number of allocations: 0 allocations
       Number of samples: 4601
   Number of evaluations: 160601
         R² of OLS model: 0.955
 Time spent benchmarking: 5.33 s

Comparing this to non-scalar indexing, you can see there's still room for improvement, even after this commit:

julia> @benchmark Base._unsafe_getindex!(similar(A), A.parent, A.indexes[1], A.indexes[2], A.indexes[3])
================ Benchmark Results ========================
     Time per evaluation: 115.75 ns [113.43 ns, 118.06 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 0.00 bytes
   Number of allocations: 0 allocations
       Number of samples: 4501
   Number of evaluations: 146001
         R² of OLS model: 0.952
 Time spent benchmarking: 5.22 s
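To illustrate the kind of change discussed here, the following is a minimal sketch rather than the code in this PR. `ToyRange`, `toy_start`, `toy_done`, `toy_next`, and `toy_sum` are hypothetical names invented for the example, and it assumes Julia 1.x syntax (`struct`), so it deliberately avoids Base's `CartesianRange` and the `start`/`next`/`done` protocol. The point is only that marking the small state-initialization function `@inline` lets the compiler see the whole loop at once.

```julia
# Hypothetical toy 2-D range; a stand-in for illustration, not Base's CartesianRange.
struct ToyRange
    first::NTuple{2,Int}
    last::NTuple{2,Int}
end

# Locally defined start/done/next-style protocol (assumed names, nothing from Base).
# `@inline` asks the compiler to inline these into the calling loop.
@inline toy_start(r::ToyRange) = r.first
@inline toy_done(r::ToyRange, state) = state[2] > r.last[2]
@inline function toy_next(r::ToyRange, state)
    i, j = state
    # Advance the first dimension fastest, carrying into the second.
    nextstate = i < r.last[1] ? (i + 1, j) : (r.first[1], j + 1)
    return state, nextstate
end

# Sum the entries of A over the (non-empty) index rectangle described by r.
function toy_sum(A::AbstractMatrix, r::ToyRange)
    s = zero(eltype(A))
    state = toy_start(r)
    while !toy_done(r, state)
        (i, j), state = toy_next(r, state)
        s += A[i, j]
    end
    return s
end

A = reshape(1:25, 5, 5)
r = ToyRange((1, 1), (5, 5))
@assert toy_sum(A, r) == sum(A)
```

On Julia 1.x, `@code_llvm toy_sum(A, r)` (after `using InteractiveUtils`) prints the IR for the loop, which is one way to check whether a call to `toy_start` survives. Whether inlining helps in any given case is something to measure rather than assume.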

timholy · 2016-04-06T09:48:17Z

👍. Initially counterintuitive, since `start` should not by itself be terribly performance-sensitive, but of course inlining it gives LLVM the chance to analyze the whole block. Perhaps a reminder that it might make sense, if possible, to grant LLVM more authority to make some of its own inlining decisions above and beyond Julia's.

vtjnash merged commit 95607ed into JuliaLang:master Apr 6, 2016
