
Benchmark new optimizer #26795

Closed
wants to merge 8 commits into from

Conversation

Keno
Member

@Keno Keno commented Apr 12, 2018

Opening this PR to have nanosoldier run over the new optimizer

TODO:

Top priority:

Other fixes:

@Keno
Member Author

Keno commented Apr 12, 2018

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@Keno
Member Author

Keno commented Apr 13, 2018

When looking at the performance results it's important to note that bounds check elision is currently disabled. That's not too hard to fix, so we should fix it and re-run. Looks like a good fraction of these could be explained by that.

@ararslan ararslan added performance Must go faster compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) labels Apr 14, 2018
@vtjnash vtjnash self-requested a review April 19, 2018 19:20
Sponsor Member

@vtjnash vtjnash left a comment


not for merging

@JeffBezanson
Sponsor Member

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Collaborator

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`sudo cset shield -e su nanosoldier -- -c ./benchscript.sh`, ProcessExited(1)) [1]

Logs and partial data can be found here
cc @ararslan

@ararslan
Member

Weird, I don't see what the issue was; the logs appear to be incomplete.

@Keno
Member Author

Keno commented May 2, 2018

@nanosoldier runbenchmarks(ALL, vs=":master")

@Keno
Member Author

Keno commented May 2, 2018

Nanosoldier seems to be sleeping on the job here. Let's try this again. @ararslan could you look into this if it doesn't work:

@nanosoldier runbenchmarks(ALL, vs=":master")

@quinnj
Member

quinnj commented May 2, 2018

If one wanted to give the new optimizer a spin, is building this branch enough?

@Keno
Member Author

Keno commented May 2, 2018

Yes, the only difference on this branch is toggling the flag though. All the code is already on master.

@ararslan
Member

ararslan commented May 2, 2018

Restarted the server. @nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

Keno added a commit that referenced this pull request May 3, 2018
The benchmarks contain code like this:
```
x::Union{Nothing, Int}
result += ifelse(x === nothing, 0, x)
```
which, perhaps somewhat ironically, is quite a bit harder
on the new optimizer than an equivalent code sequence
using ternary operators. The reason for this is that
`ifelse` gets inferred as `Union{Int, Nothing}`, creating
a phi node of that type, which then causes a union split
that the optimizer can't easily get rid of. What this
commit does is add some local improvements to help with the
situation. First, it adds some minimal back inference during
inlining. As a result, when inlining decides to union-split
`ifelse(x === nothing, 0, x::Union{Nothing, Int})`, it looks
back at the definition of `x === nothing`, realizes it's constrained
by the union split, and inserts the appropriate boolean constant.
Next, a new `type_tightening_pass` goes back and annotates more precise
types for the inlined `select_value` and phi nodes. This is sufficient
to get the above code to behave reasonably and should hopefully fix
the performance regression on the various union sum benchmarks
seen in #26795.
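For illustration, here is a sketch contrasting the two patterns the commit message describes; the function wrappers (`sum_ifelse`, `sum_ternary`) are hypothetical, not from the PR. Per the message, the `ifelse` call was inferred as `Union{Int, Nothing}`, forcing a Union-typed phi node, while the ternary form keeps each branch concretely typed:

```julia
# Hypothetical wrappers around the benchmark pattern quoted above.

# With ifelse, both arms are evaluated and (per the commit message) the
# call was inferred as Union{Int, Nothing}, producing a Union-typed phi:
function sum_ifelse(xs::Vector{Union{Nothing, Int}})
    result = 0
    for x in xs
        result += ifelse(x === nothing, 0, x)
    end
    return result
end

# The ternary form branches instead, so each arm is concretely typed:
function sum_ternary(xs::Vector{Union{Nothing, Int}})
    result = 0
    for x in xs
        result += x === nothing ? 0 : x
    end
    return result
end
```

Both compute the same result at runtime; the difference is purely in how much work the optimizer has to do to eliminate the union.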
@Keno
Member Author

Keno commented May 3, 2018

Rebased to include #26969 while that is pending

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

Keno added a commit that referenced this pull request May 5, 2018
@Keno Keno mentioned this pull request May 5, 2018
Keno added a commit that referenced this pull request May 5, 2018
StefanKarpinski pushed a commit that referenced this pull request May 10, 2018
@Keno
Member Author

Keno commented May 10, 2018

@nanosoldier runbenchmarks(ALL, vs=":master")

This PR now has everything except the main change in #26969, since @vtjnash doesn't like that one. Let's see where we're at.

@Keno
Member Author

Keno commented May 12, 2018

Now that the new optimizer is on its way to being enabled, let's start benchmarking the iteration protocol change on top of it:

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@Keno
Member Author

Keno commented May 13, 2018

Well, that is moderately disappointing. Looks like the getfield elim pass isn't powerful enough to capture the loop pattern at the moment. #26778 would do it, but I think I can simply adjust the lowering to produce a pattern that the less fancy getfield elim pass can handle.

@Keno
Member Author

Keno commented May 15, 2018

This branch now has a rebased version of #26778 with the new iteration protocol. I need to clean things up and PR them separately, but should be good enough for a nanosoldier run.

@nanosoldier runbenchmarks(ALL, vs=":master")

@Keno
Member Author

Keno commented May 15, 2018

I should point out that I didn't yet port mutable struct elision to the new SROA pass, so the things to look for in those benchmark results are the ones that did badly the last time around (e.g. perf_countnothing).

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@Keno
Member Author

Keno commented May 15, 2018

Hmm, I must have messed something up here. Those regressions are not there locally. Let me take a look.

@Keno
Member Author

Keno commented May 15, 2018

Looks like because of the merge conflicts nanosoldier just dropped the last couple commits from this branch. Let me rebase and try again.

@Keno
Member Author

Keno commented May 15, 2018

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@Keno
Member Author

Keno commented May 15, 2018

Ok, that's fairly encouraging. I see three remaining classes of regressions:

  • cat - will need to look at what's going on here
  • parse(Int, x::String) - same
  • calling start/next/done explicitly - these now trigger deprecation warnings, so that's expected; should be switched to iterate once we merge kf/iterate

Keno and others added 8 commits May 15, 2018 21:07
This changes the iteration protocol from `start`/`next`/`done` to `iterate`.
The new lowering of a for loop is as follows:

```
for x in itr
    ...
end
```

becomes

```
next = iterate(itr)
while next !== nothing
    x, state = next::Tuple{Any, Any}
    ...
    next = iterate(itr, state)
end
```

The semantics are apparent from the above lowering: `iterate` returns
either `nothing` or a tuple of value and state. The state is passed
to any subsequent operation. The first iteration is indicated by not
passing the second (state) argument to the `iterate` method.

Adaptors in both directions are provided to keep the legacy iteration
protocol working for now. However, performance of the legacy iteration
protocol will be severely pessimized.

As an optional add-on for mutable iterators, a new `isdone` function is
provided. This function is intended as an O(1) approximate query for
iterator completion, where such a calculation is possible without mutation
and/or is significantly faster than attempting to obtain the element itself.
The function makes use of 3-value logic. `missing` is always an acceptable
answer, in which case the caller should go ahead and attempt the iteration
to obtain a definite result. If the result is not `missing`, it must be
exact (i.e. if `true`, the next call to `iterate` must return `nothing`;
if `false`, it must not).
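As an illustration of the protocol described above (the `Countdown` type is hypothetical and not part of this PR; names are as they exist in released Julia), a custom iterator, including the optional completion query, might look like:

```julia
# Hypothetical example: a countdown iterator under the new protocol.
struct Countdown
    n::Int
end

# First call: no state argument. Return nothing when there is no element,
# otherwise a (value, state) tuple.
Base.iterate(c::Countdown) = c.n < 1 ? nothing : (c.n, c.n - 1)

# Subsequent calls receive the state from the previously returned tuple.
Base.iterate(c::Countdown, state::Int) = state < 1 ? nothing : (state, state - 1)

Base.length(c::Countdown) = max(c.n, 0)

# Optional O(1) completion query; the generic fallback answers `missing`.
Base.isdone(c::Countdown, state::Int) = state < 1
```

With these definitions, `collect(Countdown(3))` yields `[3, 2, 1]`, and a `for` loop over a `Countdown` lowers to exactly the `while`-loop shown above.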
The motivation is something like the following:
```
@noinline r_typeassert(c) = c ? (1, 1) : nothing
function f_typeassert(c)
    r_typeassert(c)::Tuple
end
```
Here, we know that the return type of `r_typeassert` is either
`Tuple{Int, Int}` or `Nothing`, so all the typeassert has to
do is check that it's the former. We can compute this by
narrowing the type to be asserted using type intersection.
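The narrowing step can be sketched with `Base.typeintersect`; this is an illustration of the idea, not the compiler's actual code path:

```julia
# Sketch: intersect the inferred return type with the asserted type to
# compute what the typeassert actually has to check.
inferred = Union{Tuple{Int, Int}, Nothing}
asserted = Tuple

# The intersection drops the incompatible union member (Nothing):
narrowed = Base.typeintersect(inferred, asserted)
# narrowed == Tuple{Int, Int}, so the assert only needs to rule out `nothing`.
```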
@Keno
Member Author

Keno commented May 16, 2018

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@Keno
Member Author

Keno commented May 16, 2018

Alright, I think that's close enough for us to merge this. We can continue working on the remaining couple small regressions through the 1.0 period. I'll go ahead and put up a PR (or two) with the rebased version of the SROA pass.

@KristofferC
Sponsor Member

KristofferC commented May 16, 2018

So do we know the reason for the rest of the regressions? Will fixing them be a case of whack-a-mole, adding more optimization passes, etc., or is there a known failure case whose fix would address everything?

Is anything in the benchmarks using the deprecated protocol directly?

"Couple of small regressions" is being quite nice about it.

@mbauman
Sponsor Member

mbauman commented May 16, 2018

The 200-600x regressions look to be hitting deprecations (explicitly calling `start`/`next`/`done`). I'd say just a couple of big guys left: 5x on `reduce((x,y) -> x + 2y, 0, rand(10^3))`, 5-10x to parse dates, 10x on raytracing, and 50x on `pop!(::Set)`.

@Keno
Member Author

Keno commented May 16, 2018

Yes, what @mbauman said; the real regressions are those three plus a couple of 2x regressions in the array code. I haven't looked at them in detail, but I suspect the Date parsing regression is a missing type assertion somewhere. For the array code, there's a pattern where we know two iterators are of the same length, which used to be exploited by skipping the `done` call and using `@inbounds next` to skip the extra branch (which by itself isn't expensive, but messes with the vectorizer). That obviously doesn't work anymore, but I plan to add an `@unsafe_assume` macro to express those invariants (the two iterators being the same length) directly to LLVM.

@Keno Keno closed this May 23, 2018
@ararslan ararslan deleted the kf/benchmarknew branch May 23, 2018 16:33