
Benchmark new optimizer #26795

Closed
wants to merge 8 commits into from

Conversation

Keno
Member

@Keno Keno commented Apr 12, 2018

Opening this PR to have nanosoldier run over the new optimizer

TODO:

Top priority:

Other fixes:

@Keno
Member Author

Keno commented Apr 12, 2018

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@Keno
Member Author

Keno commented Apr 13, 2018

When looking at the performance results it's important to note that bounds check elision is currently disabled. That's not too hard to fix, so we should fix it and re-run. Looks like a good fraction of these could be explained by that.

@ararslan ararslan added performance Must go faster compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) labels Apr 14, 2018
@vtjnash vtjnash self-requested a review April 19, 2018 19:20
Sponsor Member

@vtjnash vtjnash left a comment


not for merging

@JeffBezanson
Sponsor Member

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Collaborator

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`sudo cset shield -e su nanosoldier -- -c ./benchscript.sh`, ProcessExited(1)) [1]

Logs and partial data can be found here
cc @ararslan

@ararslan
Member

Weird, I don't see what the issue was; the logs appear to be incomplete.

@Keno
Member Author

Keno commented May 2, 2018

@nanosoldier runbenchmarks(ALL, vs=":master")

@Keno
Member Author

Keno commented May 2, 2018

Nanosoldier seems to be sleeping on the job here. Let's try this again. @ararslan could you look into this if it doesn't work:

@nanosoldier runbenchmarks(ALL, vs=":master")

@quinnj
Member

quinnj commented May 2, 2018

If one wanted to give the new optimizer a spin, is building this branch enough?

@Keno
Member Author

Keno commented May 2, 2018

Yes, the only difference on this branch is toggling the flag though. All the code is already on master.

@ararslan
Member

ararslan commented May 2, 2018

Restarted the server. @nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

Keno added a commit that referenced this pull request May 3, 2018
The benchmarks contain code like this:
```
x::Union{Nothing, Int}
result += ifelse(x === nothing, 0, x)
```
which, perhaps somewhat ironically, is quite a bit harder
on the new optimizer than an equivalent code sequence
using ternary operators. The reason for this is that
`ifelse` gets inferred as `Union{Int, Nothing}`, creating
a phi node of that type, which then causes a union split
that the optimizer can't easily get rid of. What this
commit does is add some local improvements to help with the
situation. First, it adds some minimal back inference during
inlining. As a result, when inlining decides to union-split
`ifelse(x === nothing, 0, x::Union{Nothing, Int})`, it looks
back at the definition of `x === nothing`, realizes it's constrained
by the union split, and inserts the appropriate boolean constant.
Next, a new `type_tightening_pass` goes back and annotates more precise
types for the inlined `select_value` and phi nodes. This is sufficient
to get the above code to behave reasonably and should hopefully fix
the performance regression on the various union sum benchmarks
seen in #26795.
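For illustration, here is a sketch contrasting the two patterns the commit message describes; the function wrappers (`sum_ifelse`, `sum_ternary`) are hypothetical, not from the PR. Per the message, the `ifelse` call was inferred as `Union{Int, Nothing}`, forcing a Union-typed phi node, while the ternary form keeps each branch concretely typed:

```julia
# Hypothetical wrappers around the benchmark pattern quoted above.

# With ifelse, both arms are evaluated and (per the commit message) the
# call was inferred as Union{Int, Nothing}, producing a Union-typed phi:
function sum_ifelse(xs::Vector{Union{Nothing, Int}})
    result = 0
    for x in xs
        result += ifelse(x === nothing, 0, x)
    end
    return result
end

# The ternary form branches instead, so each arm is concretely typed:
function sum_ternary(xs::Vector{Union{Nothing, Int}})
    result = 0
    for x in xs
        result += x === nothing ? 0 : x
    end
    return result
end
```

Both compute the same result at runtime; the difference is purely in how much work the optimizer has to do to eliminate the union.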
@Keno
Member Author

Keno commented May 3, 2018

Rebased to include #26969 while that is pending

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

Keno added a commit that referenced this pull request May 5, 2018
@Keno Keno mentioned this pull request May 5, 2018
Keno added a commit that referenced this pull request May 5, 2018
StefanKarpinski pushed a commit that referenced this pull request May 10, 2018
@Keno
Member Author

Keno commented May 10, 2018

@nanosoldier runbenchmarks(ALL, vs=":master")

This PR now has everything except the main change in #26969, since @vtjnash doesn't like that one. Let's see where we're at.

@Keno
Member Author

Keno commented May 12, 2018

Now that the new optimizer is on its way to being enabled, let's start benchmarking the iteration protocol change on top of it:

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@Keno
Member Author

Keno commented May 13, 2018

Well, that is moderately disappointing. Looks like the getfield elim pass isn't powerful enough to capture the loop pattern at the moment. #26778 would do it, but I think I can simply adjust the lowering to produce a pattern that the less fancy getfield elim pass can handle.

@Keno
Member Author

Keno commented May 15, 2018

This branch now has a rebased version of #26778 with the new iteration protocol. I need to clean things up and PR them separately, but should be good enough for a nanosoldier run.

@nanosoldier runbenchmarks(ALL, vs=":master")

@Keno
Member Author

Keno commented May 15, 2018

I should point out that I didn't yet port mutable struct elision to the new SROA pass, so the things to look for in those benchmark results are the ones that did badly the last time around (e.g. perf_countnothing).

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@Keno
Member Author

Keno commented May 15, 2018

Hmm, I must have messed something up here. Those regressions are not there locally. Let me take a look.

@Keno
Member Author

Keno commented May 15, 2018

Looks like because of the merge conflicts nanosoldier just dropped the last couple commits from this branch. Let me rebase and try again.

@Keno
Member Author

Keno commented May 15, 2018

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@Keno
Member Author

Keno commented May 15, 2018

Ok, that's fairly encouraging. I see three remaining classes of regressions:

  • cat - will need to look at what's going on here
  • parse(Int, x::String) - same
  • calling start/next/done explicitly - these now trigger deprecation warnings, so that's expected; should be switched to iterate once we merge kf/iterate

Keno and others added 8 commits May 15, 2018 21:07
This changes the iteration protocol from `start`/`next`/`done` to `iterate`.
The new lowering of a for loop is as follows:

```
for x in itr
    ...
end
```

becomes

```
next = iterate(itr)
while next !== nothing
    x, state = next::Tuple{Any, Any}
    ...
    next = iterate(itr, state)
end
```

The semantics are apparent from the above lowering: `iterate` returns
either `nothing` or a tuple of value and state. The state is passed
to any subsequent operation. The first iteration is indicated by not
passing the second (state) argument to the `iterate` method.

Adaptors in both directions are provided to keep the legacy iteration
protocol working for now. However, performance of the legacy iteration
protocol will be severely pessimized.

As an optional add-on for mutable iterators, a new `isdone` function is
provided. This function is intended as an O(1) approximate query for
iterator completion, where such a calculation is possible without mutation
and/or is significantly faster than attempting to obtain the element itself.
The function makes use of 3-value logic. `missing` is always an acceptable
answer, in which case the caller should go ahead and attempt the iteration
to obtain a definite result. If the result is not `missing`, it must be
exact (i.e. if `true`, the next call to `iterate` must return `nothing`;
if `false`, it must not).
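As an illustration of the protocol described above (the `Countdown` type is hypothetical and not part of this PR; names are as they exist in released Julia), a custom iterator, including the optional completion query, might look like:

```julia
# Hypothetical example: a countdown iterator under the new protocol.
struct Countdown
    n::Int
end

# First call: no state argument. Return nothing when there is no element,
# otherwise a (value, state) tuple.
Base.iterate(c::Countdown) = c.n < 1 ? nothing : (c.n, c.n - 1)

# Subsequent calls receive the state from the previously returned tuple.
Base.iterate(c::Countdown, state::Int) = state < 1 ? nothing : (state, state - 1)

Base.length(c::Countdown) = max(c.n, 0)

# Optional O(1) completion query; the generic fallback answers `missing`.
Base.isdone(c::Countdown, state::Int) = state < 1
```

With these definitions, `collect(Countdown(3))` yields `[3, 2, 1]`, and a `for` loop over a `Countdown` lowers to exactly the `while`-loop shown above.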
The motivation is something like the following:
```
@noinline r_typeassert(c) = c ? (1, 1) : nothing
function f_typeassert(c)
    r_typeassert(c)::Tuple
end
```
Here, we know that the return type of `r_typeassert` is either
`Tuple{Int, Int}` or `Nothing`, so all the typeassert has to
do is check that it's the former. We can compute this by
narrowing the type to be asserted using type intersection.
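The narrowing step can be sketched with `Base.typeintersect`; this is an illustration of the idea, not the compiler's actual code path:

```julia
# Sketch: intersect the inferred return type with the asserted type to
# compute what the typeassert actually has to check.
inferred = Union{Tuple{Int, Int}, Nothing}
asserted = Tuple

# The intersection drops the incompatible union member (Nothing):
narrowed = Base.typeintersect(inferred, asserted)
# narrowed == Tuple{Int, Int}, so the assert only needs to rule out `nothing`.
```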
@Keno
Member Author

Keno commented May 16, 2018

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@Keno
Member Author

Keno commented May 16, 2018

Alright, I think that's close enough for us to merge this. We can continue working on the remaining couple small regressions through the 1.0 period. I'll go ahead and put up a PR (or two) with the rebased version of the SROA pass.

@KristofferC
Sponsor Member

KristofferC commented May 16, 2018

So do we know the reason for the rest of the regressions? Will fixing them be a case of whack-a-mole, adding more optimization passes, etc., or is there a known failure case whose fix would address everything?

Is anything in the benchmarks using the deprecated protocol directly?

"Couple of small regressions" is being quite nice about it.

@mbauman
Sponsor Member

mbauman commented May 16, 2018

The 200-600x regressions look to be hitting deprecations (explicitly calling `start`/`next`/`done`). I'd say just a couple of big guys left: 5x on `reduce((x,y) -> x + 2y, 0, rand(10^3))`, 5-10x to parse dates, 10x on raytracing, and 50x on `pop!(::Set)`.

@Keno
Member Author

Keno commented May 16, 2018

Yes, what @mbauman said; the real regressions are those three plus a couple of 2x regressions in the array code. I haven't looked at them in detail, but I suspect the Date parsing regression is a missing type assertion somewhere. For the array code, there's a pattern where we know two iterators are of the same length, which used to be exploited by skipping the `done` call and using `@inbounds next` to skip the extra branch (which by itself isn't expensive, but messes with the vectorizer). That obviously doesn't work anymore, but I plan to add an `@unsafe_assume` macro to express those invariants (the two iterators being the same length) directly to LLVM.

@Keno Keno closed this May 23, 2018
@ararslan ararslan deleted the kf/benchmarknew branch May 23, 2018 16:33