Benchmark new optimizer #26795
Conversation
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
When looking at the performance results, it's important to note that bounds check elision is currently disabled. That's not too hard to fix, so we should fix it and re-run. It looks like a good fraction of these could be explained by that.
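As a rough, hypothetical illustration of what bounds check elision means for benchmark loops like these (this snippet is not from the benchmark suite, and the exact configuration that disabled the elision isn't spelled out above):
```
# A typical reduction loop. The @inbounds annotation asks for the per-element
# bounds checks on v[i] to be elided; if that elision is disabled, every access
# pays for an index check, which can easily show up as a loop-level slowdown.
function sum_elements(v::Vector{Float64})
    s = 0.0
    @inbounds for i in eachindex(v)
        s += v[i]
    end
    return s
end

sum_elements(rand(1000))  # hypothetical usage
```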
9f23855 to 12e2760
not for merging
@nanosoldier
Weird, I don't see what the issue was; the logs appear to be incomplete.
@nanosoldier
Nanosoldier seems to be sleeping on the job here. Let's try this again. @ararslan, could you look into this if it doesn't work: @nanosoldier
If one wanted to give the new optimizer a spin, is building this branch enough?
Yes, the only difference on this branch is toggling the flag, though. All the code is already on master.
Restarted the server. @nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
The benchmarks contain code like this:
```
x::Union{Nothing, Int}
result += ifelse(x === nothing, 0, x)
```
which, perhaps somewhat ironically, is quite a bit harder on the new optimizer than an equivalent code sequence using ternary operators. The reason for this is that `ifelse` gets inferred as `Union{Int, Nothing}`, creating a phi node of that type, which then causes a union split that the optimizer can't really get rid of easily. What this commit does is add some local improvements to help with the situation. First, it adds some minimal back inference during inlining. As a result, when inlining decides to union-split `ifelse(x === nothing, 0, x::Union{Nothing, Int})`, it looks back at the definition of `x === nothing`, realizes it's constrained by the union split, and inserts the appropriate boolean constant. Next, a new `type_tightening_pass` goes back and annotates more precise types for the inlined `select_value` and phi nodes. This is sufficient to get the above code to behave reasonably and should hopefully fix the performance regression on the various union sum benchmarks seen in #26795.
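For context, here is a self-contained sketch of the pattern being described; the function names and data are invented for illustration and are not taken from the benchmark suite:
```
# Hypothetical data mixing Ints and nothings.
data = Union{Nothing, Int}[isodd(i) ? i : nothing for i in 1:1000]

# ifelse form: the call is inferred as Union{Int, Nothing}, producing the
# union-split phi situation described above.
function sum_ifelse(xs)
    result = 0
    for x in xs
        result += ifelse(x === nothing, 0, x)
    end
    return result
end

# Ternary form: only the taken branch is evaluated, and each arm already has a
# concrete type, which the optimizer handles more easily.
function sum_ternary(xs)
    result = 0
    for x in xs
        result += x === nothing ? 0 : x
    end
    return result
end

sum_ifelse(data) == sum_ternary(data)  # same result either way
```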
Rebased to include #26969 while that is pending. @nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
@nanosoldier This PR now has everything except the main change in #26969, since @vtjnash doesn't like that one. Let's see where we're at.
Now that the new optimizer is on its way to being enabled, let's start benchmarking the iteration protocol change on top of it: @nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
Well, that is moderately disappointing. Looks like the getfield elim pass isn't powerful enough to capture the loop pattern at the moment. #26778 would do it, but I think I can simply adjust the lowering to produce a pattern that the less fancy getfield elim pass can handle.
This branch now has a rebased version of #26778 with the new iteration protocol. I need to clean things up and PR them separately, but it should be good enough for a nanosoldier run. @nanosoldier
I should point out that I didn't yet port mutable struct elision to the new SROA pass, so the things to look for in those benchmark results are the ones that did badly the last time around (e.g.
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
Hmm, I must have messed something up here. Those regressions are not there locally. Let me take a look.
Looks like because of the merge conflicts nanosoldier just dropped the last couple commits from this branch. Let me rebase and try again.
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
Ok, that's fairly encouraging. I see three remaining classes of regressions.
This changes the iteration protocol from `start`/`next`/`done` to `iterate`. The new lowering of a for loop is as follows:
```
for x in itr
    ...
end
```
becomes
```
next = iterate(itr)
while next !== nothing
    x, state = next::Tuple{Any, Any}
    ...
    next = iterate(itr, state)
end
```
The semantics are as apparent from the above lowering. `iterate` returns either `nothing` or a tuple of value and state. The state is passed to any subsequent operation. The first iteration is indicated by not passing the second, state argument to the `iterate` method. Adaptors in both directions are provided to keep the legacy iteration protocol working for now; however, performance of the legacy iteration protocol will be severely pessimized. As an optional add-on for mutable iterators, a new `isdone` function is provided. This function is intended as an O(1) approximate query for iterator completion, where such a calculation is possible without mutation and/or is significantly faster than attempting to obtain the element itself. The function makes use of three-valued logic: `missing` is always an acceptable answer, in which case the caller should go ahead and attempt the iteration to obtain a definite result. If the result is not `missing`, it must be exact (i.e. if true, the next call to `iterate` must return `nothing`; if false, it must not return `nothing`).
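A rough sketch of what implementing the new protocol looks like for a user-defined iterator; the `Countdown` type and the helper function below are invented for illustration:
```
# Hypothetical iterator that yields n, n-1, ..., 1.
struct Countdown
    n::Int
end

# First call: no state argument yet; return nothing if empty, else (value, state).
Base.iterate(c::Countdown) = c.n < 1 ? nothing : (c.n, c.n - 1)

# Subsequent calls receive the previous state.
Base.iterate(c::Countdown, remaining::Int) =
    remaining < 1 ? nothing : (remaining, remaining - 1)

# A for loop over Countdown lowers to the iterate/while pattern shown above.
function total_countdown(n)
    total = 0
    for x in Countdown(n)
        total += x
    end
    return total
end

total_countdown(3)  # 3 + 2 + 1 == 6
```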
The motivation is something like the following:
```
@noinline r_typeassert(c) = c ? (1, 1) : nothing
function f_typeassert(c)
    r_typeassert(c)::Tuple
end
```
Here, we know that the return type of `r_typeassert` is either `Tuple{Int, Int}` or `Nothing`, so all the typeassert has to do is assert that it's the former. We can compute this by narrowing the type to be asserted using type intersection.
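A minimal sketch of the narrowing idea, using `typeintersect` directly; this is just an illustration of the concept, not the compiler's actual code path:
```
# What inference knows about the call result:
inferred = Union{Tuple{Int, Int}, Nothing}

# Intersect the asserted type with the inferred type; the typeassert then only
# needs to check for the narrowed type (here Tuple{Int, Int}).
narrowed = typeintersect(inferred, Tuple)
```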
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
Alright, I think that's close enough for us to merge this. We can continue working on the remaining couple small regressions through the 1.0 period. I'll go ahead and put up a PR (or two) with the rebased version of the SROA pass.
So do we know the reason for the rest of the regressions? Will fixing them be a case of whack-a-mole, adding more optimization passes, etc., or is there a known failure case whose fix would address everything? Is anything in the benchmarks using the deprecated protocol directly? "Couple of small regressions" is being quite nice about it.
The 200-600x regressions look to be hitting deprecations (explicitly calling start/next/done). I'd say just a couple of big guys left: 5x on
Yes, what @mbauman said; the real regressions are those three plus a couple of 2x ones in the array code. I haven't looked at them in detail, but I suspect the Date parsing regression is a missing type assertion somewhere. For the array code, there's a pattern where we know two iterators are of the same length, which used to be done by skipping the
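For reference, a hedged sketch of what the deprecation hits above imply for benchmark code: anything driving the old `start`/`next`/`done` protocol by hand would need to be rewritten against `iterate`. The loop below is an invented example, not taken from the benchmark suite:
```
# Old style (now deprecated, and very slow through the compatibility adaptors):
#   s = start(itr); while !done(itr, s); (x, s) = next(itr, s); ...; end
# Equivalent manual loop against the new protocol:
function manual_sum(itr)
    total = 0
    y = iterate(itr)
    while y !== nothing
        x, state = y
        total += x
        y = iterate(itr, state)
    end
    return total
end

manual_sum(1:5)  # 15
```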
Opening this PR to have nanosoldier run over the new optimizer
TODO:
Top priority:
Other fixes:
- `function sizeof_typeref(); g = Core.sizeof; return g; end`
- `isa Union{...}` pattern (or eliminate uses) – this is used quite heavily in the new passes, and so contributes quite visibly to performance in profile runs
- `_apply` elision ([NewOptimizer] Fix _apply elision #26821)