Performance regression in nightly #34891

dragostis · 2016-07-17T19:58:07Z

Running cargo bench in pest on nightly-2016-07-02-x86_64-apple-darwin:

test data ... bench:      15,338 ns/iter (+/- 5,197)

On current nightly or nightly-2016-07-10-x86_64-apple-darwin:

test data ... bench:      28,605 ns/iter (+/- 7,901)


TimNN · 2016-07-17T20:12:54Z

I can reproduce this (also on OS X). As another data point, nightly-2016-07-07 is still good.

jonas-schievink · 2016-07-17T20:15:49Z

Can confirm on x86_64 Linux, ~~nightly-2016-07-12 is already bad~~ ^{why the hell did i test 12}

TimNN · 2016-07-17T20:25:09Z

Enabling LTO has no effect on 07-07, but with it 07-10 is now only ~2000 ns worse.

TimNN · 2016-07-17T20:30:09Z

Also, with optimisations completely disabled, no performance difference is noticeable.

jonas-schievink · 2016-07-17T20:42:10Z

I think this happened between nightly-2016-07-08 and nightly-2016-07-09:

~/d/pest master> rustup run nightly-2016-07-08 cargo bench
     Running target/release/json-bfd0131a8850b71b

running 1 test
test data ... bench:      10,872 ns/iter (+/- 823)

~/d/pest master> rustup run nightly-2016-07-09 cargo bench
     Running target/release/json-bfd0131a8850b71b

running 1 test
test data ... bench:      17,845 ns/iter (+/- 1,472)
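To put the two sets of figures side by side, the relative slowdown can be worked out from the reported means. This is only a small sketch: the function name `slowdown_percent` is made up for illustration, and it ignores the reported `+/-` spread entirely.

```rust
/// Relative slowdown of `after_ns` compared to `before_ns`, in percent.
/// (Hypothetical helper, not part of pest or the bench harness.)
fn slowdown_percent(before_ns: f64, after_ns: f64) -> f64 {
    (after_ns - before_ns) / before_ns * 100.0
}

fn main() {
    // Mean ns/iter values copied from the benchmark output in this thread.
    let macos = slowdown_percent(15_338.0, 28_605.0); // 07-02 vs 07-10, OS X
    let linux = slowdown_percent(10_872.0, 17_845.0); // 07-08 vs 07-09, Linux
    println!("OS X:  {:.1}% slower", macos);
    println!("Linux: {:.1}% slower", linux);
}
```

On these numbers the regression is roughly 86% on OS X and roughly 64% on Linux.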

TimNN · 2016-07-17T20:42:31Z

So, I'm by no means an llvm expert, but from staring at some ir I believe that one of the differences (maybe even the key difference) is that before the regression the benchmarking closure gets inlined and afterwards it does not.
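For readers unfamiliar with the shape of such a benchmark, here is a minimal sketch of a harness that takes the measured work as a closure. It is not pest's benchmark or the `test` crate's `Bencher`: `iter` and `sum_digits` are hypothetical stand-ins. The point is only that the closure body is reached through a generic call, so whether the optimiser inlines it into the loop can change the measured time considerably.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical stand-in for a bench harness: runs the closure `n` times
/// and accumulates its results so the work cannot be optimised away.
fn iter<F: FnMut() -> u64>(n: u32, mut f: F) -> u64 {
    let mut acc = 0u64;
    for _ in 0..n {
        acc = acc.wrapping_add(black_box(f()));
    }
    acc
}

/// Toy "parser": sums every ASCII digit found in the input.
fn sum_digits(input: &str) -> u64 {
    input
        .bytes()
        .filter(|b| b.is_ascii_digit())
        .map(|b| (b - b'0') as u64)
        .sum()
}

fn main() {
    let data = "[1, 2, 3, 42]";
    let start = Instant::now();
    // The closure below is what the optimiser may or may not inline into `iter`.
    let total = iter(1_000, || sum_digits(black_box(data)));
    println!("total = {}, elapsed = {:?}", total, start.elapsed());
}
```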

nagisa · 2016-07-17T21:12:19Z

Similar regression range to #34831.

As far as optimisations go, they usually have some sort of fuel supply in order to not take forever. It is likely this particular inlining/optimisation (lack of which caused the regression) simply didn’t get the consideration.

Either way, I’m not aware of any translator changes that occurred in that date range (OTOH, I haven’t been tracking the PRs during then closely either), nor can I see any LLVM-ups having been done, so I’m very stumped.

jonas-schievink · 2016-07-17T21:16:30Z

I'm having a hard time using git, but #33890 seems to be in nightly-2016-07-09. I originally suspected #34728, but I don't think that landed in time. At least git isn't listing it in my search. 🐳

nagisa · 2016-07-17T21:21:15Z

That’s a viable candidate, yes. It is quite a big change in how we translate our functions.

Is codegen-units ≠ 1 used by anybody here?

cc @michaelwoerister I wonder whether weak_odr linkage could prevent inlining or other cross-function optimisations for some reason (other than the function definition being in another unit, of course).

TimNN · 2016-07-17T21:26:33Z

@jonas-schievink: I came to the same conclusion (#33890 is the likeliest suspect, #34728 is not included in 2016-07-09).

@nagisa: I'm not setting codegen-units explicitly anywhere so I assume the cargo default of 1 is used.

nagisa · 2016-07-18T02:20:09Z

Yes, it seems like weak_odr prevents inlining opportunities. All of the closures ended up not being inlined due to having weak_odr – changing them to internal linkage promptly inlined all of them.

michaelwoerister · 2016-07-18T07:21:37Z

I haven't had time to look into this yet, but I'll do so shortly. If weak_odr is that bad for inlining, we should avoid it (note that there is #34830, which does so on MinGW only, but could easily be adapted to do so on all platforms).

However, we had been using weak_odr for generics for months without anybody noticing a marked performance drop, and there is the internalize_symbols pass that switches symbols to internal linkage where possible (which should be the case for most closures).

michaelwoerister · 2016-07-18T09:14:31Z

I can confirm that current master is still slow compared to the nightly mentioned above (on Linux x86_64). However, comparing the LLVM IR of both versions, it does not seem likely that weak_odr linkage has something to do with the slowdown: The slower version contains only one weak_odr function -- and that isn't called anywhere.

michaelwoerister · 2016-07-18T09:23:22Z

Please ignore that last comment; I was accidentally looking at the IR of the library, not of the benchmark executable.

…=eddyb

Run base::internalize_symbols() even for single-codegen-unit crates.

The initial linkage-assignment (especially for closures) is a conservative one that makes some symbols more visible than they need to be. While this is not a correctness problem, it does force the LLVM inliner to be more conservative too, which results in poor performance. Once translation is based solely on MIR, it will be easier to also make the initial linkage assignment a better fitting one. Until then `internalize_symbols()` does a good job of preventing most performance regressions.

This should solve the regressions reported in #34891 and maybe also those in #34831.

As a side-effect, this will also solve most of the problematic cases described in #34793. Not reliably so, however. For that, we still need a solution like the one implemented in #34830.

cc @rust-lang/compiler
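The idea of tightening linkage after the fact can be pictured with a toy model. This is not rustc's `internalize_symbols()`: the `Linkage` enum, the `Symbol` struct, the `exported` flag and the function `internalize_unexported` are all assumptions made up for illustration. The sketch only shows the general rule that a symbol nobody outside the crate needs can be downgraded to internal linkage, which gives the inliner more freedom.

```rust
/// Simplified linkage kinds (hypothetical; real LLVM has many more).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Linkage {
    External,
    WeakOdr,
    Internal,
}

/// Hypothetical description of one symbol in a compiled crate.
#[derive(Debug)]
struct Symbol {
    name: &'static str,
    linkage: Linkage,
    /// True if something outside the crate may reference this symbol.
    exported: bool,
}

/// Downgrades every non-exported symbol to internal linkage and
/// returns how many symbols were changed.
fn internalize_unexported(symbols: &mut [Symbol]) -> usize {
    let mut changed = 0;
    for sym in symbols.iter_mut() {
        if !sym.exported && sym.linkage != Linkage::Internal {
            sym.linkage = Linkage::Internal;
            changed += 1;
        }
    }
    changed
}

fn main() {
    let mut symbols = vec![
        Symbol { name: "public_api", linkage: Linkage::External, exported: true },
        Symbol { name: "bench_closure", linkage: Linkage::WeakOdr, exported: false },
        Symbol { name: "helper", linkage: Linkage::Internal, exported: false },
    ];
    let changed = internalize_unexported(&mut symbols);
    println!("changed {} symbol(s): {:?}", changed, symbols);
}
```

In this model only `bench_closure` changes, mirroring the closures in the thread that were left as weak_odr even though nothing outside the crate referred to them.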

pmarcelll · 2016-07-26T13:52:29Z

It seems like the new nightly fixed this.

eddyb · 2016-07-26T14:56:54Z

@pmarcelll What nightly? https://static.rust-lang.org/dist/index.html doesn't list anything newer than 2016-07-21.

hanna-kruppe · 2016-07-26T15:00:19Z

That's strange. rustup update gave me a nightly with a commit-date of July 24 (9316ae5) and the dashboard picked up a new nightly as well.

eddyb · 2016-07-26T15:12:40Z

@rkruppe Ah, @alexcrichton informed me that it was a manual build and not enough was updated.

nagisa mentioned this issue Jul 17, 2016

Apparent 1000% regression in llvm phase of bootstrap #34831

Closed

sanxiyn added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Jul 18, 2016

michaelwoerister mentioned this issue Jul 18, 2016

Run base::internalize_symbols() even for single-codegen-unit crates. #34899

Merged

TimNN mentioned this issue Jul 19, 2016

Fix wrong condition in base::internalize_symbols(). #34917

Merged

bstrie closed this as completed Jul 26, 2016
