Performance regression in nightly #34891
Comments
I can reproduce this (also on OS X), just as another data point,
Can confirm on x86_64 Linux,
Enabling LTO has no effect on
Also, with optimisations completely disabled, no performance difference is noticeable.
I think this happened between
So, I'm by no means an LLVM expert, but from staring at some IR I believe that one of the differences (maybe even the key difference) is that before the regression the benchmarking closure gets inlined and afterwards it does not.
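To make the inlining point concrete, here is a minimal sketch of the kind of difference being described. It is not pest's benchmark or the libtest harness: `run`, `body_inlined` and `body_outlined` are hypothetical names, and the `#[inline]` attributes only simulate the two outcomes that the optimiser reached on its own in the two nightlies.

```rust
// Hypothetical stand-ins; none of these names come from pest or rustc.

/// Benchmark body that the optimiser is asked to inline into its caller.
#[inline(always)]
fn body_inlined(x: u64) -> u64 {
    x * 2
}

/// Same body, but inlining is forbidden, so every iteration pays for a
/// real call. This mimics the post-regression behaviour.
#[inline(never)]
fn body_outlined(x: u64) -> u64 {
    x * 2
}

/// Tiny harness: calls `body` once per iteration and accumulates the
/// results so the work cannot be discarded as unused.
pub fn run<F: Fn(u64) -> u64>(body: F, iterations: u64) -> u64 {
    let mut total = 0u64;
    for i in 0..iterations {
        total = total.wrapping_add(body(i));
    }
    total
}
```

Both `run(body_inlined, n)` and `run(body_outlined, n)` return the same value; only the generated code differs. Comparing the output of `--emit=llvm-ir` for the two calls should show a `call` instruction remaining in the loop in the second case.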
Similar regression range to #34831. As far as optimisations go, they usually have some sort of fuel supply in order to not take forever. It is likely this particular inlining/optimisation (lack of which caused the regression) simply didn’t get the consideration. Either way, I’m not aware of any translator changes that occurred in that date range (OTOH, I haven’t been tracking the PRs during then closely either), nor can I see any LLVM-ups having been done, so I’m very stumped.
That’s a viable candidate, yes. It is quite a big change in how we translate our functions. Is codegen-units ≠ 1 used by anybody here? cc @michaelwoerister I wonder whether weak_odr linkage could prevent inlining or other cross-function optimisations for some reason (other than the function definition being in another unit, of course).
@jonas-schievink: I came to the same conclusion (#33890 is the likeliest suspect, #34728 is not included in @nagisa: I'm not setting
Yes, it seems like
I haven't had time to look into this yet, but I'll do so shortly. If However, we had been using
I can confirm that current master is still slow compared to the nightly mentioned above (on Linux x86_64). However, comparing the LLVM IR of both versions, it does not seem likely that
Please ignore that last comment; I was accidentally looking at the IR of the library, not of the benchmark executable.
…=eddyb

Run `base::internalize_symbols()` even for single-codegen-unit crates.

The initial linkage-assignment (especially for closures) is a conservative one that makes some symbols more visible than they need to be. While this is not a correctness problem, it does force the LLVM inliner to be more conservative too, which results in poor performance. Once translation is based solely on MIR, it will be easier to also make the initial linkage assignment a better fitting one. Until then `internalize_symbols()` does a good job of preventing most performance regressions.

This should solve the regressions reported in #34891 and maybe also those in #34831.

As a side-effect, this will also solve most of the problematic cases described in #34793. Not reliably so, however. For that, we still need a solution like the one implemented in #34830.

cc @rust-lang/compiler
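As a rough illustration of the visibility point in the PR description, the sketch below contrasts a helper that nothing outside the crate can name with one that is part of the public API. All names are hypothetical, and the linkage the compiler actually picks depends on the rustc version, the crate type and the number of codegen units, so treat the comments as an assumption about a library crate rather than a guarantee.

```rust
// Illustrative only: these names are hypothetical, not taken from rustc or pest.

/// Private helper. Nothing outside this crate can reference it, so the
/// compiler may give it internal linkage; once every call is inlined,
/// no standalone copy has to be kept.
fn step_private(x: u32) -> u32 {
    x.wrapping_mul(31).wrapping_add(7)
}

/// Public helper. In a library crate other crates may call it, so a
/// standalone, externally visible copy has to stay around even if local
/// callers inline it.
pub fn step_public(x: u32) -> u32 {
    x.wrapping_mul(31).wrapping_add(7)
}

/// Hot loop built on the private helper.
pub fn fold_private(n: u32) -> u32 {
    (0..n).fold(0, |acc, i| acc ^ step_private(i))
}

/// Same loop built on the public helper.
pub fn fold_public(n: u32) -> u32 {
    (0..n).fold(0, |acc, i| acc ^ step_public(i))
}
```

The two loops compute identical results. The intent is only to show which definitions an internalisation pass could mark as local and which it could not.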
It seems like the new nightly fixed this.
@pmarcelll What nightly? https://static.rust-lang.org/dist/index.html doesn't list anything newer than 2016-07-21.
@rkruppe Ah, @alexcrichton informed me that it was a manual build and not enough was updated.
Running `cargo bench` in pest on `nightly-2016-07-02-x86_64-apple-darwin`:

On current nightly or `nightly-2016-07-10-x86_64-apple-darwin`: