-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic PGO support. #48346
Add basic PGO support. #48346
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @nikomatsakis (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
May need some cleanup and such, I don't know the policy for new codegen options and such. Also, I've only tested on Linux so far, though it should work everywhere AFAICT. |
I think this should mostly be ready for review, and sanity-checking the general approach, but I want to test it more during the week and such. cc #1220 |
I have no idea about the Travis failures, does travis not use the same LLVM version as the repo submodule? Some nice results with the same test program as https://github.com/Geal/pgo-rust with
(which is pretty ridiculous IMO, though it's the best-case of course). There's some broken stuff, like pgo-gen not being respected with opt-level (only |
I think the numbers above aren't representative, since the |
Thanks for looking into this, @emilio! Experimental features like this usually start out with a |
Does it support autoFDO ? It's the secret magic that allow Clear Linux to be 40% Faster than others Linux distros : https://www.phoronix.com/scan.php?page=article&item=ubuntu-clear-eoy2017&num=2 https://clearlinux.org/features/automatic-feedback-directed-optimizer HS: I know you are a Firefox Dev @emilio don't you think you could use this for freely significatively improving Firefox/servo performance ? |
src/librustc/session/config.rs
Outdated
if cg.pgo_gen.is_some() && !cg.pgo_use.is_empty() { | ||
early_error( | ||
error_format, | ||
"options `-C pgo-gen` and `-C pgo-use` are exclussive", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spelling of exclusive
On 2/19/18 9:28 PM, LifeIsStrange wrote:> Does it support autoFDO ? It's
the secret magic that allow Clear Linux
to be 40% Faster than others Linux distros :
https://www.phoronix.com/scan.php?page=article&item=ubuntu-clear-eoy2017&num=2
https://clearlinux.org/features/automatic-feedback-directed-optimizer
That would be awesome is that could be enabled by default.. Because most
people don't know about those options.
It supports the same options as clangs -fprofile-generate and
-fprofile-use, which is instruction-based profiling. AutoFDO looks
sample-based, which looks more inaccurate to me, and looking at
http://lists.llvm.org/pipermail/llvm-dev/2015-October/091162.html it
doesn't look like it outperforms the instruction-level instrumentation.
It may be worth to allow reading a sample-based profile, but most of the
profile generation needs to be done using `perf` or some other tool, so
it looks like it doesn't need so much compiler instrumentation.
HS: I know you are a Firefox Dev @emilio <https://github.com/emilio>
don't you think you could use this for freely significatively improving
Firefox/servo performance ?
Firefox uses instruction-level instrumentation for its release builds,
using MSVC's PGO on windows, and clang's / gcc's on Linux.
I need to get back to this btw, I got sidetracked with other stuff, but
I hope to be done with it next week.
|
Ping from triage, @alexcrichton ! |
@shepmaster oh I think we're still waiting on numbers |
Yes, numbers, put them under a experimental flag... All those need to happen, haven taken time yet, sorry! |
was trying to get some times for this PR and is confused by the result. real 0m38.155s user 0m38.125s sys 0m0.016s real 1m41.192s user 1m41.156s sys 0m0.000s so the pgo version is 2.6x slower than the non-pgo version :( What I did to get this numbers$ git clone https://github.com/Geal/pgo-rust.git
$ cd pgo-rust
$ rustc +stage2 -Copt-level=3 -Ccodegen-units=1 -Cpgo-gen=test.profraw src/main.rs -o main-pgo-gen
$ ./main-pgo-gen 1000000000
$ ~/rust/build/x86_64-unknown-linux-gnu/llvm/bin/llvm-profdata merge -output=pgo.profdata test.profraw
$ rustc +stage2 -C opt-level=3 -Ccodegen-units=1 -Cpgo-use=pgo.profdata src/main.rs -o main-pgo-use
$ rustc +stage2 -C opt-level=3 -Ccodegen-units=1 src/main.rs -o main
$ time ./main 1000000000
$ time ./main-pgo-use 1000000000 |
So, I could sort of repro the numbers from @andjo403, (not so extreme 60 vs 90 seconds, with a stage1 compiler). Good news is that it works and the profile data can be used to tweak the generated code ;). Bad news is that it seems to barf at least on that specific test-case :(. This is the script I've used to profile this: #!/usr/bin/env bash
set -o errexit
RUSTC=./build/x86_64-unknown-linux-gnu/stage1/bin/rustc
TEST=test.rs
FLAGS="-Copt-level=3 -Ccodegen-units=1 -C save-temps"
COUNT=100000000
rm -rf *.profraw *.profdata
$RUSTC $FLAGS -Cpgo-gen=default.profraw $TEST -o test-pgo-gen
./test-pgo-gen $COUNT
./build/x86_64-unknown-linux-gnu/llvm/bin/llvm-profdata merge -output=pgo.profdata default.profraw
$RUSTC $FLAGS -Cpgo-use=pgo.profdata $TEST -o test-pgo-use
$RUSTC $FLAGS $TEST -o test-no-pgo
time ./test-no-pgo $COUNT
time ./test-pgo-use $COUNT The llvm IR produced before optimizations is the same for all three executables (I had to disable probestack unconditionally for this to be true). This is the optimized IR for the pgo-use and no-pgo: https://gist.github.com/emilio/93875578cfd1399a18ce764917f57ab4. So, so far the IR seems believable (it has a bunch of So I'll try to gather other test-cases and figure out what's going on, but if someone that knows more about LLVM has any idea, that'd be really nice :) |
So, a bit more information... It looks that what is destroying performance in this particular test is the instrumentation pre-inlining that LLVM does: So, with a manual -disable-preinline, perf comes back to baseline values. In this particular test-case, after that fix, and a manual fixup to the profile that I'm investigating (the version number generated claims to be a "Front-end" profile instead of a IR one), I manage to get a LLVM IR with PGO annotations. At That being said, I think I want to land this for the following tweaks:
@alexcrichton @michaelwoerister does that sound reasonable? I really need to figure out why the profile version is wrong (afaict the profile version variable in the instrumented binary is wrong), but that doesn't block experimentation (I just need to tweak a byte manually). |
@emilio sounds reasonable to me, thanks for the continuing investigation here! |
26c459e
to
f8a9a5f
Compare
👍 |
I think I'm happy with the current state of the PR. |
⌛ Testing commit 5af2f80 with merge e1d05bd6cc37cba62ad232d6edb131cf0aa4d178... |
💔 Test failed - status-appveyor |
Sigh, the relevant tests failed because of:
Don't have a windows machine, but last commit should fix, hopefully. |
Hm an error like that probably indicates that this isn't ready to compile on MSVC yet? Or maybe we're not compiling it correctly? Perhaps it should be disabled there? |
Yeah, I think my patch should fix, it's what we do for other similar functions that in MSVC have an underscore prefix. I'd like to try (I don't have try access on this repo I think), and if it happens not to work I'll disable it in MSVC. |
s/MSVC/Windows above I guess :) |
@bors: r+ Ok we can try! We will definitely want to scrutinize this before stabilization! |
📌 Commit 1e1d907 has been approved by |
maybe this is the same fault that LDC have, they have dissabled IR PGO for windows see ldc-developers/ldc#2539 |
That looks like a different error. This was an error linking libprofiler_builtins. Maybe we find the same error now, but I hope not :) |
Add basic PGO support. This PR adds two mutually exclusive options for profile usage and generation using LLVM's instruction profile generation (the same as clang uses), `-C pgo-use` and `-C pgo-gen`. See each commit for details.
☀️ Test successful - status-appveyor, status-travis |
For other people who want try this out: the flag is enabled via |
Mark `___asan_globals_registered` as an exported symbol for LTO Export a weak symbol defined by AddressSanitizer instrumentation. Previously, when using LTO, the symbol would get internalized and eliminated. Fixes rust-lang#113404. --------------- FWIW, let me list similar PRs from the past + who reviewed them: * rust-lang#68410 (fixing `__msan_keep_going` and `__msan_track_origins`; reviewed by `@varkor)` * rust-lang#60038 and rust-lang#48346 (fixing `__llvm_profile_raw_version` and `__llvm_profile_filename`; reviewed by `@alexcrichton)`
Mark `___asan_globals_registered` as an exported symbol for LTO Export a weak symbol defined by AddressSanitizer instrumentation. Previously, when using LTO, the symbol would get internalized and eliminated. Fixes rust-lang#113404. --------------- FWIW, let me list similar PRs from the past + who reviewed them: * rust-lang#68410 (fixing `__msan_keep_going` and `__msan_track_origins`; reviewed by ``@varkor)`` * rust-lang#60038 and rust-lang#48346 (fixing `__llvm_profile_raw_version` and `__llvm_profile_filename`; reviewed by ``@alexcrichton)``
This PR adds two mutually exclusive options for profile usage and generation using LLVM's instruction profile generation (the same as clang uses),
-C pgo-use
and-C pgo-gen
.See each commit for details.