-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for disabling PLT for better function call performance #54592
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
cc @rust-lang/compiler — I'm not expert on this, but based on the description, seems like a "no brainer". Is there a catch? |
@rfcbot fcp merge I move that we merge this PR. As I wrote before, I'm not an expert on this stuff; the fact though that some distros enable the flag by default suggests we might as well do it. I'm curious whether anyone knows of any downsides or reasons not to do it. |
Team member @nikomatsakis has proposed to merge this. The next step is review by the rest of the tagged teams:
No concerns currently listed. Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
cc @cuviper — seems like something you might know about :) |
Note that PPC64 is only defaulting to partial relro due to an old ld.so bug in bind-now. There's also an option for |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
I'm not sure if this is the right place in the codegen to enable this attribute, or if we'd better enable it somewhere else.
Also, I'm not sure how to fix the failing test. Is it possible for |
Please add a flag that controls this behaviour. For now it can be a debug
CHECK lines only check for matching line prefix. That test in particular seems to be testing for |
A large number of (Addressed not to author, but somebody who knows how to do perf runs) I also think a perf run would be great, but not sure how to start it. |
@bors try |
[WIP] Support for disabling PLT for better function call performance This PR gives `rustc` the ability to skip the PLT when generating function calls into shared libraries. This can improve performance by reducing branch indirection. AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already [enables full relro for security](#43170), lazy binding was disabled anyway. This is a little known feature which is supported by [GCC](https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html) and [Clang](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fplt) as `-fno-plt` (some Linux distros [enable it by default](https://git.archlinux.org/svntogit/packages.git/tree/trunk/makepkg.conf?h=packages/pacman#n40) for all builds). Implementation inspired by [this patch](https://reviews.llvm.org/D39079#change-YvkpNDlMs_LT) which adds `-fno-plt` support to Clang. ## Performance I didn't run a lot of benchmarks, but these are the results on my machine for a `clap` [benchmark](https://github.com/clap-rs/clap/blob/master/benches/05_ripgrep.rs): ``` name control ns/iter no-plt ns/iter diff ns/iter diff % speedup build_app_long 11,097 10,733 -364 -3.28% x 1.03 build_app_short 11,089 10,742 -347 -3.13% x 1.03 build_help_long 186,835 182,713 -4,122 -2.21% x 1.02 build_help_short 80,949 78,455 -2,494 -3.08% x 1.03 parse_clean 12,385 12,044 -341 -2.75% x 1.03 parse_complex 19,438 19,017 -421 -2.17% x 1.02 parse_lots 431,493 421,421 -10,072 -2.33% x 1.02 ``` A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. [This comment](https://patchwork.ozlabs.org/patch/468993/#1028255) suggests that, in some cases, `-fno-plt` could improve PIC/PIE code performance by 10%. ## To do - [ ] Do a perf run to see the effect this has on the compiler (cc @michaelwoerister), and possibly run benchmarks on some more crates - [ ] Add a code gen test - [ ] Should this be always enabled or should it be behind a command line option? If so, what should it be called? `-Z no-plt`? `-Z plt=no`?
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
☀️ Test successful - status-travis |
@rust-timer build 5747631 |
Please provide the full 40 character commit hash. |
Success: Queued 5747631 with parent 6846f22, comparison URL. |
Perf results are in, nice improvements on wall time. From what I've seen, the patch currently only removes about 20% of the total PLT calls, there's probably still some more performance to be gained.
Alright, I've implemented a Also, what should we do if the user specified
Thanks for the tip, but I'm still unable to get rid of the PLT. I've added If I build C binaries on my system, the final binary doesn't even have a EDIT: it seems we need to set some module-level metadata to ensure this also works for intrinsics. |
it is fine to have plt disabled by default, i think. it would be a good
idea to design the flag so that it would make sense regardless of the
default. so something like `-Zplt=on`, `-Zplt=off` instead of -Zno-plt. i
recommend looking at -Zmutable-noalias for an implementation example.
…On Thu, Sep 27, 2018, 11:32 Gabriel Majeri ***@***.***> wrote:
Perf results are in, nice improvements on wall time. From what I've seen,
the patch currently only removes *about 20%* of the total PLT calls,
there's probably still some more performance to be gained.
@nagisa <https://github.com/nagisa>
Please add a flag that controls this behaviour.
Alright, I've implemented a -Z no-plt flag, but should this option be
enabled or disabled by default?
Also, what should we do if the user specified -Z no-plt, but the flag is
then ignored, because full relro isn't supported (for example, as cuviper
mentioned, linker issues on PowerPC64)?
symbols likely come from outside the rust ecosystem
Thanks for the tip, but I'm still unable to get rid of the PLT. I've added
CFLAGS=-fno-plt to my system, and rebuilt the compiler from source, but
rustc still generates *lots* of calls which use the PLT.
If I build C binaries on my system, the final binary doesn't even have a
.plt section, it is completly removed.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#54592 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AApc0m5bSn_cJdwPG4TpD2HeVS__i6glks5ufI0AgaJpZM4W7FzR>
.
|
I think the way you've documented it is fine, "(default: PLT is disabled if full relro is enabled)".
Don't ignore it. These are advanced options -- if the user asks for |
The way I see it, |
This should still be a part of target specification because there are
targwta defined outside of the compiler and we do t test them all.
…On Thu, Oct 11, 2018, 06:57 Gabriel Majeri ***@***.***> wrote:
The way I see it, -Z plt=off is an optimization, we don't guarantee it
does anything (it's a best effort kind of thing). For now, I changed the
code to unconditionally disable the optimization on gnux32, at least
until LLVM gets fixed.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#54592 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AApc0qa2Eaq3zyK0cBG93zjHz_aFFJelks5ujsGYgaJpZM4W7FzR>
.
|
@nagisa As far as I understand, even if somebody defines some external target with this ABI, there shouldn't be an issue. The code checks for the (custom) target's For example, Clang accepts this option for all targets and ABIs (even Windows). On targets where it doesn't do anything, it emits the attributes, and LLVM just ignores them (except for this buggy ABI which crashes). |
I'm arguing that the author of external target specifications should have a
full control over such matters.
While gnux32 is the only target we know that crashes, our triplet coverage
is incomplete and we can't say for sure that any future triplets llvm may
support will be problem free either.
In the past platform specific settings and workarounds were always stored
within the target files and I see no reason whatsoever for us to change
that now. And it is strictly more flexible than checking whether llvm
triple string ends in a specific substring.
I also don't see much of an issue with allowing people to force arbitrary
configurations (i.e. allowing them to specify -Zplt=off) even if we know
they might not be working, ignored, irrelevant etc. At worst they'll
encounter an ICE. In fact, if they do explicitly request for some
behaviour, I believe that should override any workaround or default (which
is not what happens with the current iteration of the PR)
…On Thu, Oct 11, 2018, 08:39 Gabriel Majeri ***@***.***> wrote:
@nagisa <https://github.com/nagisa> As far as I understand, even if
somebody defines some external target with this ABI, there shouldn't be an
issue.
The code checks for the (custom) target's llvm-target attribute to see if
it contains gnux32. This is as far as we know the only ABI where LLVM
currently has an issue (due to a bug).
For example, Clang accepts this option for all targets and ABIs (even
Windows). On targets where it doesn't do anything, it emits the attributes,
and LLVM just ignores them (except for this buggy ABI which crashes).
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#54592 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AApc0mkkFmjCAO_7YT5ecCUdrWQai3liks5ujtmQgaJpZM4W7FzR>
.
|
(@nagisa said at T-compiler meeting that we can un-nominate this) |
Disable the PLT where possible to improve performance for indirect calls into shared libraries. This optimization is enabled by default where possible. - Add the `NonLazyBind` attribute to `rustllvm`: This attribute informs LLVM to skip PLT calls in codegen. - Disable PLT unconditionally: Apply the `NonLazyBind` attribute on every function. - Only enable no-plt when full relro is enabled: Ensures we only enable it when we have linker support. - Add `-Z plt` as a compiler option
@nagisa ok, I've added a |
Perfect. Thanks! @bors r+ |
📌 Commit 6009da0 has been approved by |
Support for disabling PLT for better function call performance This PR gives `rustc` the ability to skip the PLT when generating function calls into shared libraries. This can improve performance by reducing branch indirection. AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already [enables full relro for security](#43170), lazy binding was disabled anyway. This is a little known feature which is supported by [GCC](https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html) and [Clang](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fplt) as `-fno-plt` (some Linux distros [enable it by default](https://git.archlinux.org/svntogit/packages.git/tree/trunk/makepkg.conf?h=packages/pacman#n40) for all builds). Implementation inspired by [this patch](https://reviews.llvm.org/D39079#change-YvkpNDlMs_LT) which adds `-fno-plt` support to Clang. ## Performance I didn't run a lot of benchmarks, but these are the results on my machine for a `clap` [benchmark](https://github.com/clap-rs/clap/blob/master/benches/05_ripgrep.rs): ``` name control ns/iter no-plt ns/iter diff ns/iter diff % speedup build_app_long 11,097 10,733 -364 -3.28% x 1.03 build_app_short 11,089 10,742 -347 -3.13% x 1.03 build_help_long 186,835 182,713 -4,122 -2.21% x 1.02 build_help_short 80,949 78,455 -2,494 -3.08% x 1.03 parse_clean 12,385 12,044 -341 -2.75% x 1.03 parse_complex 19,438 19,017 -421 -2.17% x 1.02 parse_lots 431,493 421,421 -10,072 -2.33% x 1.02 ``` A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. [This comment](https://patchwork.ozlabs.org/patch/468993/#1028255) suggests that, in some cases, `-fno-plt` could improve PIC/PIE code performance by 10%. ## Security benefits **Bonus**: some of the speculative execution attacks rely on the PLT, by disabling it we reduce a big attack surface and reduce the need for [`retpoline`](https://reviews.llvm.org/D41723). ## Remaining PLT calls The compiled binaries still have plenty of PLT calls, coming from C/C++ libraries. Building dependencies with `CFLAGS=-fno-plt CXXFLAGS=-fno-plt` removes them.
☀️ Test successful - status-appveyor, status-travis |
FWIW this looks like it may cause bugs in LLVM on i686-unknown-linux-gnu. I noticed that stdsimd's CI was failing for i686-unknown-linux-gnu because rustc was segfaulting. Some local investigation showed a segfault in LLVM. We compile that target with |
FWIW, this broke pretty badly on certain (older) distro toolchains, but because |
I think If you see positive benchmark results, it is likely because dynamically linked libc calls dominate. If one statically links libc, Plus, x86-32 requires very new lld (as a maintainer, I just added support for |
This PR gives
rustc
the ability to skip the PLT when generating function calls into shared libraries. This can improve performance by reducing branch indirection.AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already enables full relro for security, lazy binding was disabled anyway.
This is a little known feature which is supported by GCC and Clang as
-fno-plt
(some Linux distros enable it by default for all builds).Implementation inspired by this patch which adds
-fno-plt
support to Clang.Performance
I didn't run a lot of benchmarks, but these are the results on my machine for a
clap
benchmark:A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. This comment suggests that, in some cases,
-fno-plt
could improve PIC/PIE code performance by 10%.Security benefits
Bonus: some of the speculative execution attacks rely on the PLT, by disabling it we reduce a big attack surface and reduce the need for
retpoline
.Remaining PLT calls
The compiled binaries still have plenty of PLT calls, coming from C/C++ libraries. Building dependencies with
CFLAGS=-fno-plt CXXFLAGS=-fno-plt
removes them.