Support for disabling PLT for better function call performance #54592

Merged on Oct 11, 2018 (1 commit)

GabrielMajeri
Contributor

@GabrielMajeri GabrielMajeri commented Sep 26, 2018

This PR gives rustc the ability to skip the PLT when generating function calls into shared libraries. This can improve performance by reducing branch indirection.

AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already enables full relro for security, lazy binding was disabled anyway.

This is a little-known feature, supported by GCC and Clang as -fno-plt (some Linux distros enable it by default for all builds).

Implementation inspired by this patch which adds -fno-plt support to Clang.

Performance

I didn't run a lot of benchmarks, but these are the results on my machine for a clap benchmark:

 name              control ns/iter  no-plt ns/iter  diff ns/iter  diff %  speedup 
 build_app_long    11,097           10,733                  -364  -3.28%   x 1.03 
 build_app_short   11,089           10,742                  -347  -3.13%   x 1.03 
 build_help_long   186,835          182,713               -4,122  -2.21%   x 1.02 
 build_help_short  80,949           78,455                -2,494  -3.08%   x 1.03 
 parse_clean       12,385           12,044                  -341  -2.75%   x 1.03 
 parse_complex     19,438           19,017                  -421  -2.17%   x 1.02 
 parse_lots        431,493          421,421              -10,072  -2.33%   x 1.02 
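For reference, the last two columns follow directly from the first two. A minimal sketch of that arithmetic (the function names are mine and are not part of any benchmark tooling):

```rust
/// Relative change from `control` to `candidate`, in percent.
/// Negative values mean the candidate needed fewer ns/iter.
pub fn diff_percent(control: f64, candidate: f64) -> f64 {
    (candidate - control) / control * 100.0
}

/// Speedup factor: how many times faster the candidate is than the control.
pub fn speedup(control: f64, candidate: f64) -> f64 {
    control / candidate
}
```

For the build_app_long row, `diff_percent(11_097.0, 10_733.0)` comes out at roughly -3.28 and `speedup(11_097.0, 10_733.0)` at roughly 1.03, which matches the table.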

A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. This comment suggests that, in some cases, -fno-plt could improve PIC/PIE code performance by 10%.

Security benefits

Bonus: some of the speculative execution attacks rely on the PLT. By disabling it, we remove a sizeable attack surface and reduce the need for retpoline.

Remaining PLT calls

The compiled binaries still have plenty of PLT calls, coming from C/C++ libraries. Building dependencies with CFLAGS=-fno-plt CXXFLAGS=-fno-plt removes them.

@rust-highfive
Collaborator

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 26, 2018
@GabrielMajeri GabrielMajeri changed the title Support for disabling PLT for better function call performance [WIP] Support for disabling PLT for better function call performance Sep 26, 2018
@nikomatsakis
Contributor

cc @rust-lang/compiler: I'm not an expert on this, but based on the description, this seems like a "no brainer". Is there a catch?

@nikomatsakis nikomatsakis added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Sep 26, 2018
@nikomatsakis
Contributor

@rfcbot fcp merge

I move that we merge this PR. As I wrote before, I'm not an expert on this stuff; the fact though that some distros enable the flag by default suggests we might as well do it. I'm curious whether anyone knows of any downsides or reasons not to do it.

@rfcbot

rfcbot commented Sep 26, 2018

Team member @nikomatsakis has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Sep 26, 2018
@nikomatsakis
Contributor

cc @cuviper — seems like something you might know about :)

@eddyb
Member

eddyb commented Sep 26, 2018

cc @alexcrichton

@cuviper
Member

cuviper commented Sep 26, 2018

AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already enables full relro for security, lazy binding was disabled anyway.

Note that PPC64 is only defaulting to partial relro due to an old ld.so bug in bind-now.
https://github.com/rust-lang/rust/pull/43170/files#diff-b2d51315427bd679ca33d47167e82171R20

There's also an option for -Z relro-level={full,partial,off}. I'll try to see if similar PPC64 issues arise with -fno-plt, but my initial feeling is that we should only enable this in conjunction with relro-level=full.

@rust-highfive
Collaborator

The job x86_64-gnu-llvm-5.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

[00:55:36] ....................................................................................................
[00:55:39] ..............................................................i.....................................
[00:55:42] ....................................................................................................
[00:55:45] ....................................................................................................
[00:55:48] ...........iiiiiiiii................................................................................
[00:55:53] ....................................................................................................
[00:55:57] ...............................................................................................i....
[00:56:00] ....................................................................................................
[00:56:03] .......................................................i.i..ii......................................
---
travis_time:start:test_codegen
Check compiletest suite=codegen mode=codegen (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
[01:04:25] 
[01:04:25] running 107 tests
[01:04:28] i..ii...iii....i...i............iii...........i....Fi....ii...i.i.ii..............i...ii..ii.i....ii
[01:04:28] thread 'main' panicked at 'Some tests failed', tools/compiletest/src/main.rs:496:22
[01:04:28] failures:
[01:04:28] 
[01:04:28] ---- [codegen] codegen/naked-functions.rs stdout ----
[01:04:28] 
[01:04:28] 
[01:04:28] error: verification with 'FileCheck' failed
[01:04:28] status: exit code: 1
[01:04:28] command: "/usr/lib/llvm-5.0/bin/FileCheck" "--input-file" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/naked-functions/naked-functions.ll" "/checkout/src/test/codegen/naked-functions.rs"
[01:04:28] ------------------------------------------
[01:04:28] 
[01:04:28] ------------------------------------------
[01:04:28] stderr:
[01:04:28] stderr:
[01:04:28] ------------------------------------------
[01:04:28] /checkout/src/test/codegen/naked-functions.rs:18:11: error: expected string not found in input
[01:04:28] // CHECK: Function Attrs: naked uwtable
[01:04:28]           ^
[01:04:28] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/naked-functions/naked-functions.ll:1:1: note: scanning from here
[01:04:28] ; ModuleID = 'naked_functions.3a1fbbbh-cgu.0'
[01:04:28] ^
[01:04:28] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/naked-functions/naked-functions.ll:6:3: note: possible intended match here
[01:04:28] ; Function Attrs: naked nonlazybind uwtable
[01:04:28] 
[01:04:28] ------------------------------------------
[01:04:28] 
[01:04:28] thread '[codegen] codegen/naked-functions.rs' panicked at 'explicit panic', tools/compiletest/src/runtest.rs:3238:9
---
[01:04:28] test result: FAILED. 77 passed; 1 failed; 29 ignored; 0 measured; 0 filtered out
[01:04:28] 
[01:04:28] 
[01:04:28] 
[01:04:28] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/codegen" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--mode" "codegen" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-5.0/bin/FileCheck" "--host-rustcflags" "-Crpath -O -Zunstable-options " "--target-rustcflags" "-Crpath -O -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python2.7" "--lldb-python" "/usr/bin/python2.7" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "5.0.0\n" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--llvm-components" "" "--llvm-cxxflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"
[01:04:28] 
[01:04:28] 
[01:04:28] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test
[01:04:28] Build completed unsuccessfully in 0:17:44
[01:04:28] Build completed unsuccessfully in 0:17:44
[01:04:28] Makefile:58: recipe for target 'check' failed
[01:04:28] make: *** [check] Error 1

The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:0d3ba4a0
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
---
travis_time:end:0204f06f:start=1537985051202083688,finish=1537985051358059111,duration=155975423
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:1243b65a
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:23d227fe
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@GabrielMajeri
Contributor Author

I'm not sure if this is the right place in the codegen to enable this attribute, or if we'd better enable it somewhere else.

objdump says that the number of function calls using the PLT goes down, but a lot of function calls are still using the PLT. So I'm guessing I have to add this in other places too, but I'm not very familiar with the code.

Also, I'm not sure how to fix the failing test. Is it possible for CHECK: Function Attrs to allow extra attributes in the output, besides the ones being tested?

@nagisa
Member

nagisa commented Sep 26, 2018

Please add a flag that controls this behaviour. For now it can be a debug -Z flag, similar to other such flags (e.g. -Zmutable-noalias).

Also, I'm not sure how to fix the failing test. Is it possible for CHECK: Function Attrs to allow extra attributes in the output, besides the ones being tested?

CHECK lines only check for matching line prefix. That test in particular seems to be testing for naked only, in which case you can probably remove the other attribute from the CHECK line to have it pass. Alternatively, you might have some success with pattern matching syntax.
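As a rough sketch of the pattern-matching route: FileCheck treats text between `{{` and `}}` as a regular expression, so a directive can tolerate extra attributes between the ones it cares about. The fragment below is illustrative only and is not the actual test file. The `#[naked]` attribute and its feature gate are left out so that it compiles on a stable toolchain, and the function name is borrowed loosely from the log above.

```rust
// Hypothetical codegen-test fragment (not the real naked-functions.rs).
//
// The original directive was:
//     CHECK: Function Attrs: naked uwtable
// which stops matching once LLVM prints "naked nonlazybind uwtable".
//
// With a FileCheck regex in the middle, both spellings are accepted:
// CHECK: Function Attrs: naked{{.*}}uwtable
pub fn naked_empty() {
    // In the real test this function would carry `#[naked]`; the body is
    // irrelevant here because only the emitted attributes are checked.
}
```

Dropping `uwtable` from the directive entirely, as suggested above, is the simpler fix when the test only cares about `naked`.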

@nagisa
Member

nagisa commented Sep 26, 2018

objdump says that the number of function calls using the PLT goes down, but a lot of function calls are still using the PLT. So I'm guessing I have to add this in other places too, but I'm not very familiar with the code.

A large number of @PLT symbols likely come from outside the rust ecosystem (e.g. glibc, llvm, etc.). Those might need to be taken care of independently (by changing build system configuration, perhaps?). You might want to submit a similar patch to the cc crate.


(Addressed not to author, but somebody who knows how to do perf runs) I also think a perf run would be great, but not sure how to start it.

@varkor
Member

varkor commented Sep 26, 2018

@bors try

bors added a commit that referenced this pull request Sep 26, 2018
[WIP] Support for disabling PLT for better function call performance

This PR gives `rustc` the ability to skip the PLT when generating function calls into shared libraries. This can improve performance by reducing branch indirection.

AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already [enables full relro for security](#43170), lazy binding was disabled anyway.

This is a little known feature which is supported by [GCC](https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html) and [Clang](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fplt) as `-fno-plt` (some Linux distros [enable it by default](https://git.archlinux.org/svntogit/packages.git/tree/trunk/makepkg.conf?h=packages/pacman#n40) for all builds).

Implementation inspired by [this patch](https://reviews.llvm.org/D39079#change-YvkpNDlMs_LT) which adds `-fno-plt` support to Clang.

## Performance

I didn't run a lot of benchmarks, but these are the results on my machine for a `clap` [benchmark](https://github.com/clap-rs/clap/blob/master/benches/05_ripgrep.rs):

```
 name              control ns/iter  no-plt ns/iter  diff ns/iter  diff %  speedup
 build_app_long    11,097           10,733                  -364  -3.28%   x 1.03
 build_app_short   11,089           10,742                  -347  -3.13%   x 1.03
 build_help_long   186,835          182,713               -4,122  -2.21%   x 1.02
 build_help_short  80,949           78,455                -2,494  -3.08%   x 1.03
 parse_clean       12,385           12,044                  -341  -2.75%   x 1.03
 parse_complex     19,438           19,017                  -421  -2.17%   x 1.02
 parse_lots        431,493          421,421              -10,072  -2.33%   x 1.02
```

A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. [This comment](https://patchwork.ozlabs.org/patch/468993/#1028255) suggests that, in some cases, `-fno-plt` could improve PIC/PIE code performance by 10%.

## To do

- [ ] Do a perf run to see the effect this has on the compiler (cc @michaelwoerister),
  and possibly run benchmarks on some more crates

- [ ] Add a code gen test

- [ ] Should this be always enabled or should it be behind a command line option?
  If so, what should it be called? `-Z no-plt`? `-Z plt=no`?
@bors
Contributor

bors commented Sep 26, 2018

⌛ Trying commit ddf98c1 with merge 5747631...

@rust-highfive
Collaborator

The job x86_64-gnu-llvm-5.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

[00:58:24] ....................................................................................................
[00:58:27] ..............................................................i.....................................
[00:58:30] ....................................................................................................
[00:58:33] ....................................................................................................
[00:58:36] ............iiiiiiiii...............................................................................
[00:58:42] ....................................................................................................
[00:58:46] ...............................................................................................i....
[00:58:49] ....................................................................................................
[00:58:52] .......................................................i.i..ii......................................
---
travis_time:start:test_codegen
Check compiletest suite=codegen mode=codegen (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
[01:07:32] 
[01:07:32] running 107 tests
[01:07:35] i..ii...iii....i...i............iii...........i.....iF...ii...i.i.ii..............i...ii..ii.i....ii
[01:07:35] thread 'main' panicked at 'Some tests failed', tools/compiletest/src/main.rs:496:22
[01:07:35] failures:
[01:07:35] 
[01:07:35] ---- [codegen] codegen/naked-functions.rs stdout ----
[01:07:35] 
[01:07:35] 
[01:07:35] error: verification with 'FileCheck' failed
[01:07:35] status: exit code: 1
[01:07:35] command: "/usr/lib/llvm-5.0/bin/FileCheck" "--input-file" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/naked-functions/naked-functions.ll" "/checkout/src/test/codegen/naked-functions.rs"
[01:07:35] ------------------------------------------
[01:07:35] 
[01:07:35] ------------------------------------------
[01:07:35] stderr:
[01:07:35] stderr:
[01:07:35] ------------------------------------------
[01:07:35] /checkout/src/test/codegen/naked-functions.rs:18:11: error: expected string not found in input
[01:07:35] // CHECK: Function Attrs: naked uwtable
[01:07:35]           ^
[01:07:35] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/naked-functions/naked-functions.ll:1:1: note: scanning from here
[01:07:35] ; ModuleID = 'naked_functions.3a1fbbbh-cgu.0'
[01:07:35] ^
[01:07:35] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/naked-functions/naked-functions.ll:6:3: note: possible intended match here
[01:07:35] ; Function Attrs: naked nonlazybind uwtable
[01:07:35] 
[01:07:35] ------------------------------------------
[01:07:35] 
[01:07:35] thread '[codegen] codegen/naked-functions.rs' panicked at 'explicit panic', tools/compiletest/src/runtest.rs:3238:9
---
[01:07:35] test result: FAILED. 77 passed; 1 failed; 29 ignored; 0 measured; 0 filtered out
[01:07:35] 
[01:07:35] 
[01:07:35] 
[01:07:35] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/codegen" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--mode" "codegen" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-5.0/bin/FileCheck" "--host-rustcflags" "-Crpath -O -Zunstable-options " "--target-rustcflags" "-Crpath -O -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python2.7" "--lldb-python" "/usr/bin/python2.7" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "5.0.0\n" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--llvm-components" "" "--llvm-cxxflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"
[01:07:35] 
[01:07:35] 
[01:07:35] failed to run: /checkout/obj/build/756 ./src/tools/lldb/www
37080 ./obj/build/x86_64-unknown-linux-gnu/stage0-std/release
---
travis_time:end:0c6f72ba:start=1537990447136677365,finish=1537990447141035866,duration=4358501
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:017849be
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then print


@bors
Contributor

bors commented Sep 26, 2018

☀️ Test successful - status-travis
State: approved= try=True

@varkor
Member

varkor commented Sep 26, 2018

@rust-timer build 5747631

@rust-timer
Collaborator

Please provide the full 40 character commit hash.

@rust-timer
Collaborator

Success: Queued 5747631 with parent 6846f22, comparison URL.

@GabrielMajeri
Contributor Author

GabrielMajeri commented Sep 27, 2018

Perf results are in, with nice improvements on wall time. From what I've seen, the patch currently removes only about 20% of the total PLT calls, so there's probably still some more performance to be gained.

@nagisa

Please add a flag that controls this behaviour.

Alright, I've implemented a -Z no-plt flag, but should this option be enabled or disabled by default?

Also, what should we do if the user specified -Z no-plt, but the flag is then ignored, because full relro isn't supported (for example, as cuviper mentioned, linker issues on PowerPC64)?

symbols likely come from outside the rust ecosystem

Thanks for the tip, but I'm still unable to get rid of the PLT. I've added CFLAGS=-fno-plt to my system, and rebuilt the compiler from source, but rustc still generates lots of calls which use the PLT.

If I build C binaries on my system, the final binary doesn't even have a .plt section; it is completely removed.

EDIT: it seems we need to set some module-level metadata to ensure this also works for intrinsics.

@nagisa
Member

nagisa commented Sep 27, 2018 via email

@cuviper
Member

cuviper commented Sep 27, 2018

Alright, I've implemented a -Z no-plt flag, but should this option be enabled or disabled by default?

I think the way you've documented it is fine, "(default: PLT is disabled if full relro is enabled)".

Also, what should we do if the user specified -Z no-plt, but the flag is then ignored, because full relro isn't supported (for example, as cuviper mentioned, linker issues on PowerPC64)?

Don't ignore it. These are advanced options -- if the user asks for plt=off without full relro, let them deal with the implications. So the check is something like plt.unwrap_or(relro != Full).
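To spell that check out, here is a minimal, self-contained sketch. The enum and function names are stand-ins I made up for illustration; they are not the identifiers used inside rustc.

```rust
/// Hypothetical stand-in for the relro setting (`-Z relro-level`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RelroLevel {
    Full,
    Partial,
    Off,
}

/// Decide whether calls should go through the PLT.
///
/// `plt` models an explicit user choice (`None` when no flag was passed).
/// An explicit choice always wins, even without full relro. Otherwise the
/// PLT is kept unless full relro is in effect, since only then was lazy
/// binding already unavailable.
pub fn use_plt(plt: Option<bool>, relro: RelroLevel) -> bool {
    plt.unwrap_or(relro != RelroLevel::Full)
}
```

With this shape, `use_plt(None, RelroLevel::Full)` is `false` (PLT skipped by default), while `use_plt(Some(false), RelroLevel::Partial)` is also `false` because the user asked for it and gets to deal with the implications.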

@GabrielMajeri
Contributor Author

GabrielMajeri commented Oct 11, 2018

The way I see it, -Z plt=off is an optimization; we don't guarantee it does anything (it's a best-effort kind of thing). For now, I changed the code to unconditionally disable the optimization on gnux32 and always enable the PLT on that target, at least until LLVM gets fixed.

@nagisa
Member

nagisa commented Oct 11, 2018 via email

@GabrielMajeri
Contributor Author

@nagisa As far as I understand, even if somebody defines some external target with this ABI, there shouldn't be an issue.

The code checks for the (custom) target's llvm-target attribute to see if it contains gnux32. This is as far as we know the only ABI where LLVM currently has an issue (due to a bug).

For example, Clang accepts this option for all targets and ABIs (even Windows). On targets where it doesn't do anything, it emits the attributes, and LLVM just ignores them (except for this buggy ABI which crashes).
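Roughly, the gnux32 special case described above amounts to a substring test on the target string. This is a sketch under that assumption; the function name and parameters are hypothetical and the real code may be structured differently.

```rust
/// Returns whether the PLT should stay enabled for `llvm_target`.
///
/// `default_plt` is whatever the rest of the logic would have picked.
/// On the x32 ABI the PLT is forced on regardless, to sidestep the LLVM
/// crash mentioned in this thread.
pub fn plt_enabled_for(llvm_target: &str, default_plt: bool) -> bool {
    if llvm_target.contains("gnux32") {
        true
    } else {
        default_plt
    }
}
```

Because the test is on the `llvm-target` string rather than on a built-in target list, a custom target spec that names this ABI would be caught as well.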

@nagisa
Member

nagisa commented Oct 11, 2018 via email

@pnkfelix
Member

(@nagisa said at T-compiler meeting that we can un-nominate this)

Disable the PLT where possible to improve performance
for indirect calls into shared libraries.

This optimization is enabled by default where possible.

- Add the `NonLazyBind` attribute to `rustllvm`:
  This attribute informs LLVM to skip PLT calls in codegen.

- Disable PLT unconditionally:
  Apply the `NonLazyBind` attribute on every function.

- Only enable no-plt when full relro is enabled:
  Ensures we only enable it when we have linker support.

- Add `-Z plt` as a compiler option
@GabrielMajeri
Contributor Author

@nagisa ok, I've added a needs_plt target option which can be customized for each target. It is used to help determine a default for the PLT option (and -Z plt always overrides the setting).
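A sketch of how such a per-target option could feed into the default. The struct is a cut-down, hypothetical model of a target spec, and combining it with the full-relro condition is my assumption based on the earlier discussion, not a quote of the merged code.

```rust
/// Cut-down, hypothetical model of a target's options.
pub struct TargetOptions {
    /// True for targets where skipping the PLT is known to be unsafe.
    pub needs_plt: bool,
}

/// Decide whether the PLT is used.
///
/// `plt_flag` models an explicit `-Z plt` value and always overrides the
/// default. Without it, the PLT is used when the target needs it or when
/// full relro is not in effect.
pub fn plt_enabled(plt_flag: Option<bool>, target: &TargetOptions, full_relro: bool) -> bool {
    plt_flag.unwrap_or(target.needs_plt || !full_relro)
}
```

Keeping the quirk in the target description means the codegen path no longer has to inspect target strings, and external target specs can opt in by setting the same field.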

@nagisa
Member

nagisa commented Oct 11, 2018

Perfect. Thanks!

@bors r+

@bors
Contributor

bors commented Oct 11, 2018

📌 Commit 6009da0 has been approved by nagisa

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 11, 2018
@bors
Contributor

bors commented Oct 11, 2018

⌛ Testing commit 6009da0 with merge 77af314...

bors added a commit that referenced this pull request Oct 11, 2018
Support for disabling PLT for better function call performance

This PR gives `rustc` the ability to skip the PLT when generating function calls into shared libraries. This can improve performance by reducing branch indirection.

AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already [enables full relro for security](#43170), lazy binding was disabled anyway.

This is a little known feature which is supported by [GCC](https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html) and [Clang](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fplt) as `-fno-plt` (some Linux distros [enable it by default](https://git.archlinux.org/svntogit/packages.git/tree/trunk/makepkg.conf?h=packages/pacman#n40) for all builds).

Implementation inspired by [this patch](https://reviews.llvm.org/D39079#change-YvkpNDlMs_LT) which adds `-fno-plt` support to Clang.

## Performance

I didn't run a lot of benchmarks, but these are the results on my machine for a `clap` [benchmark](https://github.com/clap-rs/clap/blob/master/benches/05_ripgrep.rs):

```
 name              control ns/iter  no-plt ns/iter  diff ns/iter  diff %  speedup
 build_app_long    11,097           10,733                  -364  -3.28%   x 1.03
 build_app_short   11,089           10,742                  -347  -3.13%   x 1.03
 build_help_long   186,835          182,713               -4,122  -2.21%   x 1.02
 build_help_short  80,949           78,455                -2,494  -3.08%   x 1.03
 parse_clean       12,385           12,044                  -341  -2.75%   x 1.03
 parse_complex     19,438           19,017                  -421  -2.17%   x 1.02
 parse_lots        431,493          421,421              -10,072  -2.33%   x 1.02
```

A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. [This comment](https://patchwork.ozlabs.org/patch/468993/#1028255) suggests that, in some cases, `-fno-plt` could improve PIC/PIE code performance by 10%.

## Security benefits

**Bonus**: some of the speculative execution attacks rely on the PLT, by disabling it we reduce a big attack surface and reduce the need for [`retpoline`](https://reviews.llvm.org/D41723).

## Remaining PLT calls

The compiled binaries still have plenty of PLT calls, coming from C/C++ libraries. Building dependencies with `CFLAGS=-fno-plt CXXFLAGS=-fno-plt` removes them.
@bors
Contributor

bors commented Oct 11, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: nagisa
Pushing 77af314 to master...

@bors bors merged commit 6009da0 into rust-lang:master Oct 11, 2018
@bors bors mentioned this pull request Oct 11, 2018
@GabrielMajeri GabrielMajeri deleted the no-plt branch October 12, 2018 03:34
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Oct 14, 2018
@alexcrichton
Member

FWIW this looks like it may cause bugs in LLVM on i686-unknown-linux-gnu. I noticed that stdsimd's CI was failing for i686-unknown-linux-gnu because rustc was segfaulting. Some local investigation showed a segfault in LLVM. We compile that target with -C relocation-model=static on CI, and I believe the combination of 32-bit Linux with -C relocation-model=static was causing the issue. I haven't had a chance to dig deeper. I've worked around it with -Z plt=yes.

@eddyb
Member

eddyb commented Jun 5, 2019

FWIW, this broke pretty badly on certain (older) distro toolchains, but because dylib is largely unused, it took someone (ab)using proc_macro::bridge to run proc macros outside of rustc to trigger it: #61539.

@MaskRay
Contributor

MaskRay commented Jan 2, 2023

I think -Z plt=no is not a good default. Sent #106380 to disable it. (Thanks to @GabrielMajeri for mentioning this review as I cannot find it with the commits).

If you see positive benchmark results, it is likely because dynamically linked libc calls dominate. If one statically links libc, -Z plt=no is going to be a pessimization. For many benchmarks where cross-translation-unit function calls resolve to the same component, -Z plt=no is going to be a pessimization.

Plus, x86-32 requires very new lld (as a maintainer, I just added support for ___tls_get_addr; older lld will create a silently corrupted executable).
For GNU ld, a relatively new one is needed: 2016-06 https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=e2cbcd9156d1606a9f2153aecd93a89fe6e29180 (and a counterpart for x86-32)

Labels
disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.