gh-90536: Add support for the BOLT post-link binary optimizer (gh-95908)
* Add support for the BOLT post-link binary optimizer

Using [bolt](https://github.com/llvm/llvm-project/tree/main/bolt)
provides a fairly large speedup without any code or functionality
changes. It provides roughly a 1% speedup on pyperformance, and a
4% improvement on the Pyston web macrobenchmarks.

It is gated behind an `--enable-bolt` configure arg because not all
toolchains and environments are supported. It has been tested on a
Linux x86_64 toolchain, using llvm-bolt built from the LLVM 14.0.6
sources (their binary distribution of this version did not include bolt).

Compared to [a previous attempt](faster-cpython/ideas#224),
this commit uses bolt's preferred "instrumentation" approach and adds some non-PIE
flags, which enable much better optimizations from bolt.

The effects of this change depend more on CPU microarchitecture than those
of other changes, because it optimizes i-cache behavior, which seems to vary
more between architectures. The 1%/4% numbers were collected on an Intel
Skylake CPU. On an AMD Zen 3 CPU I got a slightly larger speedup (2%/4%),
and on a c6i.xlarge EC2 instance a slightly lower one (1%/3%).

The low speedup on pyperformance is not entirely unexpected, because
BOLT improves i-cache behavior, and the benchmarks in the pyperformance
suite are small and tend to fit in i-cache.

This change uses the existing pgo profiling task (`python -m test --pgo`),
though I was able to measure about a 1% macrobenchmark improvement by
using the macrobenchmarks as the training task. I personally think that
both the PGO and BOLT tasks should be updated to use macrobenchmarks,
but for the sake of splitting up the work this PR uses the existing pgo task.

* Simplify the build flags

* Add a NEWS entry

* Update Makefile.pre.in

Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>

* Update configure.ac

Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>

* Add myself to ACKS

* Add docs

* Other review comments

* fix tab/space issue

* Make it more clear that --enable-bolt is experimental

* Add link to bolt's github page

Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>
kmod and corona10 authored Aug 18, 2022
1 parent 22a95cb commit 214eb2c
Showing 7 changed files with 351 additions and 1 deletion.
21 changes: 20 additions & 1 deletion Doc/using/configure.rst
@@ -191,7 +191,8 @@ Performance options
-------------------

Configuring Python using ``--enable-optimizations --with-lto`` (PGO + LTO) is
recommended for best performance. The experimental ``--enable-bolt`` flag can
also be used to improve performance.

.. cmdoption:: --enable-optimizations

@@ -231,6 +232,24 @@ recommended for best performance.
.. versionadded:: 3.11
To use ThinLTO feature, use ``--with-lto=thin`` on Clang.

.. cmdoption:: --enable-bolt

   Enable usage of the `BOLT post-link binary optimizer
   <https://github.com/llvm/llvm-project/tree/main/bolt>`_ (disabled by
   default).

   BOLT is part of the LLVM project but is not always included in LLVM's binary
   distributions. This flag requires that ``llvm-bolt`` and ``merge-fdata``
   are available.

   BOLT is still a fairly new project, so this flag should be considered
   experimental for now. Because BOLT operates on machine code, whether it
   succeeds depends on the combination of build environment, other optimization
   configure options, and CPU architecture; not all combinations are supported.

   .. versionadded:: 3.12

.. cmdoption:: --with-computed-gotos

Enable computed gotos in evaluation loop (enabled by default on supported
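The documentation above says `--enable-bolt` requires `llvm-bolt` and `merge-fdata` to be available. The Python sketch below checks for them ahead of time. It is not part of this commit and is not how `configure` performs its check (the `configure.ac` changes are not rendered on this page). It assumes both tools should be reachable on `PATH`, which may not match how configure actually locates them, and the helper names `find_bolt_tools` and `missing_bolt_tools` are hypothetical.

```python
import shutil

# Tool names taken from the --enable-bolt documentation.
# Assumption: both are expected to be on PATH.
REQUIRED_BOLT_TOOLS = ("llvm-bolt", "merge-fdata")


def find_bolt_tools(names=REQUIRED_BOLT_TOOLS):
    """Map each tool name to its resolved path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


def missing_bolt_tools(names=REQUIRED_BOLT_TOOLS):
    """Return the tool names that could not be found on PATH."""
    return [name for name, path in find_bolt_tools(names).items() if path is None]


if __name__ == "__main__":
    missing = missing_bolt_tools()
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("llvm-bolt and merge-fdata found on PATH")
```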
4 changes: 4 additions & 0 deletions Doc/whatsnew/3.12.rst
@@ -133,6 +133,10 @@ Optimizations
It reduces object size by 8 or 16 bytes on 64-bit platforms. (:pep:`623`)
(Contributed by Inada Naoki in :gh:`92536`.)

* Added experimental support for using the BOLT binary optimizer in the build
process, which improves performance by 1-5%.
(Contributed by Kevin Modzelewski in :gh:`90536`.)


CPython bytecode changes
========================
10 changes: 10 additions & 0 deletions Makefile.pre.in
@@ -640,6 +640,16 @@ profile-opt: profile-run-stamp
	-rm -f profile-clean-stamp
	$(MAKE) @DEF_MAKE_RULE@ CFLAGS_NODIST="$(CFLAGS_NODIST) $(PGO_PROF_USE_FLAG)" LDFLAGS_NODIST="$(LDFLAGS_NODIST)"

bolt-opt: @PREBOLT_RULE@
	rm -f *.fdata
	@LLVM_BOLT@ ./$(BUILDPYTHON) -instrument -instrumentation-file-append-pid -instrumentation-file=$(abspath $(BUILDPYTHON).bolt) -o $(BUILDPYTHON).bolt_inst
	./$(BUILDPYTHON).bolt_inst $(PROFILE_TASK) || true
	@MERGE_FDATA@ $(BUILDPYTHON).*.fdata > $(BUILDPYTHON).fdata
	@LLVM_BOLT@ ./$(BUILDPYTHON) -o $(BUILDPYTHON).bolt -data=$(BUILDPYTHON).fdata -update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions=3 -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=all -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot
	rm -f *.fdata
	rm -f $(BUILDPYTHON).bolt_inst
	mv $(BUILDPYTHON).bolt $(BUILDPYTHON)

# Compile and run with gcov
.PHONY=coverage coverage-lcov coverage-report
coverage:
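The `bolt-opt` rule above runs four steps: instrument the binary, run the training task, merge the per-process profiles, and rewrite the binary using the merged profile. The Python sketch below returns those command lines in order without executing them. It is illustrative only and not part of the commit. The bare names `llvm-bolt` and `merge-fdata` stand in for the `@LLVM_BOLT@` and `@MERGE_FDATA@` paths that configure substitutes. The `bolt_commands` helper is hypothetical. `-m test --pgo` is an assumed stand-in for `$(PROFILE_TASK)`, whose full value is not shown here. The BOLT flags themselves are copied from the rule.

```python
import glob
import os

# Optimization flags copied verbatim from the bolt-opt rule.
BOLT_OPT_FLAGS = [
    "-update-debug-sections", "-reorder-blocks=ext-tsp",
    "-reorder-functions=hfsort+", "-split-functions=3", "-icf=1",
    "-inline-all", "-split-eh", "-reorder-functions-use-hot-size",
    "-peepholes=all", "-jump-tables=aggressive", "-inline-ap",
    "-indirect-call-promotion=all", "-dyno-stats", "-use-gnu-stack",
    "-frame-opt=hot",
]


def bolt_commands(binary, profile_task, fdata_files):
    """Return the argv lists the bolt-opt rule runs, in order (hypothetical helper).

    `fdata_files` stands in for the shell glob `$(BUILDPYTHON).*.fdata`,
    which only matches files after the instrumented binary has been run.
    Assumption: bare tool names replace the configure-substituted paths.
    """
    binary_path = "./" + binary
    return [
        # 1. Instrument; -instrumentation-file-append-pid gives each process its own profile file.
        ["llvm-bolt", binary_path, "-instrument",
         "-instrumentation-file-append-pid",
         "-instrumentation-file=" + os.path.abspath(binary + ".bolt"),
         "-o", binary + ".bolt_inst"],
        # 2. Run the training workload (the Makefile ignores its exit status via `|| true`).
        ["./" + binary + ".bolt_inst", *profile_task],
        # 3. Merge per-process profiles; the Makefile redirects stdout to <binary>.fdata.
        ["merge-fdata", *fdata_files],
        # 4. Rewrite the original binary using the merged profile.
        ["llvm-bolt", binary_path, "-o", binary + ".bolt",
         "-data=" + binary + ".fdata", *BOLT_OPT_FLAGS],
    ]


if __name__ == "__main__":
    profiles = sorted(glob.glob("python.*.fdata"))
    for argv in bolt_commands("python", ["-m", "test", "--pgo"], profiles):
        print(" ".join(argv))
```

Returning argument lists rather than running them keeps the step order easy to inspect. The real rule also removes the `*.fdata` and `.bolt_inst` intermediates and moves `$(BUILDPYTHON).bolt` over `$(BUILDPYTHON)`. This sketch omits those steps.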
1 change: 1 addition & 0 deletions Misc/ACKS
@@ -1212,6 +1212,7 @@ Gideon Mitchell
Tim Mitchell
Zubin Mithra
Florian Mladitsch
Kevin Modzelewski
Doug Moen
Jakub Molinski
Juliette Monsel
@@ -0,0 +1,2 @@
Use the BOLT post-link optimizer to improve performance, particularly on
medium-to-large applications.
261 changes: 261 additions & 0 deletions configure
