Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Various build system improvements #101093

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

indygreg
Copy link
Contributor

I'm creating this PR to track various build system changes that I'd like to make to CPython.

I'm creating a draft PR instead of opening real PRs because general OSS experience has taught me that if I trickle things out slowly without any prior context project maintainers won't see the big picture and/or benefits of the changes. My hope is that by tracking all proposed changes in a Git branch that people can easily diff that I'll have an easier time convincing folks to accept the patches.

Next Steps

CPython maintainer(s) should look at each commit in this branch. I write detailed commit messages. See each commit for more details.

Commits beginning with gh-xxxx: aren't yet assigned issues/PRs.

Then, just tell me which commits to turn into issues/PRs and I'll do so. Keep in mind some commits are dependent on others. For example, the BOLT enhancements (which appear to deliver meaningful pyperformance wins) depends on the overhaul build rules for optimized binaries commit and its precursors.

I have several more potential patches to include in this branch. Including a bunch that enable cross-compiling for Apple platforms. I figured I'd start with just the build system + BOLT improvements to get people's attention.

Most of this work is derived from my efforts on https://github.com/indygreg/python-build-standalone.

cc @corona10 since you seem to have an interest in all the BOLT PRs.

@corona10 corona10 self-requested a review January 17, 2023 00:16
@corona10 corona10 self-assigned this Jan 17, 2023
@corona10
Copy link
Member

@indygreg cc @gvanrossum

Due to the holiday weekend in Korea (Lunar new year), I will review relevant PRs by next weekend.
Thanks for submitting these PRs :)

@aaupov
Copy link

aaupov commented Jan 18, 2023

Sidenote: BOLT now supports DWARF5, so there's no need to force -gdwarf-4

https://github.com/python/cpython/blob/main/configure.ac#L1950-1952

@corona10
Copy link
Member

Sidenote: BOLT now supports DWARF5, so there's no need to force -gdwarf-4

Nice news! Thanks for let us know :)

@indygreg, Let's drop the flag -gdwarf-4 it was only workaround approach
#95908 (comment)

cc @aaupov

@indygreg
Copy link
Contributor Author

No rush on the reviews, @corona10: I'm also busy celebrating the Lunar New Year this weekend.

Some specific guidance I could use help with is how to split up / fold commits and how to associate them with a GitHub Issue. To be honest I find the whole CPython contribution process of managing GitHub Issues in addition to a PR to be confusing. Is there a 1:1 relationship? 1:n? https://devguide.python.org/ is a bit ambiguous on this topic.

I'm lazy/busy, so prefer as little overhead / separate GitHub Issues as possible. But I can go along with whatever. If someone tells me what to do I'll do it.

Also, work that I'd like to do but haven't started working on yet is enabling some of makesetup's features to be rolled into configure.ac. e.g. support for building/linking an extension statically. A lot of work in past ~1.5 years towards moving extension module config logic into configure.ac enables this. But I'm not sure if makesetup deprecation was an end goal. In my mind it is certainly possible to inline makesetup fully into configure.ac.

The end state I'd like to achieve is the ability for CPython's build system to build ~all extensions statically with minimal effort. e.g. ./configure --extensions-policy=all-static. I'd have to dig it up, but I believe there was a thread somewhere with some CPython maintainers in support of supporting easier static linking of extensions. I just don't know if anyone has opinions on how it should play out. makesetup still has lines that annotate back to Guido's commits made in 1994 and experience has taught me that it is best to ask before taking a sledge hammer to 25+ year old functionality!

@corona10
Copy link
Member

corona10 commented Jan 24, 2023

@indygreg Let's separate the PR with BOLT itself and configuration enhancement.
Please reference #101282 about the BOLT PRs.

@corona10
Copy link
Member

@indygreg

The end state I'd like to achieve is the ability for CPython's build system to build ~all extensions statically with minimal effort.

One more thing, I have also interested in supporting build all extensions statically with minimal effort for internal purposes.
(Yeah, I have use cases if this build is supported)
To support this, we need to write the PEP first. Can we discuss this through email?
My email is donghee.na@python.org

@erlend-aasland
Copy link
Contributor

The end state I'd like to achieve is the ability for CPython's build system to build ~all extensions statically with minimal effort.

What wrong with just editing Modules/Setup.stdlib in order to achieve this?

@erlend-aasland
Copy link
Contributor

@indygreg, are you planning on following up this PR?

@erlend-aasland erlend-aasland added the pending The issue will be closed if no feedback is provided label May 8, 2023
@indygreg
Copy link
Contributor Author

@indygreg, are you planning on following up this PR?

I could probably find time to polish off the PGO/BOLT changes if still wanted. Let's treat the broader extension module pieces I referred to as out of scope.

BTW, one of the things the stack of commits in this PR does is it adds BOLT optimization to libpython. Current main applies BOLT only to the python binary. It is unclear to me if the various CPython performance with BOLT threads I've read are based on the incomplete application of BOLT only to python. When I was developing these commits in January I did see some significant pyperformance changes after BOLT was applied to libpython. That makes sense: there's more code in libpython than in python.

@erlend-aasland
Copy link
Contributor

It would be really helpful if you could split this up in separate issues for each isolated change, and for each issue write a few sentences about why the change is needed, what value it adds, etc. Applying BOLT to libpython makes a lot of sense to me (based on your analysis), but I have no experience when it comes to BOLT :)

@indygreg
Copy link
Contributor Author

indygreg commented May 14, 2023 via email

@erlend-aasland
Copy link
Contributor

All the details/justification are there in the commit messages.

That's great, but it is easier to interact using GitHub Issues as opposed to comments on single commits.

@indygreg indygreg force-pushed the build-system-fixes branch 2 times, most recently from ff74a68 to c63a03d Compare May 15, 2023 02:03
@indygreg
Copy link
Contributor Author

First 2 PRs from this branch are up:

Need to wait on publishing the next one because it depends on #104491 and I reckon GitHub's already brittle interface for stacking PRs will fall apart with all the automation CPython is using. So I'm not even going to try to do it.

Various rules were only ever invoked once and had minimal bodies.
I don't see a benefit to the indirection.

So this commit inlines rules to simplify the PGO logic.

skip news
This commit overhauls the make-based build system's rules for
building optimized binaries. Along the way it fixes a myriad of
bugs and shortcomings with the prior approach.

The old way of producing optimized binaries had various limitations:

* `make [all]` would do work when PGO was enabled because the phony
  `profile-opt` rule was non-empty. This prevented no-op PGO builds
   from working at all. This meant workflows like `make; make install`
   either incurred extra work or failed due to race conditions.
* Same thing for BOLT, as its `bolt-opt` rule was also non-empty
  and always ran during `make [all]`.
* BOLT could not be run multiple times without a full rebuild because
  `llvm-bolt` can't instrument binaries that have already received
  BOLT optimizations.
* It was difficult to run BOLT on its own because of how various make
  targets and their dependencies were structured.
* I found the old way that configure and make communicated the default
  targets to be confusing and hard to understand.

There are essentially 2 major changes going on in this commit:

1. A rework of the high-level make targets for performing a build and
   how they are defined.
2. A rework of all the make logic related to profile-based optimization
   (read: PGO and BOLT).

Build Target Rework
======================

Before, we essentially had `build_all`, `profile-opt`, `bolt-opt` and
`build_wasm` as our 3 targets for performing a build. `all` would alias
to one of these, as appropriate.

And there was another definition for which _simple_ make target to
evaluate for non-optimized builds. This was likely `build_all` or
`all`.

In the rework, we introduce 2 new high-level targets:

* `build-plain` - Perform a build without optimizations.
* `build-optimized` - Perform a build with optimizations.

`build-plain` is aliased to `build_all` in all configurations except
WASM, where it is `build_wasm`.

`build-optimized` by default is aliased to a target that prints an error
message when optimizations aren't enabled. If PGO or BOLT are enabled,
it is aliased to their respective target.

`build-optimized` is the logical successor to `profile-opt`.

I felt it best to delete `profile-opt` completely, as the new `build-*`
high-level targets feel more friendly to use. But if people lament its
loss, we can add a `profile-opt: build-optimized` to achieve almost the
same result.

Profiled-Based Optimization Rework
==================================

Most of the make logic related to profile-based optimization (read: PGO
and BOLT) has been touched in this change.

A major issue with the old way of doing things was we used phony,
always-executed make rules. This is a bad practice in make because it
undermines no-op builds.

Another issue is that the separation between the rules and what order
they ran in wasn't always clear. Both PGO and BOLT consist of the same
4 phase solution: instrument, run, analyze, and apply. However, these
steps weren't clearly expressed in the make logic. This is especially
true for BOLT, which only had 1 make rule.

Another issue with BOLT is that it was really easy to get things into
a bad state. e.g. if you applied BOLT to `pythonX.Y` you could not
run BOLT again unless you rebuilt `pythonX.Y` from source.

In the new world, we have separate `profile-<tool>-<stage>-stamp`
rules defining the 4 distinct `instrument`, `run`, `analyze`, and
`apply` stages for both PGO and BOLT. Each of these stages is tracked
by a _stamp_ semaphore file so progress can be captured. This should
all be pretty straightforward.

There is some minimal complexity here to handle BOLT's optional
dependency on PGO, as BOLT either depends on `build_all` or
`profile-pgo-apply-stamp`.

As part of the refactor to BOLT we also preserve the original input
binary before BOLT is applied. This original file is restored if
BOLT runs again. This greatly simplifies repeated BOLT invocations,
as make doesn't perform needless work. However, this is all best
effort, as it is possible for some make target evaluations to still
get things in a bad state.

Other Remarks
=============

If this change perturbs any bugs, they are likely around cleaning
behavior. The cleaning rules are a bit complicated and not clearly
documented. And I'm unsure which targets CPython developers often
iterate on. It is highly possible that state cleanup of PGO and/or
BOLT files isn't as robust as it needs to be.

I explicitly deleted some calls to PGO cleanup because those calls
prevented no-op `make [all]` from working. It is certainly possible
something somewhere (release automation?) relied on these files being
deleted when they no longer are. We still have targets to purge profile
files and it should be trivial to add these to appropriate make rules.
This allows easily customizing the arguments to `llvm-bolt` without
having to edit the Makefile.

Arguments can be passed to configure and are reflected in configure
output, which can be useful for log analysis.

When defined in the Makefile we use `?=` syntax so the flags can be
overridden via `make VAR=VALUE` syntax or via environment variables.
Super useful for iterating on different BOLT flags.
This ensures `LD_LIBRARY_PATH` is set. Without this, I was able to
tickle a libpythonX.Y.so not found error.
Before, we only supported running BOLT on the main `python` binary.
If a shared library was in play, it wouldn't be optimized. That was
leaving a ton of optimization opportunities on the floor.

This commit adds support for running BOLT on libpython.

Functionality is disabled by default because BOLT asserts on LLVM 15,
which is the latest LLVM. I've built LLVM tip and it is able to
process libpython just fine. So it is known to work.
The generated sysconfig data during builds encodes a PEP-425
platform tag. During native (target=host) builds, the bootstrapped
compiler runs Python code in sysconfig to derive an appropriate value.

For cross compiles, we fall back to logic in configure (that code
lives around the changed lines) to derive an appropriate platform
tag, which is exported as an environment variable during builds.
And there is a "backdoor" in `sysconfig.py` that causes
`sysconfig.get_platform()` to return that value.

The logic in configure for deriving an appropriate platform tag
is a far cry from what's in `sysconfig.py`. Ideally that logic
would be fully (re)implemented in configure. But that's a
non-trivial amount of work.

Recognizing that configure makes inadequate platform tag decisions
during cross-compiles, this commit switches `_PYTHON_HOST_PLATFORM`
from a regular output variable to a "precious variable" (in autoconf
speak). This has the side-effect of allowing invokers to define the
variable, effectively allowing them to explicitly set the platform
tag during builds to a correct value when configure otherwise wouldn't
set one.
indygreg added a commit to indygreg/cpython that referenced this pull request May 20, 2023
(This change is a quick and dirty way to merge some of the build system
improvements I'm proposing in pythongh-101093 before the 3.12 feature freeze.
I wanted to scope bloat myself to fix some longstanding deficiencies in
the build system around profile-guided builds. But I'm getting soft
resistance to the reviews so close to the freeze deadline and it is
obvious that we need a simpler solution to hit the 3.12 deadline. While
this change is quick and dirty, it attempts to not make things worse.)

Before this change, we only applied bolt to the main python binary.
After this change, we apply bolt to libpython if it is configured. In
shared library builds, most of the C code is in libpython so it is
critical to apply bolt to libpython to realize bolt benefits.

This change also reworks how bolt instrumentation is applied. It
effectively removes the readelf based logic added in pythongh-101525 and
replaces it with a mechanism that saves a copy of the pre-bolt binary
and restores that copy when necessary. This allows us to perform
bolt optimizations without having to manually delete the output binary
to force a new bolt run.

We also add a new make target for purging bolt files and hook it up
to `clean` so bolt state is purged when appropriate.

`.gitignore` rules have been added to ignore files related to bolt.

Before and after this refactor, `make` will no-op after a previous run.
Both versions should also share common make DAG deficiencies where
targets fail to trigger as often as they need to or can trigger
prematurely in certain scenarios. e.g. after this change you may need
to `rm profile-bolt-stamp` to force a bolt run because there aren't
appropriate non-phony targets for bolt's make target to depend on.
Fixing this is a non-trivial amount of work that will likely have to
wait until the 3.13 window.

To make it easier to iterate on custom BOLT settings, the flags to
pass to instrumentation and application are now defined in configure
and can be overridden by passing `BOLT_INSTRUMENT_FLAGS` and
`BOLT_APPLY_FLAGS`.
indygreg added a commit to indygreg/cpython that referenced this pull request May 21, 2023
(This change is a quick and dirty way to merge some of the build system
improvements I'm proposing in pythongh-101093 before the 3.12 feature freeze.
I wanted to scope bloat myself to fix some longstanding deficiencies in
the build system around profile-guided builds. But I'm getting soft
resistance to the reviews so close to the freeze deadline and it is
obvious that we need a simpler solution to hit the 3.12 deadline. While
this change is quick and dirty, it attempts to not make things worse.)

Before this change, we only applied bolt to the main python binary.
After this change, we apply bolt to libpython if it is configured. In
shared library builds, most of the C code is in libpython so it is
critical to apply bolt to libpython to realize bolt benefits.

This change also reworks how bolt instrumentation is applied. It
effectively removes the readelf based logic added in pythongh-101525 and
replaces it with a mechanism that saves a copy of the pre-bolt binary
and restores that copy when necessary. This allows us to perform
bolt optimizations without having to manually delete the output binary
to force a new bolt run.

We also add a new make target for purging bolt files and hook it up
to `clean` so bolt state is purged when appropriate.

`.gitignore` rules have been added to ignore files related to bolt.

Before and after this refactor, `make` will no-op after a previous run.
Both versions should also share common make DAG deficiencies where
targets fail to trigger as often as they need to or can trigger
prematurely in certain scenarios. e.g. after this change you may need
to `rm profile-bolt-stamp` to force a bolt run because there aren't
appropriate non-phony targets for bolt's make target to depend on.
Fixing this is a non-trivial amount of work that will likely have to
wait until the 3.13 window.

To make it easier to iterate on custom BOLT settings, the flags to
pass to instrumentation and application are now defined in configure
and can be overridden by passing `BOLT_INSTRUMENT_FLAGS` and
`BOLT_APPLY_FLAGS`.
@erlend-aasland erlend-aasland removed the pending The issue will be closed if no feedback is provided label Jun 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants