-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Various build system improvements #101093
base: main
Are you sure you want to change the base?
Conversation
Due to the holiday weekend in Korea (Lunar new year), I will review relevant PRs by next weekend. |
cc3f6f5
to
2e2ba00
Compare
Sidenote: BOLT now supports DWARF5, so there's no need to force https://github.com/python/cpython/blob/main/configure.ac#L1950-1952 |
Nice news! Thanks for let us know :) @indygreg, Let's drop the flag cc @aaupov |
2e2ba00
to
fe8e296
Compare
No rush on the reviews, @corona10: I'm also busy celebrating the Lunar New Year this weekend. Some specific guidance I could use help with is how to split up / fold commits and how to associate them with a GitHub Issue. To be honest I find the whole CPython contribution process of managing GitHub Issues in addition to a PR to be confusing. Is there a 1:1 relationship? 1:n? https://devguide.python.org/ is a bit ambiguous on this topic. I'm lazy/busy, so prefer as little overhead / separate GitHub Issues as possible. But I can go along with whatever. If someone tells me what to do I'll do it. Also, work that I'd like to do but haven't started working on yet is enabling some of The end state I'd like to achieve is the ability for CPython's build system to build ~all extensions statically with minimal effort. e.g. |
One more thing, I have also interested in supporting build all extensions statically with minimal effort for internal purposes. |
What wrong with just editing |
@indygreg, are you planning on following up this PR? |
I could probably find time to polish off the PGO/BOLT changes if still wanted. Let's treat the broader extension module pieces I referred to as out of scope. BTW, one of the things the stack of commits in this PR does is it adds BOLT optimization to libpython. Current |
It would be really helpful if you could split this up in separate issues for each isolated change, and for each issue write a few sentences about why the change is needed, what value it adds, etc. Applying BOLT to libpython makes a lot of sense to me (based on your analysis), but I have no experience when it comes to BOLT :) |
It would be really helpful if you could split this up in separate issues for each isolated change, and for each issue write a few sentences about why the change is needed, what value it adds, etc.
Yup. The PR summary called out that I was awaiting instructions/feedback on whether/how to proceed. I wasn’t sure if people were even interested in the refactors.
All the details/justification are there in the commit messages. I just need to put in the time to turn the commits into issues/PRs. Maybe I’ll start later today…
|
That's great, but it is easier to interact using GitHub Issues as opposed to comments on single commits. |
ff74a68
to
c63a03d
Compare
First 2 PRs from this branch are up:
Need to wait on publishing the next one because it depends on #104491 and I reckon GitHub's already brittle interface for stacking PRs will fall apart with all the automation CPython is using. So I'm not even going to try to do it. |
Various rules were only ever invoked once and had minimal bodies. I don't see a benefit to the indirection. So this commit inlines rules to simplify the PGO logic. skip news
This commit overhauls the make-based build system's rules for building optimized binaries. Along the way it fixes a myriad of bugs and shortcomings with the prior approach. The old way of producing optimized binaries had various limitations: * `make [all]` would do work when PGO was enabled because the phony `profile-opt` rule was non-empty. This prevented no-op PGO builds from working at all. This meant workflows like `make; make install` either incurred extra work or failed due to race conditions. * Same thing for BOLT, as its `bolt-opt` rule was also non-empty and always ran during `make [all]`. * BOLT could not be run multiple times without a full rebuild because `llvm-bolt` can't instrument binaries that have already received BOLT optimizations. * It was difficult to run BOLT on its own because of how various make targets and their dependencies were structured. * I found the old way that configure and make communicated the default targets to be confusing and hard to understand. There are essentially 2 major changes going on in this commit: 1. A rework of the high-level make targets for performing a build and how they are defined. 2. A rework of all the make logic related to profile-based optimization (read: PGO and BOLT). Build Target Rework ====================== Before, we essentially had `build_all`, `profile-opt`, `bolt-opt` and `build_wasm` as our 3 targets for performing a build. `all` would alias to one of these, as appropriate. And there was another definition for which _simple_ make target to evaluate for non-optimized builds. This was likely `build_all` or `all`. In the rework, we introduce 2 new high-level targets: * `build-plain` - Perform a build without optimizations. * `build-optimized` - Perform a build with optimizations. `build-plain` is aliased to `build_all` in all configurations except WASM, where it is `build_wasm`. `build-optimized` by default is aliased to a target that prints an error message when optimizations aren't enabled. If PGO or BOLT are enabled, it is aliased to their respective target. `build-optimized` is the logical successor to `profile-opt`. I felt it best to delete `profile-opt` completely, as the new `build-*` high-level targets feel more friendly to use. But if people lament its loss, we can add a `profile-opt: build-optimized` to achieve almost the same result. Profiled-Based Optimization Rework ================================== Most of the make logic related to profile-based optimization (read: PGO and BOLT) has been touched in this change. A major issue with the old way of doing things was we used phony, always-executed make rules. This is a bad practice in make because it undermines no-op builds. Another issue is that the separation between the rules and what order they ran in wasn't always clear. Both PGO and BOLT consist of the same 4 phase solution: instrument, run, analyze, and apply. However, these steps weren't clearly expressed in the make logic. This is especially true for BOLT, which only had 1 make rule. Another issue with BOLT is that it was really easy to get things into a bad state. e.g. if you applied BOLT to `pythonX.Y` you could not run BOLT again unless you rebuilt `pythonX.Y` from source. In the new world, we have separate `profile-<tool>-<stage>-stamp` rules defining the 4 distinct `instrument`, `run`, `analyze`, and `apply` stages for both PGO and BOLT. Each of these stages is tracked by a _stamp_ semaphore file so progress can be captured. This should all be pretty straightforward. There is some minimal complexity here to handle BOLT's optional dependency on PGO, as BOLT either depends on `build_all` or `profile-pgo-apply-stamp`. As part of the refactor to BOLT we also preserve the original input binary before BOLT is applied. This original file is restored if BOLT runs again. This greatly simplifies repeated BOLT invocations, as make doesn't perform needless work. However, this is all best effort, as it is possible for some make target evaluations to still get things in a bad state. Other Remarks ============= If this change perturbs any bugs, they are likely around cleaning behavior. The cleaning rules are a bit complicated and not clearly documented. And I'm unsure which targets CPython developers often iterate on. It is highly possible that state cleanup of PGO and/or BOLT files isn't as robust as it needs to be. I explicitly deleted some calls to PGO cleanup because those calls prevented no-op `make [all]` from working. It is certainly possible something somewhere (release automation?) relied on these files being deleted when they no longer are. We still have targets to purge profile files and it should be trivial to add these to appropriate make rules.
This allows easily customizing the arguments to `llvm-bolt` without having to edit the Makefile. Arguments can be passed to configure and are reflected in configure output, which can be useful for log analysis. When defined in the Makefile we use `?=` syntax so the flags can be overridden via `make VAR=VALUE` syntax or via environment variables. Super useful for iterating on different BOLT flags.
This ensures `LD_LIBRARY_PATH` is set. Without this, I was able to tickle a libpythonX.Y.so not found error.
Before, we only supported running BOLT on the main `python` binary. If a shared library was in play, it wouldn't be optimized. That was leaving a ton of optimization opportunities on the floor. This commit adds support for running BOLT on libpython. Functionality is disabled by default because BOLT asserts on LLVM 15, which is the latest LLVM. I've built LLVM tip and it is able to process libpython just fine. So it is known to work.
The generated sysconfig data during builds encodes a PEP-425 platform tag. During native (target=host) builds, the bootstrapped compiler runs Python code in sysconfig to derive an appropriate value. For cross compiles, we fall back to logic in configure (that code lives around the changed lines) to derive an appropriate platform tag, which is exported as an environment variable during builds. And there is a "backdoor" in `sysconfig.py` that causes `sysconfig.get_platform()` to return that value. The logic in configure for deriving an appropriate platform tag is a far cry from what's in `sysconfig.py`. Ideally that logic would be fully (re)implemented in configure. But that's a non-trivial amount of work. Recognizing that configure makes inadequate platform tag decisions during cross-compiles, this commit switches `_PYTHON_HOST_PLATFORM` from a regular output variable to a "precious variable" (in autoconf speak). This has the side-effect of allowing invokers to define the variable, effectively allowing them to explicitly set the platform tag during builds to a correct value when configure otherwise wouldn't set one.
c63a03d
to
9e50ee0
Compare
(This change is a quick and dirty way to merge some of the build system improvements I'm proposing in pythongh-101093 before the 3.12 feature freeze. I wanted to scope bloat myself to fix some longstanding deficiencies in the build system around profile-guided builds. But I'm getting soft resistance to the reviews so close to the freeze deadline and it is obvious that we need a simpler solution to hit the 3.12 deadline. While this change is quick and dirty, it attempts to not make things worse.) Before this change, we only applied bolt to the main python binary. After this change, we apply bolt to libpython if it is configured. In shared library builds, most of the C code is in libpython so it is critical to apply bolt to libpython to realize bolt benefits. This change also reworks how bolt instrumentation is applied. It effectively removes the readelf based logic added in pythongh-101525 and replaces it with a mechanism that saves a copy of the pre-bolt binary and restores that copy when necessary. This allows us to perform bolt optimizations without having to manually delete the output binary to force a new bolt run. We also add a new make target for purging bolt files and hook it up to `clean` so bolt state is purged when appropriate. `.gitignore` rules have been added to ignore files related to bolt. Before and after this refactor, `make` will no-op after a previous run. Both versions should also share common make DAG deficiencies where targets fail to trigger as often as they need to or can trigger prematurely in certain scenarios. e.g. after this change you may need to `rm profile-bolt-stamp` to force a bolt run because there aren't appropriate non-phony targets for bolt's make target to depend on. Fixing this is a non-trivial amount of work that will likely have to wait until the 3.13 window. To make it easier to iterate on custom BOLT settings, the flags to pass to instrumentation and application are now defined in configure and can be overridden by passing `BOLT_INSTRUMENT_FLAGS` and `BOLT_APPLY_FLAGS`.
(This change is a quick and dirty way to merge some of the build system improvements I'm proposing in pythongh-101093 before the 3.12 feature freeze. I wanted to scope bloat myself to fix some longstanding deficiencies in the build system around profile-guided builds. But I'm getting soft resistance to the reviews so close to the freeze deadline and it is obvious that we need a simpler solution to hit the 3.12 deadline. While this change is quick and dirty, it attempts to not make things worse.) Before this change, we only applied bolt to the main python binary. After this change, we apply bolt to libpython if it is configured. In shared library builds, most of the C code is in libpython so it is critical to apply bolt to libpython to realize bolt benefits. This change also reworks how bolt instrumentation is applied. It effectively removes the readelf based logic added in pythongh-101525 and replaces it with a mechanism that saves a copy of the pre-bolt binary and restores that copy when necessary. This allows us to perform bolt optimizations without having to manually delete the output binary to force a new bolt run. We also add a new make target for purging bolt files and hook it up to `clean` so bolt state is purged when appropriate. `.gitignore` rules have been added to ignore files related to bolt. Before and after this refactor, `make` will no-op after a previous run. Both versions should also share common make DAG deficiencies where targets fail to trigger as often as they need to or can trigger prematurely in certain scenarios. e.g. after this change you may need to `rm profile-bolt-stamp` to force a bolt run because there aren't appropriate non-phony targets for bolt's make target to depend on. Fixing this is a non-trivial amount of work that will likely have to wait until the 3.13 window. To make it easier to iterate on custom BOLT settings, the flags to pass to instrumentation and application are now defined in configure and can be overridden by passing `BOLT_INSTRUMENT_FLAGS` and `BOLT_APPLY_FLAGS`.
I'm creating this PR to track various build system changes that I'd like to make to CPython.
I'm creating a draft PR instead of opening real PRs because general OSS experience has taught me that if I trickle things out slowly without any prior context project maintainers won't see the big picture and/or benefits of the changes. My hope is that by tracking all proposed changes in a Git branch that people can easily diff that I'll have an easier time convincing folks to accept the patches.
Next Steps
CPython maintainer(s) should look at each commit in this branch. I write detailed commit messages. See each commit for more details.
Commits beginning with
gh-xxxx:
aren't yet assigned issues/PRs.Then, just tell me which commits to turn into issues/PRs and I'll do so. Keep in mind some commits are dependent on others. For example, the BOLT enhancements (which appear to deliver meaningful pyperformance wins) depends on the
overhaul build rules for optimized binaries
commit and its precursors.I have several more potential patches to include in this branch. Including a bunch that enable cross-compiling for Apple platforms. I figured I'd start with just the build system + BOLT improvements to get people's attention.
Most of this work is derived from my efforts on https://github.com/indygreg/python-build-standalone.
cc @corona10 since you seem to have an interest in all the BOLT PRs.