python3: enable optimizations. #43791

Closed

Contributor

@triallax triallax commented May 7, 2023

The --enable-optimizations flag enables profile-guided optimization (PGO) and link-time optimization (LTO, though apparently only on some platforms). In my rudimentary benchmarks, python3 built from this PR was around 10%-12% faster than python3 from the repos (I also tried building python3 from master locally, but that build had horrendous performance for some reason). I'm not a benchmarking expert, though, so I'd like others to run their own benchmarks and post their results here.
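For anyone comparing builds, a minimal sketch of how to check which configure flags an installed interpreter was built with. It assumes a Unix-like build where `sysconfig` exposes `CONFIG_ARGS`; the helper name `optimization_report` is made up for illustration, and `--with-lto` is included only because distributions sometimes pass it separately.

```python
import sysconfig


def optimization_report(config_args=None):
    """Report whether the PGO/LTO configure flags appear in the build configuration.

    `config_args` defaults to the running interpreter's recorded ./configure
    arguments; on platforms that do not record them, it falls back to "".
    """
    if config_args is None:
        config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    return {
        "enable_optimizations": "--enable-optimizations" in config_args,
        "with_lto": "--with-lto" in config_args,
    }


if __name__ == "__main__":
    for name, present in optimization_report().items():
        print(f"{name}: {present}")
```

Running this with both the repo package and the PR build should make it clear which interpreter a given benchmark actually exercised.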

On my potato laptop, this change tripled the build time from 5 minutes to 15 minutes. I don't think this is an issue, however, because the python3 template changes infrequently (about once per month on average over the last 12 months), and the builders are much beefier than my laptop, so their build times will be shorter anyway.

TODO:

  • Is this also needed for python3-tkinter? (I don't think so, but better check)
  • Are there similar performance improvements with cross builds? (PGO runs and profiles a Python that is compiled for the host architecture, which is why I want to confirm this)

Testing the changes

  • I tested the changes in this PR: YES (only python3, not python3-tkinter)

@ahesford
Member

Sorry for the delay. I don't really have a problem with enabling optimizations; if they improve runtime performance, we can pay the price in compile time. The question is how much PGO improves things for the average user if the optimization is tailored to the build system. I'd like to test a few things, including building on different CPUs to see how well the optimizations translate.

@tornaria
Contributor

> Sorry for the delay. I don't really have a problem with enabling optimizations; if they improve runtime performance, we can pay the price in compile time. The question is how much PGO improves things for the average user if the optimization is tailored to the build system. I'd like to test a few things, including building on different CPUs to see how well the optimizations translate.

As a data point, I built this on my box (i7-9700). Running the sagemath testsuite with this build (on the same box) takes ~5% less time to complete: 12:00 vs 12:30 for the normal test, and 23:17 vs 24:40 for the long test using 8 threads.

@triallax triallax force-pushed the python3-enable-optimizations-flag branch from ee7cb61 to eb0b729 Compare August 8, 2023 18:40
@ahesford
Member

I built this PR on a Zen 3 CPU and ran a test that built a sparse finite-element matrix and then used SuperLU to perform the LU factorization. Although the SuperLU performance won't change, the setup does some heavy looping in Python, and the overall time for the routine dropped from 2.78 sec to 2.3 sec, with each trial averaged over several runs. I then installed the Zen-built package on another system with a 9th-generation Intel CPU and re-ran the same test; the times dropped from 3.7 sec to 2.75 sec.

It seems these optimizations offer tangible benefits, and they translate to different systems.
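For anyone who wants to repeat this kind of comparison, here is a rough sketch of a timing harness that averages a loop-heavy routine over several runs. It is not the finite-element test described above: `loop_heavy_setup` is a hypothetical stand-in that only assembles tridiagonal matrix triplets in a pure-Python loop, and `mean_runtime` is an illustrative helper name.

```python
import statistics
import time


def loop_heavy_setup(n):
    """Hypothetical stand-in workload: assemble (row, col, value) triplets
    for an n-by-n tridiagonal matrix using a pure-Python loop."""
    triplets = []
    for i in range(n):
        triplets.append((i, i, 2.0))
        if i + 1 < n:
            triplets.append((i, i + 1, -1.0))
            triplets.append((i + 1, i, -1.0))
    return triplets


def mean_runtime(func, *args, repeats=5):
    """Call func(*args) `repeats` times and return the mean wall time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


if __name__ == "__main__":
    seconds = mean_runtime(loop_heavy_setup, 200_000)
    print(f"mean over 5 runs: {seconds:.3f} s")
```

Running the same script under the old and the new python3 package on the same machine gives a before/after pair comparable to the figures quoted above, though absolute numbers will depend on the workload.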

@ahesford ahesford closed this in ff915f0 Aug 13, 2023
@triallax triallax deleted the python3-enable-optimizations-flag branch August 13, 2023 13:15
@tornaria
Contributor

tornaria commented Aug 14, 2023

Just for the record, the sagemath (10.1.rc0) testsuite with python3-3.11.4_1:

Total time for all tests: 843.5 seconds
    cpu time: 5039.4 seconds
    cumulative wall time: 5700.8 seconds

After updating to python3-3.11.4_2 (no other change, no recompilation):

Total time for all tests: 790.8 seconds
    cpu time: 4601.8 seconds
    cumulative wall time: 5175.3 seconds

That's a neat 5-10% speedup! Thanks!
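The relative savings can be worked out from the two sets of totals quoted above. This is just the arithmetic, written as a small sketch; `percent_reduction` is an illustrative helper name.

```python
def percent_reduction(before, after):
    """Return how much smaller `after` is than `before`, as a percentage of `before`."""
    return (before - after) / before * 100.0


# (python3-3.11.4_1, python3-3.11.4_2) timings in seconds, copied from the results above
timings = {
    "total time": (843.5, 790.8),
    "cpu time": (5039.4, 4601.8),
    "cumulative wall time": (5700.8, 5175.3),
}

for label, (before, after) in timings.items():
    print(f"{label}: {percent_reduction(before, after):.1f}% less")
```

This comes out to roughly 6.2% for total time, 8.7% for cpu time, and 9.2% for cumulative wall time, which matches the 5-10% range.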


3 participants