Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel docbuild, cythonization: Use GNU make's POSIX jobserver protocol #30369

Closed
mkoeppe opened this issue Aug 15, 2020 · 15 comments · Fixed by #36640
Closed

Parallel docbuild, cythonization: Use GNU make's POSIX jobserver protocol #30369

mkoeppe opened this issue Aug 15, 2020 · 15 comments · Fixed by #36640

Comments

@mkoeppe
Copy link
Contributor

mkoeppe commented Aug 15, 2020

The correct way out of the mess with recommending MAKE='make -j8' instead of make -j8 (which we do because of the outdated GNU make 3.81 on macOS - see #21610, #30345) is to use GNU make's POSIX job server protocol - https://www.gnu.org/software/make/manual/html_node/POSIX-Jobserver.html

  • Create a wrapper script in Python that tries to acquire >= A, <= B jobs from the jobserver within <= T wall clock seconds, sets the environment variable SAGE_NUM_THREADS to min(C, NUM_JOBS_RECEIVED), and invokes an arbitrary script.

  • Use this script as a wrapper for the sagelib build and docbuild.

  • Replace the recommendation MAKE='make -j8' make by the familiar make -j8.

  • In the top-level Makefile, filter out -j so that also users who continue to follow the old recommendation MAKE='make -j8' make benefit from the improvements, and to get rid of the "disabling jobserver" messages.

CC: @dimpase @orlitzky @jhpalmieri @Etn40ff

Component: build

Issue created by migration from https://trac.sagemath.org/ticket/30369

@mkoeppe mkoeppe added this to the sage-9.2 milestone Aug 15, 2020
@mkoeppe
Copy link
Contributor Author

mkoeppe commented Aug 15, 2020

@orlitzky
Copy link
Contributor

comment:2

Copying my comment from #30345:

We're having trouble here because we're conflating two different things. The sage distribution has a top-level makefile that's used to initiate high-level tasks -- in particular installing SPKGs -- convenient to execute. Many of those high-level tasks themselves have build systems, separate from sage, that involve Make. We're passing the distribution's make flags down to the individual build systems and trying to coordinate the two as if it was one large build system, but it really isn't.

When in doubt, copy Gentoo: our package manager has --jobs and --load-average arguments that control how many package installation jobs get executed at once. We then also have a MAKEFLAGS variable that is passed to the individual build systems. (A better name would be something like BUILDJOBS, but the name is historical.) Sage's distribution Makefile can likely always be invoked with -j1 with no performance penalty; it's the individual build systems that need something like -j8.

So, my suggestion: we copy Gentoo, and create something like a SAGE_BUILD_JOBS variable that defaults to 4 (everyone has at least dual-core by now?). We can then document that the top-level make need not be invoked with -j<anything>, and within that top-level Makefile, we would pass -j${SAGE_BUILD_JOBS} to any Make-based SPKG build systems. If we encounter a non-Make build system, we can then still do what the user asked, and pass whatever flags are necessary to run the build with ${SAGE_BUILD_JOBS} threads.

I think that's less complicated that trying to coordinate the two levels, and doesn't wed us further to GNU's implementation of Make.

@dimpase
Copy link
Member

dimpase commented Aug 17, 2020

comment:3

IIRC, Sage cannot be built with a nonGnu make, anyway.

IMHO Gentoo's way looks a bit like a workaround for old buggy Gnu Make.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Aug 17, 2020

comment:4

Replying to @orlitzky:

Copying my comment from #30345:

We're having trouble here because we're conflating two different things. The sage distribution has a top-level makefile that's used to initiate high-level tasks -- in particular installing SPKGs -- convenient to execute. Many of those high-level tasks themselves have build systems, separate from sage, that involve Make. We're passing the distribution's make flags down to the individual build systems and trying to coordinate the two as if it was one large build system, but it really isn't.

No, the point of the jobserver is that one does NOT have to distinguish these "two different things". The jobserver manages the load.

We do not have a problem with the jobserver. We are careful to pass down the make options to sub-makes. This works.

We have a problem with the python-based build systems that do not use the jobserver, and this ticket proposes a specific way to solve it.

@orlitzky
Copy link
Contributor

comment:5

Replying to @dimpase:

IIRC, Sage cannot be built with a nonGnu make, anyway.

The Makefiles generated by autotools are POSIX compatible, it's only the hand-written Makefiles that are a problem. We generate those with a bootstrap script anyway, so at least in theory it would be possible to coerce them into being POSIX compatible.

IMHO Gentoo's way looks a bit like a workaround for old buggy Gnu Make.

It's just a way to let users specify how many jobs should be used to build each package in a source-based distribution. The number of packages that will be installed simultaneously is something else. Our situation isn't much different, except that the "package manager" is also Make.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Aug 17, 2020

comment:6

Replying to @orlitzky:

It's just a way to let users specify how many jobs should be used to build each package in a source-based distribution. The number of packages that will be installed simultaneously is something else. Our situation isn't much different, except that the "package manager" is also Make.

Users should have no reason to be interested in this distinction. All that matters is throughput.

@orlitzky
Copy link
Contributor

comment:7

Replying to @mkoeppe:

No, the point of the jobserver is that one does NOT have to distinguish these "two different things". The jobserver manages the load.

You can certainly integrate the two cleanly with the jobserver approach, but that doesn't change my opinion that they're conceptually two different quantities. The first is how many high-level tasks I'd like to run at the same time (doc building, package installation), and the second is how many threads I'd like to use to perform each of those tasks.

Users should have no reason to be interested in this distinction. All that matters is throughput.

Maybe not, but we could default both to the same number to accomplish that out-of-the-box. Experience has shown that most people are happy with one package installation at a time, and then a configurable number of jobs. High-level tasks are generally free, while low level ones tend to chew up CPU/RAM. What I personally care about is the number of low-level build threads, which should be the number of package-installation threads times the number of per-package-build-threads. Again using Gentoo as an example, I usually leave this at one package at a time, and then ~four build threads per package.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Aug 17, 2020

comment:8

Replying to @orlitzky:

we could default both to the same number to accomplish that out-of-the-box. Experience has shown that most people are happy with one package installation at a time, and then a configurable number of jobs.

Sorry, this does not match the typical experience in Sage at all. Throughput comes from building many packages in parallel.

@orlitzky
Copy link
Contributor

comment:9

One other thing that distinguishes Gentoo and sage-the-distribution is that Sage is fortunate enough to only have C, C++, Fortran, and python packages for now. Some day that will probably change, and then it's a lot simpler to pass a flag to whatever newfangled build system is involved than it would be to integrate that build system with the GNU Make job server.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Aug 17, 2020

comment:10

Replying to @orlitzky:

.... whatever newfangled build system ...

such as ninja? Note https://github.com/Kitware/ninja/releases, https://forums.gentoo.org/viewtopic-t-1110292-view-previous.html

@orlitzky
Copy link
Contributor

comment:11

Replying to @mkoeppe:

Replying to @orlitzky:

.... whatever newfangled build system ...

such as ninja? Note https://github.com/Kitware/ninja/releases, https://forums.gentoo.org/viewtopic-t-1110292-view-previous.html

I actually had Meson in mind, but this is also a good example. Utilizing the job server has the potential to be more efficient, if e.g. you're only able to build one package at a time due to a bunch of linear dependencies (in that case, you're getting 1*N instead of M*N jobs where M is the number of jobs you want your package manager to execute in parallel). But as with ninja, the trade-off is that someone would have to write and maintain a huge patch to the build system to support the jobserver protocol, and then it only works with GNU make. Without the patch, we pass e.g. -j8 to ninja (just like with Make) and it works fine. Maybe slightly less efficient, but a lot less complicated.

@orlitzky
Copy link
Contributor

comment:12

And actually, I knew that Meson didn't use Make as its backend, but... it uses Ninja! So they're the same example.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Aug 17, 2020

comment:13

Anyway, thanks for the discussion. It is good that we are aware of the longer term context. We'll see in which direction things move (I predict that more build systems embrace the job server protocol, or improvements of it).

To return to the technical point of this ticket: I propose to create a wrapper script in Python that tries to acquire >= A, <= B jobs from the jobserver within <= T wall clock seconds, sets the environment variable SAGE_NUM_THREADS to min(C, NUM_JOBS_RECEIVED), and invokes an arbitrary script. Using this for the sagelib build and docbuild will allow us to replace the recommendation MAKE='make -j8' make by the familiar make -j8, and will likely be an improvement over the status quo in terms of load management.

@mkoeppe

This comment has been minimized.

@mkoeppe mkoeppe modified the milestones: sage-9.2, sage-9.3 Sep 5, 2020
@mkoeppe
Copy link
Contributor Author

mkoeppe commented Feb 13, 2021

comment:17

Setting new milestone based on a cursory review of ticket status, priority, and last modification date.

@mkoeppe mkoeppe modified the milestones: sage-9.3, sage-9.4 Feb 13, 2021
@mkoeppe mkoeppe modified the milestones: sage-9.4, sage-9.5 Jul 19, 2021
@mkoeppe mkoeppe removed this from the sage-9.5 milestone Sep 14, 2021
vbraun pushed a commit to vbraun/sage that referenced this issue Dec 10, 2023
…sm of doctesting

    
<!-- ^^^^^
Please provide a concise, informative and self-explanatory title.
Don't put issue numbers in there, do this in the PR body below.
For example, instead of "Fixes sagemath#1234" use "Introduce new method to
calculate 1+1"
-->
<!-- Describe your changes here in detail -->

This is useful when running our doctester in parallel with other build
steps, or several doctesters in parallel, as happens for example in
`make SAGE_CHECK=yes pypi-wheels`, and more of that after sagemath#35095.

To test:
```
MAKE="make -j14" make SAGE_NUM_THREADS=100 DEBUG_JOBCLIENT=1 ptest
```
This will make the doctester attempt to use 100 workers, but it will
only get tokens for 14 workers from `make`.
`DEBUG_JOBCLIENT=1` shows what's happening.

Upstream PR:
- milahu/gnumake-tokenpool#3 (merged)

<!-- Why is this change required? What problem does it solve? -->
<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
- Resolves sagemath#30369
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
<!-- Feel free to remove irrelevant items. -->

- [x] The title is concise, informative, and self-explanatory.
- [ ] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on
- sagemath#12345: short description why this is a dependency
- sagemath#34567: ...
-->

<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
    
URL: sagemath#36640
Reported by: Matthias Köppe
Reviewer(s): Michael Orlitzky
@vbraun vbraun closed this as completed in 77fbf9a Dec 14, 2023
@mkoeppe mkoeppe added this to the sage-10.3 milestone Dec 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants