Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump workflow actions for node20 #21133

Merged
merged 3 commits into from
Jul 15, 2024
Merged

Conversation

kaos
Copy link
Member

@kaos kaos commented Jul 6, 2024

We started to run into issues with node16 no longer being functional on the runner, with symptoms like actions/checkout#1809

@kaos kaos added category:internal CI, fixes for not-yet-released features, etc. release-notes:not-required PR doesn't require mention in release notes labels Jul 6, 2024
@kaos
Copy link
Member Author

kaos commented Jul 6, 2024

Hmm.. so, this is still a good change, but it doesn't address the underlying issue, which seems to be node20 being broken on the linux runners we use.

Release run to build wheels using this workflow: https://github.com/pantsbuild/pants/actions/runs/9817274677/job/27108461020

Copy link
Contributor

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual change here looks reasonable... although I have not checked each action for breaking changes other than the upload-artifact one I already knew about.

@@ -19,6 +19,23 @@

from pants.util.strutil import softwrap

ACTION = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea to centralise these versions 👍

"setup-protoc": "arduino/setup-protoc@9b1ee5b22b0a3f1feb8c2ff99b32c89b3c3191e9",
"setup-python": "actions/setup-python@v5",
"slack-github-action": "slackapi/slack-github-action@v1.24.0",
"upload-artifact": "actions/upload-artifact@v4",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At $WORK, we were tripped up by one of the breaking changes in this upgrade: https://github.com/actions/upload-artifact?tab=readme-ov-file#breaking-changes

Uploading to the same named Artifact multiple times.

Due to how Artifacts are created in this new version, it is no longer possible to upload to the same named Artifact multiple times. You must either split the uploads into multiple Artifacts with different names, or only upload once. Otherwise you will encounter an error.

Does Pants encounter this?

Maybe that is this why there's the overwrite: true added in upload_log_artifacts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, that was my attempt at handling that, suspecting that we want the "latest" logs always. Other than that 🤷🏽 so figured I run some workflows manually using this version, once we're passed the other CI issues (the failing build wheels jobs).

@huonw
Copy link
Contributor

huonw commented Jul 10, 2024

Hmm.. so, this is still a good change, but it doesn't address the underlying issue, which seems to be node20 being broken on the linux runners we use.

Argh, looks like we might end up somewhat jammed by platform support.

  • The error message suggests node20 on Linux requires glibc 2.23 or newer.
  • We run these builds in a manylinux2014 container, presumably for platform support. https://peps.python.org/pep-0599/ suggest manylinux2014 => glibc 2.17.

# For manylinux compatibility, we build Linux wheels in a container rather than directly
# on the Ubuntu runner. As a result, we have custom steps here to check out
# the code, install rustup and expose Pythons.
# TODO: Apply rust caching here.
if platform == Platform.LINUX_X86_64:
container = {"image": "quay.io/pypa/manylinux2014_x86_64:latest"}
elif platform == Platform.LINUX_ARM64:
# Unfortunately Equinix do not support the CentOS 7 image on the hardware we've been
# generously given by the Works on ARM program. So we have to build in this image.
container = {
"image": "ghcr.io/pantsbuild/wheel_build_aarch64:v3-8384c5cf",
}

Thus, since 2.17 < 2.23, we're stuck...

An idea we might need to separate the "build orchestration" (and all the various actions) and the actual builds:

  • have the overall build run within a GHA-provided "default" image
  • run the actual pants commands within the compatibility image, either at the GHA level, or maybe via a docker_environment!

I'm not sure the best way to actually do this.

@kaos
Copy link
Member Author

kaos commented Jul 10, 2024

Want to hold off merging until we've tested these workflows properly, as there are breaking changes in at least the upload-artifact action.

@benjyw
Copy link
Contributor

benjyw commented Jul 10, 2024

Can we run in a new container than manylinux2014 ?

@benjyw
Copy link
Contributor

benjyw commented Jul 10, 2024

Oh, and why do we care about node versions? Is this our own JS backend, or related to various actions being written in JS?

@huonw
Copy link
Contributor

huonw commented Jul 10, 2024

Can we run in a new container than manylinux2014 ?

At the cost of reducing platform support. I believe Pants will currently runs on Linux platforms with glibc 2.17 - 2.22, but may not if we bump the image?

@kaos
Copy link
Member Author

kaos commented Jul 11, 2024

Oh, and why do we care about node versions? Is this our own JS backend, or related to various actions being written in JS?

Github Actions: https://github.blog/changelog/2024-03-07-github-actions-all-actions-will-run-on-node20-instead-of-node16-by-default/

@benjyw
Copy link
Contributor

benjyw commented Jul 11, 2024

And we're running actions inside the container? I guess we need the action to run natively but build in a container?

@kaos
Copy link
Member Author

kaos commented Jul 11, 2024

And we're running actions inside the container? I guess we need the action to run natively but build in a container?

Well, the linux wheel build jobs specify a container image, so I guess the entire job is executed in it, and thus the GH actions to clone the repo sources:

  build_wheels_linux_x86_64:
    container:
      image: quay.io/pypa/manylinux2014_x86_64:latest
    ...
  build_wheels_linux_arm64:
    container:
      image: ghcr.io/pantsbuild/wheel_build_aarch64:v3-8384c5cf
    ...

@benjyw
Copy link
Contributor

benjyw commented Jul 11, 2024

Right, but we can't continue to do that, so I guess the fix has to be something like "run the build in docker inside a native GHA job"

@huonw
Copy link
Contributor

huonw commented Jul 12, 2024

Maybe docker_environment can work for this?

@stuhood
Copy link
Member

stuhood commented Jul 12, 2024

Maybe docker_environment can work for this?

Possibly. The actual "build wheels" step is:

def build_all_wheels() -> None:
build_pants_wheels()
build_3rdparty_wheels()
install_and_test_packages(
CONSTANTS.pants_stable_version,
extra_pip_args=[
"--only-binary=:all:",
"-f",
str(CONSTANTS.deploy_3rdparty_wheel_dir / CONSTANTS.pants_stable_version),
"-f",
str(CONSTANTS.deploy_pants_wheel_dir / CONSTANTS.pants_stable_version),
],
)

... which invokes:

  • ./pants package for pantsbuild.pants and pantsbuild.pants.testutil
  • pip for thirdparty wheels

The ./pants package step should definitely be possible to run in the docker environment... running pip in there as well would require an extra wrapper script, but might be feasible too.

@tobni
Copy link
Contributor

tobni commented Jul 13, 2024

So, python_distribution does not have the wiring to use environments-preview, but that was easy to fix, tried it locally.

But wont also the native_client need to be built in the container? That's not "inside pants" atm, meaning even using docker_environment we would still be linking to glibc 2.23?

@huonw
Copy link
Contributor

huonw commented Jul 14, 2024

Ah, good catch. Maybe it's best to do some variant of docker run ... pants package ... (at least for now, to unblock releases).

@benjyw
Copy link
Contributor

benjyw commented Jul 14, 2024

Yes, the wheel is not the main problem, the engine is

@tobni
Copy link
Contributor

tobni commented Jul 14, 2024

env:
  ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: true

seems practical if all we're looking for is an unblock?

link: https://github.blog/changelog/2024-03-07-github-actions-all-actions-will-run-on-node20-instead-of-node16-by-default/

To opt out of this and continue using Node16 while it is still available in the runner, you can choose to set ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true
as an ‘env’ in their workflow or as an environment variable on your runner machine. This will only work until we upgrade the runner removing Node16 later in the spring.

Edit: Doesn't seem to have any effect when applied ontop of the changes herein, maybe newer JS actions do not honor it?

Edit2: Indeed, checkout@v4 unconditionally requires node20.

@benjyw
Copy link
Contributor

benjyw commented Jul 14, 2024

Oooh yes, I would take an unblock right now.

@tobni
Copy link
Contributor

tobni commented Jul 14, 2024

My attempt at the temporary fix: #21172 @benjyw
The run: https://github.com/pantsbuild/pants/actions/runs/9929989447 (still spinning last I checked)

Edit: It passed with some warnings, but none of them seem critical.

@tobni
Copy link
Contributor

tobni commented Jul 15, 2024

Merging this and then #21172

@tobni tobni merged commit 9a70a83 into main Jul 15, 2024
28 of 31 checks passed
@tobni tobni deleted the kaos/bump-workflow-actions-for-node20 branch July 15, 2024 12:42
tobni added a commit that referenced this pull request Jul 15, 2024
Renders github actions and env settings to continue to use node16 inside
the containers that match the glibc (2.17) of manylinux_2014 builder
images. The new default node version for gha requires a much higher
version.

I checked if https://github.com/nodejs/unofficial-builds had node
binaries we could use. It does not.

Based on #21133
tobni pushed a commit to tobni/pants that referenced this pull request Jul 22, 2024
We started to run into issues with node16 no longer being functional on
the runner, with symptoms like
actions/checkout#1809
tobni added a commit to tobni/pants that referenced this pull request Jul 22, 2024
…sbuild#21172)

Renders github actions and env settings to continue to use node16 inside
the containers that match the glibc (2.17) of manylinux_2014 builder
images. The new default node version for gha requires a much higher
version.

I checked if https://github.com/nodejs/unofficial-builds had node
binaries we could use. It does not.

Based on pantsbuild#21133
@huonw huonw modified the milestones: 2.21.x, 2.20.x Jul 24, 2024
WorkerPants pushed a commit that referenced this pull request Jul 24, 2024
We started to run into issues with node16 no longer being functional on
the runner, with symptoms like
actions/checkout#1809
@WorkerPants
Copy link
Member

I tried to automatically cherry-pick this change back to each relevant milestone, so that it is available in those older releases of Pants.

❌ 2.20.x

I was unable to cherry-pick this PR to 2.20.x, likely due to merge-conflicts.

Steps to Cherry-Pick locally

To resolve:

  1. (Ensure your git working directory is clean)
  2. Run the following script to reproduce the merge-conflicts:
    git fetch https://github.com/pantsbuild/pants main \
      && git fetch https://github.com/pantsbuild/pants 2.20.x \
      && git checkout -b cherry-pick-21133-to-2.20.x FETCH_HEAD \
      && git cherry-pick 9a70a83c43b7a5a56b1ca1d7a61a16cd3cb8f860
  3. Fix the merge conflicts and commit the changes
  4. Run build-support/cherry_pick/make_pr.sh "21133" "2.20.x"

Please note that I cannot re-run CI if a job fails. Please work with your PR approver(s) to re-run CI if necessary.

❌ 2.21.x

I was unable to cherry-pick this PR to 2.21.x, likely due to merge-conflicts.

Steps to Cherry-Pick locally

To resolve:

  1. (Ensure your git working directory is clean)
  2. Run the following script to reproduce the merge-conflicts:
    git fetch https://github.com/pantsbuild/pants main \
      && git fetch https://github.com/pantsbuild/pants 2.21.x \
      && git checkout -b cherry-pick-21133-to-2.21.x FETCH_HEAD \
      && git cherry-pick 9a70a83c43b7a5a56b1ca1d7a61a16cd3cb8f860
  3. Fix the merge conflicts and commit the changes
  4. Run build-support/cherry_pick/make_pr.sh "21133" "2.21.x"

Please note that I cannot re-run CI if a job fails. Please work with your PR approver(s) to re-run CI if necessary.

✔️ 2.22.x

Successfully opened #21198.


When you're done manually cherry-picking, please remove the needs-cherrypick label on this PR.

Thanks again for your contributions!

🤖 Beep Boop here's my run link

@WorkerPants WorkerPants added the auto-cherry-picking-failed Auto Cherry-Picking Failed label Jul 24, 2024
huonw pushed a commit that referenced this pull request Jul 25, 2024
We started to run into issues with node16 no longer being functional on
the runner, with symptoms like
actions/checkout#1809

Co-authored-by: Andreas Stenius <git@astekk.se>
huonw pushed a commit that referenced this pull request Jul 25, 2024
We started to run into issues with node16 no longer being functional on
the runner, with symptoms like
actions/checkout#1809
huonw pushed a commit that referenced this pull request Jul 25, 2024
We started to run into issues with node16 no longer being functional on
the runner, with symptoms like
actions/checkout#1809
huonw added a commit that referenced this pull request Jul 25, 2024
We started to run into issues with node16 no longer being functional on
the runner, with symptoms like
actions/checkout#1809

Co-authored-by: Andreas Stenius <git@astekk.se>
huonw added a commit that referenced this pull request Jul 25, 2024
We started to run into issues with node16 no longer being functional on
the runner, with symptoms like
actions/checkout#1809

Co-authored-by: Andreas Stenius <git@astekk.se>
WorkerPants pushed a commit that referenced this pull request Jul 25, 2024
Renders github actions and env settings to continue to use node16 inside
the containers that match the glibc (2.17) of manylinux_2014 builder
images. The new default node version for gha requires a much higher
version.

I checked if https://github.com/nodejs/unofficial-builds had node
binaries we could use. It does not.

Based on #21133
huonw pushed a commit that referenced this pull request Jul 25, 2024
…ry-pick of #21172) (#21209)

Renders github actions and env settings to continue to use node16 inside
the containers that match the glibc (2.17) of manylinux_2014 builder
images. The new default node version for gha requires a much higher
version.

I checked if https://github.com/nodejs/unofficial-builds had node
binaries we could use. It does not.

Based on #21133

Co-authored-by: Tobias Nilsson <tobias.jens.nilsson@gmail.com>
huonw pushed a commit that referenced this pull request Jul 26, 2024
Renders github actions and env settings to continue to use node16 inside
the containers that match the glibc (2.17) of manylinux_2014 builder
images. The new default node version for gha requires a much higher
version.

I checked if https://github.com/nodejs/unofficial-builds had node
binaries we could use. It does not.

Based on #21133
huonw pushed a commit that referenced this pull request Jul 26, 2024
Renders github actions and env settings to continue to use node16 inside
the containers that match the glibc (2.17) of manylinux_2014 builder
images. The new default node version for gha requires a much higher
version.

I checked if https://github.com/nodejs/unofficial-builds had node
binaries we could use. It does not.

Based on #21133
huonw added a commit that referenced this pull request Jul 26, 2024
…ry-pick of #21172) (#21211)

Renders github actions and env settings to continue to use node16 inside
the containers that match the glibc (2.17) of manylinux_2014 builder
images. The new default node version for gha requires a much higher
version.

I checked if https://github.com/nodejs/unofficial-builds had node
binaries we could use. It does not.

Based on #21133

Co-authored-by: Tobias Nilsson <tobias.jens.nilsson@gmail.com>
huonw added a commit that referenced this pull request Jul 26, 2024
…ry-pick of #21172) (#21212)

Renders github actions and env settings to continue to use node16 inside
the containers that match the glibc (2.17) of manylinux_2014 builder
images. The new default node version for gha requires a much higher
version.

I checked if https://github.com/nodejs/unofficial-builds had node
binaries we could use. It does not.

Based on #21133

Co-authored-by: Tobias Nilsson <tobias.jens.nilsson@gmail.com>
cburroughs added a commit that referenced this pull request Nov 16, 2024
While trying to land #21528 to was observed that the wheel building jobs
were failing during git checkout with wonderful errors like
`/__e/node20/bin/node: /lib64/libm.so.6: version `GLIBC_2.27' not found
(required by /__e/node20/bin/node)`. This appears to be the same
symptoms as actions/checkout#1809 which
pointed the the deprecation notice at
<https://github.blog/changelog/2024-03-07-github-actions-all-actions-will-run-on-node20-instead-of-node16-by-default/>.
If this all sounds familiar that is because we *mostly* cleaned this up
in #21133, but kept a fewer uses of the older actions for
`manylinux2014` compatibility. However, from the notice:

"To opt out of this and continue using Node16 while it is still
available in the runner, you can choose to set
ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true
as an ‘env’ in their workflow or as an environment variable on your
runner machine. This will *only work until we upgrade the runner
removing Node16 later in the spring*. (emphasis added)"

From the wheel job failures during #21528 and my attempts to fiddle with
all of `ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION`,
`ACTIONS_RUNNER_FORCED_INTERNAL_NODE_VERSION`,
`ACTIONS_RUNNER_FORCE_ACTIONS_NODE_VERSION` I think the removal time
promised for the Spring has finally come, but I'm not that familiar with
GitHub Actions and could be missing something.

As a consequence the version of both `actions/upload-artifacts` and
`actions/checkout` is bumped to `v4`. To make this testable now and in
the future, wheels are now build on ci config changes.

Why is bumping to `manylinux_2_28` (the next oldest manylinux) not a big
deal? The "2.28" refers to glibc
[2.28](https://sourceware.org/legacy-ml/libc-alpha/2018-08/msg00003.html)
in 2018. That is older than Debian stable and if you need to use
software that old you are presumably paying someone for an enterprise
distribution.

NOTE: I think we should bump the manylinux version in `main` regardless,
but if my read if the situation is correct we may need to backport this
any active release line. Without this or a more complex change we would
also be unable to release after December 5th per #21616.

ref #21195 #21616
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-cherry-picking-failed Auto Cherry-Picking Failed category:internal CI, fixes for not-yet-released features, etc. release-notes:not-required PR doesn't require mention in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants