
install-plugins.sh unexpectedly removed from Docker images prior to 2.357 #1415

Closed

MarkEWaite opened this issue Jul 1, 2022 · 16 comments

MarkEWaite commented Jul 1, 2022

Jenkins and plugins versions report

Environment
Docker image for Jenkins 2.332.3 on my computer before a Docker pull:

jenkins/jenkins                                                2.332.3                  73264ace1394   8 weeks ago     464MB

$ docker run --rm -i -t jenkins/jenkins:2.332.3 wc -l /usr/local/bin/install-plugins.sh
292 /usr/local/bin/install-plugins.sh

Docker image for Jenkins 2.332.3 on my computer after a Docker pull:

jenkins/jenkins                                                2.332.3                  03476fb69d1a   5 days ago      464MB

$ docker run --rm -i -t jenkins/jenkins:2.332.3 wc -l /usr/local/bin/install-plugins.sh
4 /usr/local/bin/install-plugins.sh

What Operating System are you using (both controller, and any agents involved in the problem)?

Ubuntu 20.04 and Debian testing

Reproduction steps

  1. Download the original Docker image for Jenkins 2.332.3 (or another of the 10+ most recent releases)
  2. List the contents of /usr/local/bin/install-plugins.sh and confirm it is the expected shell script
  3. Use docker pull to update the Docker image
  4. List the contents of /usr/local/bin/install-plugins.sh again and observe that the contents of the script have been replaced

Expected Results

The 292-line shell script should have remained in the previously published Docker images

Actual Results

A 4-line replacement script is in the Docker image instead of the original script

Anything else?

The original install-plugins.sh script can be downloaded from https://github.com/jenkinsci/docker/raw/0e8271bf693bddcaa76cfdedb8ef5d8ae940b859/install-plugins.sh for those who need an immediate replacement.
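
As a stop-gap, here is a minimal, unofficial sketch of pulling that script back in (only the URL above comes from this issue; where you place the file in your own image is up to you):

    # Fetch the original 292-line script and make it executable again.
    curl -fsSL -o install-plugins.sh \
      https://github.com/jenkinsci/docker/raw/0e8271bf693bddcaa76cfdedb8ef5d8ae940b859/install-plugins.sh
    chmod +x install-plugins.sh
    # e.g. COPY it to /usr/local/bin/install-plugins.sh in a derived Dockerfile.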

The functionality of the install-plugins.sh script is available from the /bin/jenkins-plugin-cli program. It uses the plugin installation manager tool to more accurately resolve dependencies, report security issues, and manage plugin versions.
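
For reference, typical usage of the replacement tool during an image build looks like the sketch below (the plugin names and the plugins.txt path are only examples, not recommendations):

    # Install plugins by name; the latest compatible versions are resolved automatically...
    jenkins-plugin-cli --plugins git configuration-as-code
    # ...or install from a file listing one plugin (optionally with a version) per line.
    jenkins-plugin-cli --plugin-file /usr/share/jenkins/ref/plugins.txt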

MarkEWaite added the bug label Jul 1, 2022
dduportal self-assigned this Jul 2, 2022
@dduportal (Contributor)

First step: identifying the root cause. As shared in the Gitter channel "jenkinsci/docker" yesterday:

  • After the pull request "Adapt to jenkinsci/jenkins#6602 by removing -preview from Java 17 tags" (#1399) was merged, a deployment build was triggered as usual (it is a private Jenkins instance, so no public link)
  • This build (Build 7238) detected the versions 2.329, 2.330, 2.331, 2.332 and 2.332.2 as "not published", started rebuilding/pushing them, but failed at a certain point due to a transient network error (DNS resolution failure):
04:56:48  WARNING: could not get a valid manifest at the URL https://index.docker.io/v2/jenkins/jenkins/manifests/2.329-alpine-jdk17.
04:56:48    (For debugging purposes) manifest={"errors":[{"code":"MANIFEST_UNKNOWN","message":"manifest unknown","detail":"unknown tag=2.329-alpine-jdk17"}]}
04:56:48  2.329 not published yet
04:56:48  Version 2.329 higher than 1.0: publishing 2.329 latest_weekly:false latest_lts:false
# ...
05:01:47  WARNING: could not get a valid manifest at the URL https://index.docker.io/v2/jenkins/jenkins/manifests/2.330-alpine-jdk17.
05:01:47    (For debugging purposes) manifest={"errors":[{"code":"MANIFEST_UNKNOWN","message":"manifest unknown","detail":"unknown tag=2.330-alpine-jdk17"}]}
05:01:47  2.330 not published yet
05:01:47  Version 2.330 higher than 1.0: publishing 2.330 latest_weekly:false latest_lts:false
# ...
05:03:25  WARNING: could not get a valid manifest at the URL https://index.docker.io/v2/jenkins/jenkins/manifests/2.331-alpine-jdk17.
05:03:25    (For debugging purposes) manifest={"errors":[{"code":"MANIFEST_UNKNOWN","message":"manifest unknown","detail":"unknown tag=2.331-alpine-jdk17"}]}
05:03:25  2.331 not published yet
05:03:25  Version 2.331 higher than 1.0: publishing 2.331 latest_weekly:false latest_lts:false
# ...
05:05:01  WARNING: could not get a valid manifest at the URL https://index.docker.io/v2/jenkins/jenkins/manifests/2.332-alpine-jdk17.
05:05:01    (For debugging purposes) manifest={"errors":[{"code":"MANIFEST_UNKNOWN","message":"manifest unknown","detail":"unknown tag=2.332-alpine-jdk17"}]}
05:05:01  2.332 not published yet
# ...
05:07:02  WARNING: could not get a valid manifest at the URL https://index.docker.io/v2/jenkins/jenkins/manifests/2.332.1-alpine-jdk17.
05:07:02    (For debugging purposes) manifest={"errors":[{"code":"MANIFEST_UNKNOWN","message":"manifest unknown","detail":"unknown tag=2.332.1-alpine-jdk17"}]}
05:07:02  2.332.1 not published yet
05:07:02  Version 2.332.1 higher than 1.0: publishing 2.332.1 latest_weekly:false latest_lts:false
# ...
  • The next build (7239) also failed, this time due to a timeout. It shows the following:
    • Versions 2.329, 2.330, 2.331 and 2.332 were built and re-deployed successfully by the previous build (before failure)
10:56:49  Tag is already published: 2.329
10:56:51  Tag is already published: 2.330
10:56:54  Tag is already published: 2.331
# ...
10:56:56  Tag is already published: 2.332
  • All the other versions were marked as not published (same messages) and built.

The publish.sh script that we are using is the cause:

https://github.com/jenkinsci/docker/blob/master/.ci/publish.sh#L35 is a shell function that is called to "verify" whether the version passed as the first argument is considered published or needs a rebuild. It returns 0 if all the tags of all images for this version have a manifest on Docker Hub.

With this "fragile" check, merging PR #1399 had the side effect of marking every version tracked by this script as "not published", because the new tags introduced in the PR did not exist yet.
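
To make the failure mode concrete, here is a simplified sketch of that kind of check (not the real publish.sh; the variant list is illustrative and registry authentication is omitted):

    is-published() {
        local version=$1
        # Every tag of every variant must already have a manifest on Docker Hub...
        for variant in '' '-alpine' '-jdk11' '-alpine-jdk17'; do
            if ! curl -fsSL \
                "https://index.docker.io/v2/jenkins/jenkins/manifests/${version}${variant}" \
                >/dev/null; then
                # ...so a single brand-new tag (such as those introduced by #1399)
                # makes the whole version look "not published" and triggers a full rebuild.
                return 1
            fi
        done
        return 0
    }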

@dduportal (Contributor)

We confirm this is unexpected. The PR which removed the install-plugins.sh script was merged 6 days ago, so only version 2.357 (the weekly release published on 28 June 2022) should have been impacted by the change.

@dduportal (Contributor)

Proposal: rebuild all the images that were incorrectly rebuilt (the range from 2.329 to 2.356 inclusive, covering both the 2.332.x and 2.346.x lines) with the code prior to #1408 (commit a9de6b9):

  • That would revert this change for these images
  • It should be easy and quick to do as a one-time operation
  • BUT any changes published to this repository between each version's original publication date and that commit would be embedded in the rebuilt images.

@timja @MarkEWaite @basil @jeremycornett (and any other user) would that be good for you?

@MarkEWaite (Contributor, Author)

@dduportal thanks for the detailed analysis. I like that plan very much. I'm not clear how to do that image rebuild 1 time, but I trust that you have a way to do it that gives you the confidence that it will work.


dduportal commented Jul 2, 2022

@dduportal thanks for the detailed analysis. I like that plan very much. I'm not clear how to do that image rebuild 1 time, but I trust that you have a way to do it that gives you the confidence that it will work.

The proposed plan is the following:

  • Create a new branch from a9de6b9 named gh-1415
    • install-plugins.sh is present
    • JDK8 is still present as well
  • On this branch, modify the publish.sh script with the following changes:
    • In docker/.ci/publish.sh, lines 169 to 171 (as of 3a3a801):

          for version in ${versions}
          do
              TOKEN=$(login-token)

      write the list of versions explicitly as an array, to control it (instead of retrieving the range dynamically)
    • Remove the condition
          if is-published "${version}" "${latest_weekly}" "${latest_lts}"
      (the check whether the image is already published) so the script always builds and deploys (a shell sketch of both changes follows below).
  • Then, on trusted.ci, add this branch to the multibranch job and run a build until success.
  • Once the fix is confirmed, clean up:
    • Remove the branch from trusted.ci
    • Tag the HEAD of this branch (to keep a reference to it in the future)
    • Remove the branch from the repository
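
A rough shell sketch of those two publish.sh changes (hypothetical code, not the actual script; the version list and the final publish step are placeholders):

    # Explicit, controlled list of versions instead of a dynamically retrieved range.
    versions=(
        2.329 2.330 2.331 2.332
        # ... every affected version up to 2.356, plus the 2.332.x and 2.346.x LTS releases
    )

    for version in "${versions[@]}"
    do
        TOKEN=$(login-token)
        # The 'if is-published ...' guard is deliberately removed, so every listed
        # version is rebuilt and deployed unconditionally.
        publish "${version}"   # placeholder for the existing build-and-push step
    done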

WDYT?

@MarkEWaite (Contributor, Author)

WDYT?

That sounds great to me. Thanks for doing it!

dduportal added a commit that referenced this issue Jul 2, 2022
@dduportal (Contributor)

Opened a PR to support the "hotfix" work with a proposal: #1416.

@dduportal (Contributor)

  • Medium-term fix: we have to improve publish.sh to avoid reproducing this behavior in the near future.
  • Long-term fix: I assume I should bring the versioning scheme to the Platform SIG? As per the exchanges in the Gitter channel, I would like to raise the topic of version numbering for the Docker images. Rebuilding tags happened multiple times during the past weeks and caught a lot of end users off guard. Providing (git) tags for this image, mapped to Docker tags, would benefit a lot of users. We could keep building "lines" such as lts-jdk11 or 2.332.1 that follow the "latest" pattern, but also introduce tags (no preference on the pattern: 20220701-01h00Z-2.332.1, 1.0.0-2.332.1, 7456-2.567-jdk17, whatever) that are NEVER rebuilt.

Any ideas/objections/tips?


timja commented Jul 2, 2022

A simpler change would be to not do the magic version rebuild and just do the latest version and allow passing a parameter or so for LTS.

There's an orthogonal question about image patching: generally it's somewhat expected that image tags still receive OS patches, but that is somewhat complicated in our current setup.

@dduportal (Contributor)

A simpler change would be to not do the magic version rebuild and just do the latest version and allow passing a parameter or so for LTS.

That one looks quite easy to do: there is already detection of the "latest weekly". Is that what you mean by the "latest" version?

For the parameter, if I understand your proposal correctly, that would mean a pipeline parameter saying "build the LTS version passed as a parameter"?

@dduportal (Contributor)

All images were re-published 🥳

@MarkEWaite (Contributor, Author)

Confirmed that the republished images contain the expected install-plugins.sh

@daniel-beck (Member)

A simpler change would be to not do the magic version rebuild and just do the latest version and allow passing a parameter or so for LTS.

Needs to also consider that we sometimes publish two LTS lines at the same time.

Out of curiosity, IIUC, it checked that all images exist and if not (necessarily the case if we add a new variant), rebuilt all of them? Could the script not just check for the "main" image (tagged 2.357) and if that's missing, do all of the images? Or check each variant individually and only rebuild what's missing?

@dduportal (Contributor)

Out of curiosity, IIUC, it checked that all images exist and if not (necessarily the case if we add a new variant), rebuilt all of them?

Almost, yes: for each "version", the script checks whether all the tags AND all the architectures for this version's images (alpine, jdk11, jdk17, arm jdk11, etc.) are published. If any of these variants is missing, the script rebuilds and publishes all the variants for this version.
The reason is that "build and publish all variants for a given version" is only one line, thanks to (or because of) buildx's bake file.
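
For context, a rough illustration of that single line (the bake file name and the version variable are assumptions about the repository layout, not verified details):

    # Build and push every variant and tag of one version in a single bake invocation.
    JENKINS_VERSION=2.332.3 docker buildx bake --file docker-bake.hcl --push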

Could the script not just check for the "main" image (tagged 2.357) and if that's missing, do all of the images? Or check each variant individually and only rebuild what's missing?

These are two valid scenarios (if I understand your proposals correctly). The only tricky part is finding a way to build only one variant with docker buildx, because it is a 1..N relationship (a given image variant can have multiple tags). It's doable at the cost of a bit of shell + jq, of course.

The first proposal (i.e. only check for the latest version) sounds safer to me: the script reads an environment variable giving it the version to be built.

  • By default, the variable is empty, so the script auto-checks the latest version in the repo.jenkins-ci.org Maven metadata (as it already does today).
  • If the variable is set, then it is an explicit human order to build a specific version. No questions asked, no checks for publication: the script would simply build and deploy (a rough shell sketch appears at the end of this comment).

Needs to also consider that we sometimes publish two LTS lines at the same time.

Based on the proposal above with the env variable: when that happens, a human triggers a build and specifies the list of versions as a parameter. The script should be able to parse it (using a separator such as a newline) and iterate in that case.
What do you think: would that be OK?
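
A rough shell sketch of that proposal (hypothetical code, not the actual publish.sh; the metadata URL, variable name, and deploy command are assumptions for illustration):

    # No explicit order given: auto-detect the latest version from the Maven metadata.
    if [ -z "${VERSIONS_TO_BUILD:-}" ]; then
        VERSIONS_TO_BUILD=$(curl -fsSL \
            'https://repo.jenkins-ci.org/releases/org/jenkins-ci/main/jenkins-war/maven-metadata.xml' \
            | grep -oP '(?<=<latest>)[^<]+')
    fi

    # A human can pass several versions (e.g. two LTS lines) separated by newlines.
    while IFS= read -r version; do
        [ -z "${version}" ] && continue
        # Build and deploy unconditionally: no "is it already published?" check.
        JENKINS_VERSION="${version}" docker buildx bake --file docker-bake.hcl --push
    done <<< "${VERSIONS_TO_BUILD}"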

@daniel-beck (Member)

Based on the proposal above with the env variable: when that happens, a human triggers a build and specifies the list of versions as a parameter. The script should be able to parse it (using a separator such as a newline) and iterate in that case.

Sure, more control is nice for security. Might be annoying for regular LTS operation though? Don't we want more automation rather than less?


timja commented Jul 5, 2022

There's somewhat of a plan to move the docker publishing to release.ci.jenkins.io and then trigger it as part of the release pipeline.

The release pipeline will have a parameter for the exact version, which I think should solve this.
