[ci.jenkins.io][Infra-as-code] Define Core and plugins as code in a custom built Docker Image #3070

Open
dduportal opened this issue Jul 26, 2022 · 5 comments

dduportal commented Jul 26, 2022

Summary

This issue tracks the discussion and tasks needed to run ci.jenkins.io from a custom-built Docker image, in order to control:

  • The exact core version
  • The exhaustive list of plugins, with fixed versions

Why

There is no audit trail of which plugin was updated, and when, on ci.jenkins.io. This is a concern for:

  • Security: no audit log, and a painful update process (plugin by plugin).
  • Stability: plugin upgrades cannot be tested. Some cases could never be reproduced on a staging/dev environment anyway, but right now we have nothing at all for ci.jenkins.io: any "superficial" issue (cyclic dependency, accidental rollback) causes an outage.
  • Maintainability: manually managing a Jenkins instance is not sustainable for ci.jenkins.io: we are 10 to 20 (maybe more 😱 ) admins, with no planning and no centralized notifications, so anyone can break the instance at any moment without the others even being aware (or worse: concurrent plugin upgrades \o/).

What

=> Defining a custom-built Docker image, as we already do for the Jenkins controllers hosted in Kubernetes (https://github.com/jenkins-infra/docker-jenkins-lts and https://github.com/jenkins-infra/docker-jenkins-weekly), would solve all these issues; a minimal sketch of such an image is included at the end of this comment.

However, it would introduce the following challenge, which needs to be SOLVED with consensus before proceeding:

  • ⚠️ Sustainability over speed: upgrading a plugin would take a bit more time with this method, because you first have to build, test and release the new image, then bump the image tag (or wait for the automatic process to detect the new version and open a PR for us), and finally wait for the deployment to happen.
    • This is more than acceptable for the infra team, given the problem it solves
    • To be discussed with the security team as it impacts their advisory process, where speed is of the essence. But since the image could be staged prior to the advisory publication, it could allow earlier testing.
      • The only technical item to be done by the infra team would be properly defining the "configuration self-healing" process: once the private image is built, tested, staged, validated, then deployed to ci.jenkins.io and the advisory is finished, we need an update process that publishes all the source changes (e.g. core version and/or plugin versions impacted by the advisory) and rebuilds/tests/deploys a public image. WDYT @Wadeck @daniel-beck ?
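
For illustration, a minimal sketch of what such an image could look like, following the pattern already used for docker-jenkins-lts/docker-jenkins-weekly (the core version and file locations below are placeholders, not a final proposal):

```dockerfile
# Sketch only: pin the exact core version through the base image tag
FROM jenkins/jenkins:2.346.1-lts-jdk11

# plugins.txt holds the exhaustive list of plugins with pinned versions
COPY plugins.txt /usr/share/jenkins/ref/plugins.txt

# jenkins-plugin-cli (shipped in the official image) downloads the pinned
# plugins at build time, so the resulting image is reproducible and auditable
RUN jenkins-plugin-cli --plugin-file /usr/share/jenkins/ref/plugins.txt
```

Upgrading core or a plugin would then be a reviewable pull request bumping a version in this repository, instead of a manual change on the live instance.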

Wadeck commented Jul 26, 2022

The idea seems fine 👍
The problem I see is that it creates a new dependency for a security release. If your build is not passing for whatever reason, it will be more complicated than the current "manual update" process. Keeping that option as a fallback could be good.

we are 10 to 20 (maybe more 😱 ) admins

We will have to play a game together to reduce this kind of broad access to a crucial system ;)

@daniel-beck

Note that due to https://github.com/jenkins-infra/jenkins.io/blob/3a83a37fcb6823232b70e06f481b8f95cd666885/scripts/fetch-external-resources#L36-L40 and https://github.com/jenkins-infra/jenkins.io/blob/3a83a37fcb6823232b70e06f481b8f95cd666885/scripts/fetch-external-resources#L59-L64, ci.j.io is currently an essential part of publishing changes like the advisory to jenkins.io.

If we decouple that by uploading these resources to Azure storage somewhere and downloading them from there during the site build, then we could publish the site while ci.j.io is unavailable. That would remove completion of ci.j.io maintenance from the critical path during advisory publication.
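
For what it's worth, a rough sketch of what that decoupling could look like (the storage account, container, and resource names below are made up for illustration):

```bash
# Publish the generated resources to an Azure storage container instead of
# serving them straight from ci.jenkins.io
az storage blob upload \
  --account-name jenkinsioresources \
  --container-name site-resources \
  --name some-resource.json \
  --file build/some-resource.json \
  --overwrite

# In the jenkins.io site build (e.g. fetch-external-resources), download from
# the blob endpoint so the build no longer depends on ci.j.io being up
curl -fsSL -o content/some-resource.json \
  "https://jenkinsioresources.blob.core.windows.net/site-resources/some-resource.json"
```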

@dduportal

The problem I see is that it creates a new dependency for a security release. If your build is not passing for whatever reason, it will be more complicated than the current "manual update" process. Keeping that option as a fallback could be good.

Good warning, thanks for the feedback! The build process could fail for these reasons:

  • Network issue => there is no shame in adding the plugin binaries (that the security team usually uploads by hand to ci.jenkins.io) to the repository as a "fallback" solution for these cases (e.g. building the Docker image by hand). The plugins.txt file supports local files as well as direct downloads from any update center (the public one by default, but any private one as well); see the sketch after this list.
  • Failing tests => let's provide a way to disable tests during an advisory (for the current Kubernetes images, the build takes less than 1 min, tests 1 to 2 minutes, and the deployment ~15s).
  • Deployment issue (pushing the Docker image) => the fallback could be pushing manually with a personal DockerHub account, or using a custom private Docker registry (ghcr.io on the jenkins-cert organization, for instance).
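
To make the first fallback concrete, a sketch of what the plugins.txt entries could look like (plugin versions and the URL here are placeholders, using the plugin installation manager tool format):

```text
# Pinned versions, resolved from the default (public) update center
git:4.11.3
configuration-as-code:1512.vb_79d418d5fc8

# Pinned version fetched from an explicit URL: a private update center, or a
# binary pre-staged in the repository / provided by the security team
credentials:1111.v35a_307992395:https://private-uc.example.com/plugins/credentials.hpi
```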

WDYT?

Note that due to https://github.com/jenkins-infra/jenkins.io/blob/3a83a37fcb6823232b70e06f481b8f95cd666885/scripts/fetch-external-resources#L36-L40 and https://github.com/jenkins-infra/jenkins.io/blob/3a83a37fcb6823232b70e06f481b8f95cd666885/scripts/fetch-external-resources#L59-L64, ci.j.io is currently an essential part of publishing changes like the advisory to jenkins.io.

If we decouple that by uploading these resources to Azure storage somewhere and downloading them from there during the site build, then we could publish the site while ci.j.io is unavailable. That would remove completion of ci.j.io maintenance from the critical path during advisory publication.

Oh excellent point, thanks Daniel! Totally forgot about this element. I see it as a blocker for this issue, do you agree?

Side note: as soon as core and plugins can have staged releases, this issue would also allow staging the Docker image for ci.jenkins.io in advance, which could be a great benefit!

@daniel-beck

I see it as a blocker for this issue, do you agree?

Right, but I would expect this to be relatively straightforward to clean up, so the yak shaving shouldn't take too long if we consider this task valuable (assuming we can put not particularly valuable credentials on ci.j.io to store this stuff elsewhere, or migrate these builds elsewhere, which would come with less visibility).
