infra.ci.jenkins.io on arm64 (controller and agents) #3823

Closed
smerle33 opened this issue Nov 17, 2023 · 29 comments

Comments

smerle33 commented Nov 17, 2023

Service(s)

Azure, infra.ci.jenkins.io

Summary

As was done for publick8s, we should create an ARM64 node pool in the privatek8s Kubernetes cluster in order to migrate the Jenkins agents to arm64.

We also need to migrate infra.ci.jenkins.io to arm64 (taking care to migrate the PV/PVC to the correct zone: the arm64 nodes are in zone 1).

  • create an ARM64 nodepool
  • create ARM64 agent in this nodepool
  • migrate the infra.ci.jenkins.io instance in the arm64 nodepool
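
For the zone constraint mentioned in the summary, it can help to see which nodes are arm64 and which zone each one sits in. A minimal sketch, assuming `kubectl` already points at the privatek8s cluster; the function name is made up for illustration, and the two labels are the standard well-known Kubernetes node labels:

```shell
# Sketch only: assumes the current kubectl context is the privatek8s cluster.
# list_node_arch_and_zone is a hypothetical helper name, not an existing tool.

# List every node with its CPU architecture and availability zone as extra columns.
list_node_arch_and_zone() {
  kubectl get nodes \
    --label-columns kubernetes.io/arch,topology.kubernetes.io/zone
}
```

Calling `list_node_arch_and_zone` should show the arm64 nodes in a different zone than the intel/amd ones, which is why the controller volume has to be reachable from both.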

Reproduction steps

No response

@smerle33 smerle33 added the triage Incoming issues that need review label Nov 17, 2023
@smerle33 smerle33 changed the title PrivateK8S arm64 node pool infra.ci.jenkins.io arm64 node pool Nov 20, 2023
@dduportal dduportal added this to the infra-team-sync-2023-11-28 milestone Nov 21, 2023
@dduportal dduportal removed triage Incoming issues that need review weekly.ci.jenkins.io labels Nov 21, 2023
@dduportal dduportal changed the title infra.ci.jenkins.io arm64 node pool infra.ci.jenkins.io on arm64 (controller and agents) Nov 28, 2023
smerle33 added a commit to jenkins-infra/azure that referenced this issue Dec 4, 2023
as per jenkins-infra/helpdesk#3823

---------

Co-authored-by: Damien Duportal <damien.duportal@gmail.com>
Co-authored-by: Hervé Le Meur <91831478+lemeurherve@users.noreply.github.com>
@smerle33 smerle33 self-assigned this Dec 6, 2023
smerle33 added a commit to jenkins-infra/docker-helmfile that referenced this issue Dec 6, 2023
as per jenkins-infra/helpdesk#3823 and to start using arm64 agents on arm64 nodepool in infra (privatek8s) and before being able to use the ALLINONEVERSION

smerle33 commented Jan 2, 2024

@dduportal dduportal removed this from the infra-team-sync-2024-01-02 milestone Jan 3, 2024
@dduportal dduportal added this to the infra-team-sync-2024-04-09 milestone Apr 3, 2024

smerle33 commented Apr 4, 2024

Plan to migrate infra.ci.jenkins.io to arm64
See post-mortem

++++++++++++++++++++++++++++
Definition of done

  • the volume is a ZRS volume in the Azure UI
  • the temporary, manually created disk volume and PVC (claim) are deleted, as well as the Azure snapshot
  • the pod starts and runs on an ARM64 node
  • the Jenkins CI infra.ci.jenkins.io is responding

Checks to add:

  • check that the build queue is being handled
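
Two of the checks above (pod on an ARM64 node, service responding) could be scripted roughly as follows. This is only a sketch: the `jenkins-infra` namespace comes from the PVC manifest in this issue, but the label selector is an assumption based on the Jenkins Helm chart defaults, and all function names are hypothetical.

```shell
# Sketch only. NAMESPACE matches the PVC namespace used in this issue;
# SELECTOR is an assumption (Jenkins Helm chart default label), adjust as needed.
NAMESPACE="${NAMESPACE:-jenkins-infra}"
SELECTOR="${SELECTOR:-app.kubernetes.io/component=jenkins-controller}"

# Print the CPU architecture of the node hosting the controller pod.
controller_node_arch() {
  local node
  node="$(kubectl get pods -n "$NAMESPACE" -l "$SELECTOR" \
    -o jsonpath='{.items[0].spec.nodeName}')" || return 1
  kubectl get node "$node" -o jsonpath='{.status.nodeInfo.architecture}'
}

# Succeed only when the controller pod runs on an arm64 node.
is_on_arm64() {
  [ "$(controller_node_arch)" = "arm64" ]
}

# Succeed only when the given URL answers with a non-error HTTP status.
is_responding() {
  curl --fail --silent --show-error --max-time 10 --output /dev/null "$1"
}
```

For example: `is_on_arm64 && is_responding "https://infra.ci.jenkins.io/login"` (the `/login` path is an assumption, and the instance is only reachable from the private network).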

Post-Mortem:

We need to provide a PVC and a PV matching the source disk:

  • create a snapshot (from UI)
  • create a disk from the snapshot (from UI)
  • create a PV with a yaml file with the above created disk :
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: disk.csi.azure.com
  name: jenkins-infraci-snap
spec:
  capacity:
    storage: 64Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: managed-csi-premium-zrs-retain
  csi:
    driver: disk.csi.azure.com
    volumeHandle: jenkins-infraci-snap
    volumeAttributes:
      fsType: ext4

Note that the volumeHandle is shortened here to the disk name rather than the full handle: volumeHandle: jenkins-infraci-snap

  • create a PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-infraci-snap
  namespace: jenkins-infra
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 64Gi
  volumeName: jenkins-infraci-snap
  storageClassName: managed-csi-premium-zrs-retain
  • use kubectl to create them:
kubectl apply -f  .tmp/jenkins-infra-pv-snap.yaml
kubectl apply -f  .tmp/jenkins-infra-pvc-snap.yaml
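
The snapshot and disk were created from the Azure UI here. As a possible CLI alternative (not what was actually run), the two steps might look like the sketch below. The function name, the resource group and the source disk are placeholders, and `Premium_ZRS` is assumed from the `managed-csi-premium-zrs-retain` storage class name.

```shell
# Sketch only: a CLI equivalent of the two UI steps, not the procedure used in this issue.
# All arguments are placeholders supplied by the caller.

# Usage: snapshot_to_zrs_disk <resource-group> <source-disk-name-or-id> <new-name>
# Creates a snapshot of the source disk, then a ZRS disk from that snapshot,
# and prints the resource ID of the new disk.
snapshot_to_zrs_disk() {
  local rg="$1" source_disk="$2" name="$3"
  az snapshot create --resource-group "$rg" --name "$name" \
    --source "$source_disk" --output none || return 1
  az disk create --resource-group "$rg" --name "$name" \
    --source "$name" --sku Premium_ZRS --query id --output tsv
}
```

The snapshot and the disk share the same name in this sketch only to keep it short; they are distinct Azure resources and both need to be removed during cleanup.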

As it was timing out, we had to add this to the Jenkins Helm chart values:

controller:
  podSecurityContextOverride:
    runAsUser: 1000
    runAsNonRoot: true
    supplementalGroups: [1000]

The migration part cannot be done by merging the PR as stated in the plan above:

  • Change the storage class with a datasource as per the top of the file https://github.com/jenkins-infra/kubernetes-management/pull/4720/files (DO NOT MIGRATE TO ARM64 YET) - https://github.com/jenkins-infra/kubernetes-management/pull/5116

Since infra.ci is down, it needs to be executed locally:

gh pr checkout 5116
helmfile -f "clusters/privatek8s.yaml" diff --suppress-secrets --skip-deps --context=2 --concurrency=8 -l name=jenkins-infra
helmfile -f "clusters/privatek8s.yaml" apply --suppress-secrets --skip-deps --context=2 --concurrency=8 -l name=jenkins-infra

The PR still needs to be merged so that no changes are introduced once infra.ci starts successfully again.

smerle33 added a commit to jenkins-infra/azure that referenced this issue Apr 5, 2024
as per
jenkins-infra/helpdesk#3823 (comment)
create a new storage class on private to be used for ZRS multizone
volumes, we need the volume to be accessible from both eastus2-1 for the
arm64 nodes and eastus2-3 for our intel/amd nodes

---------

Co-authored-by: Damien Duportal <damien.duportal@gmail.com>
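
For readers who want to see what such a ZRS storage class could look like as a plain Kubernetes manifest, here is a rough sketch. It is not the content of the referenced commit: only the class name, the CSI driver and the `Retain` policy come from this issue, while `skuName`, `volumeBindingMode` and `allowVolumeExpansion` are assumptions, as is the helper name.

```shell
# Sketch only: prints a StorageClass manifest; nothing is applied to the cluster.
# zrs_storage_class_manifest is a hypothetical helper name.
zrs_storage_class_manifest() {
  cat <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-csi-premium-zrs-retain
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_ZRS                      # assumption: zone-redundant premium SSD
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer     # assumption
allowVolumeExpansion: true                  # assumption
EOF
}
```

It could be reviewed and then applied with `zrs_storage_class_manifest | kubectl apply -f -`.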
smerle33 added a commit to jenkins-infra/azure that referenced this issue Apr 11, 2024
as per
jenkins-infra/helpdesk#3823 (comment)

---------

Co-authored-by: Damien Duportal <damien.duportal@gmail.com>
@smerle33

update: infra.ci is now officially running on arm64.

next steps: cleanup (next week)

dduportal added a commit to jenkins-infra/azure that referenced this issue Apr 12, 2024
…h agents (new subnet) (#665)

Related to jenkins-infra/helpdesk#3823

This PR follows up
jenkins-infra/kubernetes-management#5126

It fixes the failure to spin up VM agents since the `arm64` migration:
as the infra.ci.jenkins.io controller was moved to a new subnet in
#658

Signed-off-by: Damien Duportal <damien.duportal@gmail.com>
@dduportal

Update: fixing a few errors discovered after the controller migration:

=> Build queue is now empty \o/

smerle33 commented Apr 15, 2024

(edited): issue tracking the release.ci's migration to arm64: #4042

smerle33 commented Apr 16, 2024

Cleanup (infra.ci, weekly.ci, release.ci)

  • remove the PV/PVC created from the snapshot
  • remove the unmounted disk (created from the snapshot)
  • remove the snapshots
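
The cleanup list above could be turned into commands along these lines. The sketch deliberately prints the commands instead of running them, since they are destructive; the function name is hypothetical and the resource group and snapshot name are placeholders. Because the PV uses the `Retain` reclaim policy, deleting it leaves the Azure disk in place, hence the separate disk deletion.

```shell
# Sketch only: prints the cleanup commands for one controller so they can be reviewed first.
# Usage: print_cleanup <namespace> <pv-pvc-and-disk-name> <resource-group> <snapshot-name>
print_cleanup() {
  local ns="$1" name="$2" rg="$3" snapshot="$4"
  printf 'kubectl delete pvc %s --namespace %s\n' "$name" "$ns"
  printf 'kubectl delete pv %s\n' "$name"
  printf 'az disk delete --resource-group %s --name %s --yes\n' "$rg" "$name"
  printf 'az snapshot delete --resource-group %s --name %s\n' "$rg" "$snapshot"
}
```

For example: `print_cleanup jenkins-infra jenkins-infraci-snap <resource-group> <snapshot-name>`, repeated with the matching names for weekly.ci and release.ci.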
