[Umbrella Issue] Migrate prow jobs to community clusters #29722

Closed
54 tasks done
rjsadow opened this issue Jun 7, 2023 · 32 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra.

Comments

@rjsadow
Contributor

rjsadow commented Jun 7, 2023

In an ongoing effort to migrate to community-owned resources, SIG K8s Infra and SIG Testing are working to complete the migration of jobs from the Google-owned internal GKE cluster to community-owned clusters.

All jobs in the Prow default cluster that do not depend on external cloud assets should attempt to migrate to cluster: eks-prow-build-cluster.

What needs to be done?

To get started, please see eks-jobs-migration for details.

Fork and check out the kubernetes/test-infra repository, then follow the steps below:

  1. Pick a job to check, for example pull-jobset-test-integration-main, from the "Prow Results" link below, and see whether it has a cluster specified.
  2. Open the file in which pull-jobset-test-integration-main is defined, using the "Search Results" link. In the job definition, look for a cluster: key. If there isn't one, the job runs in the default cluster, so add cluster: eks-prow-build-cluster. NOTE: if you see any entries under labels that say gce, skip this job and go to the next one, as it is not ready to be moved yet.
  3. Save the file, commit the change on a new branch, and file a PR.
  4. Having trouble? Leave a note in this issue and/or ask for help in the #sig-k8s-infra or #sig-testing Slack channels.
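
As a rough illustration of step 2, a migrated presubmit might look like the sketch below. Only the job name and the cluster value come from this issue. The repo key, image, and command are placeholders chosen for illustration, and the real definition in kubernetes/test-infra will carry more fields (resource requests/limits are left out here for brevity).

```yaml
presubmits:
  kubernetes-sigs/jobset:            # assumed repo key for this job
  - name: pull-jobset-test-integration-main
    cluster: eks-prow-build-cluster  # the line to add; without it the job runs in the default cluster
    decorate: true
    spec:
      containers:
      - image: example.invalid/placeholder-image:tag  # placeholder; keep the job's existing image
        command:                     # placeholder command; keep the job's existing command
        - make
        - test-integration
```

The only change a migration PR needs for a job like this is the cluster: line; everything else in the job definition stays as it was.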

NOTE: The Google-owned clusters did not require any resource definitions, whereas the community-owned clusters do. If your PR is failing the pull-test-infra-unit-test job, please add CPU/memory requests and limits. Work with the appropriate SIG owners to determine the necessary capacity for each job.
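
A minimal sketch of what such resource definitions could look like on a job's container, assuming the standard Kubernetes container resources fields. The image and all values are placeholders, not recommendations; actual sizing should come from the owning SIG.

```yaml
spec:
  containers:
  - image: example.invalid/placeholder-image:tag  # placeholder; keep the job's existing image
    resources:
      requests:
        cpu: "1"       # placeholder value
        memory: 2Gi    # placeholder value
      limits:
        cpu: "1"       # placeholder value
        memory: 2Gi    # placeholder value
```

Both requests and limits are shown for CPU and memory, since the note above asks for both.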

Below is a list of repos that currently have jobs in the default cluster.

Repos with default cluster jobs found

rjsadow added the kind/bug label Jun 7, 2023
k8s-ci-robot added the needs-sig label Jun 7, 2023
@rjsadow
Contributor Author

rjsadow commented Jun 7, 2023

/sig k8s-infra
/kind cleanup

k8s-ci-robot added the sig/k8s-infra and kind/cleanup labels and removed the needs-sig label Jun 7, 2023
@ameukam
Member

ameukam commented Jun 8, 2023

For kops, we have a separate issue: kubernetes/k8s.io#5127
I think for the moment it's better to focus on the presubmits.

@rayandas
Member

rayandas commented Jun 8, 2023

@dims Can I pick up the cluster-api jobs? I will migrate as much as I can.

@ameukam
Member

ameukam commented Jun 8, 2023

> @dims Can I pick up the cluster-api jobs? I will migrate as much as I can.

@rayandas @rjsadow There is an ongoing effort for cluster-api. Please coordinate with the cluster-api maintainers.

@ArkaSaha30
Member

Hello @dims, I am familiar with the kubernetes/sig-release jobs. Is it okay if I work on them?
cc @ameukam

@rjsadow
Contributor Author

rjsadow commented Jun 8, 2023

@dims @ameukam I think the pull-test-infra-unit-test job requiring resource limits/requests for the community clusters is going to be a stumbling block for a lot of these migrations. Is there any existing documentation or other resource that contributors can reference for setting initial values and then iterating? Or should we expect the SIG maintainers and leads to provide input on a job-by-job basis?

Example in #29724, where I WAG'd 0.5 CPU and 2 GB and asked the cluster lifecycle leads to review.

@ShivamTyagi12345
Member

I would like to learn more about this, so I will pick one: kubernetes-sigs/kustomize | Search Results

@furkatgofurov7
Member

> What needs to be done?
>
> Fork and check out the kubernetes/test-infra repository, then follow the steps below:
>
>   1. Pick a job to check, for example pull-jobset-test-integration-main, from the "Search Results" link above, and see whether it has a cluster specified.
>   2. Find the YAML file in kubernetes/test-infra using our search:
>   3. Edit the file from step 2. In the job definition, look for a cluster: key. If there isn't one, the job runs in the default cluster, so add cluster: eks-prow-build-cluster. NOTE: if you see any entries under labels that say gce, skip this job and go to the next one, as it is not ready to be moved yet.
>   4. Save the file, commit the change on a new branch, and file a PR.
>   5. Having trouble? Leave a note in this issue and/or ask for help in the #sig-k8s-infra or #sig-testing Slack channels.

@dims hi, thanks for sharing the steps. I also noticed that some PRs move jobs to the new cluster by specifying only the new cluster name, while others also specify resource quotas (CPU/memory) while migrating. Is the latter a requirement, or just generally recommended, with the cluster name alone being enough?

@rjsadow
Contributor Author

rjsadow commented Jun 9, 2023

> Is the latter a requirement, or just generally recommended, with the cluster name alone being enough?

The Google-owned clusters did not require any resource definitions whereas the community-owned clusters do. If a job already has resource quotas (both requests and limits) then just the name is enough. If the job is missing any resource quotas then those will need to be added or else the pull-test-infra-unit-test check will fail.

@Vyom-Yadav
Member

I will be working on kubernetes/node-problem-detector | Prow Results | Search Results

ameukam added a commit to ameukam/test-infra that referenced this issue Jan 12, 2024
Part of:
  -  kubernetes#29722

Move a few jobs still running on Google Infra.

Signed-off-by: Arnaud Meukam <ameukam@gmail.com>
k8s-ci-robot added the lifecycle/stale label Mar 21, 2024
@BenTheElder
Member

/lifecycle frozen

k8s-ci-robot added the lifecycle/frozen label and removed the lifecycle/stale label Mar 25, 2024
kubernetes deleted a comment from k8s-triage-robot Mar 25, 2024
@BenTheElder
Member

xref: #32432

@BenTheElder
Member

We're also keeping this list up to date: https://github.com/kubernetes/test-infra/blob/master/docs/job-migration-todo.md

@BenTheElder
Member

BenTheElder commented Jun 26, 2024

filed a few warnings:
kubernetes/kops#16637
kubernetes-sigs/node-feature-discovery#1747
kubernetes/node-problem-detector#920

+ messages in slack, in addition to the kubernetes-dev emails

@BenTheElder
Member

NPD is actually done now. We're getting really close. We have DO spending info now and sorted out credentials, @upodroid has been migrating those jobs. I migrated the slack infra and some more of the test-infra trusted jobs. kops jobs are being migrated with DO as well.

@BenTheElder
Member

secrets-store-csi-driver seems like a new wrinkle https://kubernetes.slack.com/archives/C09QZ4DQB/p1722024643376909

we know that vsphere jobs are blocked on resource availability and may not make it: kubernetes/k8s.io#6877

azure jobs are in flight and making steady progress now

most other jobs are done, leaving a handful of trusted project automation jobs in #32432, and some scalability-related jobs

we may have missed others and should do another pass through these to identify the categories

@BenTheElder
Member

BenTheElder commented Jul 26, 2024

#33127 makes it a little more obvious what grouping remaining jobs are in based on the secrets

I've since filed a couple more tracking issues ...

In addition to the vsphere + azure e2e jobs we also have:

@BenTheElder
Member

Closing this in favor of #33226 and #32432

Announcement at https://groups.google.com/a/kubernetes.io/g/dev/c/qzNYpcN5la4
