
Migrate merge-blocking jobs to dedicated cluster: pull-kubernetes-dependencies #18846

Closed
spiffxp opened this issue Aug 14, 2020 · 7 comments
Labels: area/jobs, kind/cleanup, sig/testing

spiffxp commented Aug 14, 2020

What should be cleaned up or changed:

This is part of #18550

To properly monitor the outcome of this, you should be a member of k8s-infra-prow-viewers@kubernetes.io. PR yourself into https://github.com/kubernetes/k8s.io/blob/master/groups/groups.yaml#L603-L628 if you're not a member.
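For reference, membership is managed in that file's members list. A minimal sketch of the relevant group entry, following the kubernetes/k8s.io groups config schema — the member address below is a placeholder, not the file's actual contents:

```yaml
# groups/groups.yaml (sketch): add your email to the group's members list.
# Entry shape follows the kubernetes/k8s.io groups config; the member
# address below is a placeholder.
groups:
  - email-id: k8s-infra-prow-viewers@kubernetes.io
    name: k8s-infra-prow-viewers
    description: |-
      Grants view access to dashboards for the k8s-infra prow clusters
    settings:
      ReconcileMembers: "true"
    members:
      - you@example.com  # placeholder: your account's email
```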

Migrate pull-kubernetes-dependencies to k8s-infra-prow-build by adding a cluster: k8s-infra-prow-build field to the job, as sketched below.
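Concretely, the change is a one-line addition to the job's entry in its presubmit config. A hedged sketch — only the `cluster:` line is the migration itself; every other field here is illustrative rather than copied from the real job definition:

```yaml
# Sketch of a Prow presubmit entry; only the added `cluster:` line is
# the migration, the surrounding fields are illustrative.
presubmits:
  kubernetes/kubernetes:
    - name: pull-kubernetes-dependencies
      cluster: k8s-infra-prow-build  # the added field: schedule onto the dedicated build cluster
      always_run: true
      decorate: true
      spec:
        containers:
          - image: gcr.io/k8s-testimages/kubekins-e2e:latest-master  # illustrative image
            command:
              - runner.sh
            args:
              - make
              - verify
```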

Once the PR has merged, note the date/time it merged. This will allow you to compare before/after behavior.

Things to watch for the job

  • does the job start failing more often?
  • does the job start going into error state?
  • does the job duration look worse than before? spikier than before?
  • do more failures show up than before?
  • is the job wildly underutilizing its CPU limit? if so, perhaps tune down (if uncertain, post evidence in this issue and ask)
  • is the job wildly underutilizing its memory limit? if so, perhaps tune down (if uncertain, post evidence in this issue and ask)

Things to watch for the build cluster

  • prow-build dashboard 1w
    • is the build cluster scaling as needed? (e.g. maybe it can't scale because we've hit some kind of quota)
    • (it will probably be helpful to look at different time resolutions like 1h, 6h, 1d, 1w)
  • prowjobs-experiment 1w
    • (shows resource consumption of all job runs, pretty noisy but putting this here for completeness)

Keep this open for at least 24h of weekday PR traffic. If everything continues to look good, then this can be closed.

/wg k8s-infra
/sig testing
/area jobs
/help

spiffxp added the kind/cleanup label Aug 14, 2020
k8s-ci-robot added the wg/k8s-infra, sig/testing, help wanted, and area/jobs labels Aug 14, 2020
helenfeng737 commented:

/assign

helenfeng737 commented:

/remove-help

k8s-ci-robot removed the help wanted label Aug 14, 2020
spiffxp commented Aug 25, 2020

Checking in, how are things looking?

helenfeng737 commented Aug 26, 2020

> does the job start failing more often?
> does the job start going into error state?

No.

> does the job duration look worse than before? spikier than before?

No; we haven't noticed any irregular patterns.

> do more failures show up than before?

No; there have actually been fewer failures over the past 2 weeks.

> is the job wildly underutilizing its CPU limit? if so, perhaps tune down (if uncertain, post evidence in this issue and ask)

The CPU limit is set to 2, and over the past 2 weeks CPU usage has mostly stayed below 1 (see the chart below). We could probably lower the limit.

[chart: CPU limit utilization for pull-kubernetes-dependencies]

metrics explorer: Memory limit utilization for pull-kubernetes-dependencies for 6h

> is the job wildly underutilizing its memory limit? if so, perhaps tune down (if uncertain, post evidence in this issue and ask)

The memory limit is set to 1.2G, and so far usage has ranged between 0.6 and 1G. We can keep this value.
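For illustration, lowering the limit would be a change to the job's container resources stanza. A sketch — the current 2 CPU / 1.2G figures are from the discussion above, but the tuned-down CPU value is hypothetical and not something decided in this thread:

```yaml
# Sketch of the resources stanza for pull-kubernetes-dependencies.
# The 2 CPU / 1.2G limits are the current values discussed above;
# the lowered CPU figure is hypothetical.
resources:
  requests:
    cpu: "1"       # hypothetical tuned-down value (was 2)
    memory: "1.2G"
  limits:
    cpu: "1"       # hypothetical tuned-down value (was 2)
    memory: "1.2G" # kept as-is: observed usage is 0.6–1G
```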

helenfeng737 commented:

> Checking in, how are things looking?

Please see the above.

spiffxp commented Aug 28, 2020

/close
Thanks for the update, I think we can call those done!

I'm fine leaving the CPU limit where it is: at shorter time ranges the job does hit ~100% utilization, and the larger time windows average over intervals longer than 1 minute, which smooths those peaks away.

k8s-ci-robot commented:
@spiffxp: Closing this issue.

In response to this:

> /close
> Thanks for the update, I think we can call those done!

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
