From 9a20a8551250a875eac7a2961f505700cbab5dfa Mon Sep 17 00:00:00 2001 From: sanposhiho <44139130+sanposhiho@users.noreply.github.com> Date: Sun, 5 Jun 2022 18:12:57 +0900 Subject: [PATCH 01/15] Add the production readiness requirements to graduate to beta --- .../README.md | 59 +++++++++++++++---- 1 file changed, 46 insertions(+), 13 deletions(-) diff --git a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md index b666427d117..23008e9dd7f 100644 --- a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md +++ b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md @@ -292,6 +292,13 @@ rollout. Similarly, consider large clusters and how enablement/disablement will rollout across nodes. --> +It shouldn't impact already running workloads. It's an opt-in feature, +and users need to set `pod.spec.topologySpreadConstraints.minDomains` field to use this feature. + +When this feature is disabled by the feature flag, the already created Pod's `pod.spec.topologySpreadConstraints.minDomains` field is preserved, +but, the newly created Pod's `pod.spec.topologySpreadConstraints.minDomains` field is silently dropped. + + ###### What specific metrics should inform a rollback? +A spike on metric `schedule_attempts_total{result="error|unschedulable"}` when pods using this feature are added. + ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? +Yes. The behavior is changed as expected. + ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? +No. + ### Monitoring Requirements +The operator can query pods with `pod.spec.topologySpreadConstraints.minDomains` field set. + ###### How can someone using this feature know that it is working for their instance? -- [ ] Events - - Event Reason: -- [ ] API .status - - Condition name: - - Other field: -- [ ] Other (treat as last resort) - - Details: +- [x] Other (treat as last resort) + - Details: + The feature MinDomains in Pod Topology Sprad plugin doesn't cause any logs, any events, any pod status updates. + If a Pod using `pod.spec.topologySpreadConstraints.minDomains` was successfully assigned a Node, + nodeName will be updated. + And if not, `PodScheduled` condition will be false and an event will be recorded with a detailed message + describing the reason including the failed filters. (Pod Topology Spread plugin could be one of them.) ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? @@ -363,18 +378,19 @@ These goals will help you determine what you need to measure (SLIs) in the next question. --> +- 99% of `plugin_execution_duration_seconds{plugin="PodTopologySpread"}` are within x milliseconds. +- x% of `schedule_attempts_total` are successful. + ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? -- [ ] Metrics - - Metric name: - - [Optional] Aggregation method: - - Components exposing the metric: -- [ ] Other (treat as last resort) - - Details: + - [x] Metrics + - Component exposing the metric: kube-scheduler + - Metric name: `plugin_execution_duration_seconds{plugin="PodTopologySpread"}` + - Metric name: `schedule_attempts_total{result="error|unschedulable"}` ###### Are there any missing metrics that would be useful to have to improve observability of this feature? @@ -383,6 +399,8 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., co implementation difficulties, etc.). --> +No. + ### Dependencies +No. + ### Scalability ###### Will enabling / using this feature result in any new API calls? @@ -466,8 +486,13 @@ details). For now, we leave it here. ###### How does this feature react if the API server and/or etcd is unavailable? +The feature doesn't affected because Pod Topology Spread plugin doesn't communicate with kube-apiserver or etcd +during Filter phase. + ###### What are other known failure modes? +N/A + + - 2021-11-02: Initial KEP sent for review + - 2022-01-14: Initial KEP is merged. + - 2022-03-16: The implementation PRs are merged. + - 2022-05-03: The MinDomain feature is released as alpha feature with Kubernetes v1.24 release. + ## Drawbacks -A spike on metric `schedule_attempts_total{result="error|unschedulable"}` when pods using this feature are added. +- A spike on metric `schedule_attempts_total{result="error|unschedulable"}` when pods using this feature are added. +- A spike on metric `plugin_execution_duration_seconds{plugin="PodTopologySpread"}` or `scheduling_algorithm_duration_seconds` when pods using this feature are added. ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? From 74166ed047892e5781e750917bb160874824cfcd Mon Sep 17 00:00:00 2001 From: sanposhiho <44139130+sanposhiho@users.noreply.github.com> Date: Tue, 14 Jun 2022 22:01:16 +0900 Subject: [PATCH 05/15] set actual value for SLOs --- .../3022-min-domains-in-pod-topology-spread/README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md index 09f3ac8cae4..ee12946da82 100644 --- a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md +++ b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md @@ -393,8 +393,7 @@ These goals will help you determine what you need to measure (SLIs) in the next question. --> -- 99% of `plugin_execution_duration_seconds{plugin="PodTopologySpread"}` are within x milliseconds. -- x% of `schedule_attempts_total` are successful. +- Metric `plugin_execution_duration_seconds{plugin="PodTopologySpread"}` <= 100ms on 90-percentile. ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? From 266197d0c4d9280ae2c6395bbdb608deda8739e9 Mon Sep 17 00:00:00 2001 From: sanposhiho <44139130+sanposhiho@users.noreply.github.com> Date: Tue, 14 Jun 2022 22:02:03 +0900 Subject: [PATCH 06/15] add mention for useful metrics that allows us to see more detail scheduling result --- .../3022-min-domains-in-pod-topology-spread/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md index ee12946da82..17c96c39620 100644 --- a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md +++ b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md @@ -413,7 +413,7 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., co implementation difficulties, etc.). --> -No. +Yes. It would be useful if we could see more details related to scheduler's decisions in metrics. ### Dependencies From 817fdb27a296ac80df5564c384fd93253a8cbbe2 Mon Sep 17 00:00:00 2001 From: sanposhiho <44139130+sanposhiho@users.noreply.github.com> Date: Tue, 14 Jun 2022 22:02:54 +0900 Subject: [PATCH 07/15] add step to solve --- .../3022-min-domains-in-pod-topology-spread/README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md index 17c96c39620..dfc006e231f 100644 --- a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md +++ b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md @@ -523,7 +523,10 @@ For each of them, fill in the following information by copying the below templat ###### What steps should be taken if SLOs are not being met to determine the problem? - Check `plugin_execution_duration_seconds{plugin="PodTopologySpread"}` to see if latency increased. + - In this case, the metrics showes literally the feature is slow. + - You should stop using `MinDomains` in your Pods and may need to disable `MinDomains` feature by feature flag `MinDomainsInPodTopologySpread`. - Check `schedule_attempts_total{result="error|unschedulable"}` to see if the number of attempts increased. + - In this case, your use of `MinDomains` may be incorrect or not appropriate for your cluster. ## Implementation History From 603325a4917244cf8a8a1d1d7e28037f9b6b9111 Mon Sep 17 00:00:00 2001 From: sanposhiho <44139130+sanposhiho@users.noreply.github.com> Date: Tue, 14 Jun 2022 22:03:01 +0900 Subject: [PATCH 08/15] fix typo --- .../3022-min-domains-in-pod-topology-spread/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md index dfc006e231f..10e8e2514c2 100644 --- a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md +++ b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md @@ -500,7 +500,7 @@ details). For now, we leave it here. ###### How does this feature react if the API server and/or etcd is unavailable? -The feature doesn't affected because Pod Topology Spread plugin doesn't communicate with kube-apiserver or etcd +The feature isn't affected because Pod Topology Spread plugin doesn't communicate with kube-apiserver or etcd during Filter phase. ###### What are other known failure modes? From 1a9786cb58f48416fb90998193299e0fd59ae804 Mon Sep 17 00:00:00 2001 From: sanposhiho <44139130+sanposhiho@users.noreply.github.com> Date: Fri, 17 Jun 2022 23:11:07 +0900 Subject: [PATCH 09/15] address review --- .../README.md | 59 ++++++++++++++++++- .../kep.yaml | 1 - 2 files changed, 56 insertions(+), 4 deletions(-) diff --git a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md index 10e8e2514c2..ed2e18ed3e2 100644 --- a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md +++ b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md @@ -198,6 +198,13 @@ With the flow, this deployment will be spread over at least 5 Nodes while protec ### Test Plan + +[x] I/we understand the owners of the involved components may require updates to +existing tests to make this code solid enough prior to committing the changes necessary +to implement this enhancement. + +##### Prerequisite testing updates + To ensure this feature to be rolled out in high quality. Following tests are mandatory: - **Unit Tests**: All core changes must be covered by unit tests. @@ -207,6 +214,52 @@ To ensure this feature to be rolled out in high quality. Following tests are man using this feature, but it shouldn't impose penalty to users who are not using this feature. We will verify it by designing some benchmark tests. +##### Unit tests + + + + + +- `k8s.io/kubernetes/pkg/scheduler/framework/plugins/podtopologyspread`: `2020-06-17` - `86%` + +##### Integration tests + + + +test: https://github.com/kubernetes/kubernetes/blob/19c8a2127177b43effb9bfe1de272d6f08ea56e8/test/integration/scheduler/filters/filters_test.go#L1060 +k8s-triage: https://storage.googleapis.com/k8s-triage/index.html?sig=scheduling&test=TestPodTopologySpreadFilter + +##### e2e tests + + + +N/A + ### Graduation Criteria #### Alpha (v1.24): @@ -320,16 +373,16 @@ are missing a bunch of machinery and tooling and can't do that now. Yes. The behavior is changed as expected. Test scenario: -1. start kube-apiserver that `MinDomains` feature is enabled. +1. start kube-apiserver where `MinDomains` feature is enabled. 2. create three nodes and pods spread across nodes as 2/2/1 3. create new Pod that has a TopologySpreadConstraints: maxSkew is 1, topologyKey is `kubernetes.io/hostname`, and minDomains is 4 (larger than the number of domains (= 3)). 4. the Pod created in (3) isn't scheduled because of `MinDomain`. 5. delete the Pod created in (3). -6. recraete kube-apiserver that `MinDomains` feature is disabled. +6. recreate kube-apiserver where `MinDomains` feature is disabled. 7. create the same Pod as (3). 8. the Pod created in (7) is scheduled because `MinDomain` is disabled. 9. delete the Pod created in (7). -10. recreate kube-apiserver that `MinDomains` feature is enabled. +10. recreate kube-apiserver where `MinDomains` feature is enabled. 11. create the same Pod as (3). 12. the Pod created in (11) isn't scheduled because of `MinDomain`. diff --git a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/kep.yaml b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/kep.yaml index 44e5afa0520..93e557c597c 100644 --- a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/kep.yaml +++ b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/kep.yaml @@ -19,7 +19,6 @@ latest-milestone: "v1.25" milestone: alpha: "v1.24" beta: "v1.25" - stable: "v1.26" disable-supported: true feature-gates: - name: MinDomainsInPodTopologySpread From ede06eb51efeca0636fbe0d14dfd050e621b622c Mon Sep 17 00:00:00 2001 From: sanposhiho <44139130+sanposhiho@users.noreply.github.com> Date: Fri, 17 Jun 2022 23:14:56 +0900 Subject: [PATCH 10/15] update toc --- .../3022-min-domains-in-pod-topology-spread/README.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md index ed2e18ed3e2..f21f34b3d13 100644 --- a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md +++ b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md @@ -1,7 +1,6 @@ # KEP-3022: min domains in Pod Topology Spread - - [Release Signoff Checklist](#release-signoff-checklist) - [Summary](#summary) - [Motivation](#motivation) @@ -14,6 +13,10 @@ - [Implementation details](#implementation-details) - [How user stories are addressed](#how-user-stories-are-addressed) - [Test Plan](#test-plan) + - [Prerequisite testing updates](#prerequisite-testing-updates) + - [Unit tests](#unit-tests) + - [Integration tests](#integration-tests) + - [e2e tests](#e2e-tests) - [Graduation Criteria](#graduation-criteria) - [Alpha (v1.24):](#alpha-v124) - [Beta (v1.25):](#beta-v125) From a56d6cfba018ae2d3533e3a274e6312660f6591b Mon Sep 17 00:00:00 2001 From: sanposhiho <44139130+sanposhiho@users.noreply.github.com> Date: Sat, 18 Jun 2022 01:04:47 +0900 Subject: [PATCH 11/15] write more specific about metrics --- .../3022-min-domains-in-pod-topology-spread/README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md index f21f34b3d13..547c9537e18 100644 --- a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md +++ b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md @@ -469,7 +469,9 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., co implementation difficulties, etc.). --> -Yes. It would be useful if we could see more details related to scheduler's decisions in metrics. +Yes. It would be useful if we could see which filter plugin affected Pod's scheduling results in metrics. + +Issue: https://github.com/kubernetes/kubernetes/issues/110643 ### Dependencies From 534356416dd3f6b94e9a8f9b2fb45b9993ae5596 Mon Sep 17 00:00:00 2001 From: sanposhiho <44139130+sanposhiho@users.noreply.github.com> Date: Sat, 18 Jun 2022 01:05:29 +0900 Subject: [PATCH 12/15] write why we don't need e2e --- .../3022-min-domains-in-pod-topology-spread/README.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md index 547c9537e18..15e2254cc12 100644 --- a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md +++ b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md @@ -263,6 +263,11 @@ We expect no non-infra related flakes in the last month as a GA graduation crite N/A +-- + +This feature doesn't introduce any new API endpoints and doesn't interact with other components. +So, E2E tests doesn't add extra value to integration tests. + ### Graduation Criteria #### Alpha (v1.24): From a0f407206c33d09684aaab958cff1ad697b1d3a2 Mon Sep 17 00:00:00 2001 From: sanposhiho <44139130+sanposhiho@users.noreply.github.com> Date: Sat, 18 Jun 2022 01:05:56 +0900 Subject: [PATCH 13/15] add another packages tests --- .../3022-min-domains-in-pod-topology-spread/README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md index 15e2254cc12..8942e4b1458 100644 --- a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md +++ b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md @@ -237,7 +237,9 @@ This can inform certain test coverage improvements that we want to do before extending the production code to implement this enhancement. --> -- `k8s.io/kubernetes/pkg/scheduler/framework/plugins/podtopologyspread`: `2020-06-17` - `86%` +- `k8s.io/kubernetes/pkg/scheduler/framework/plugins/podtopologyspread`: `2022-06-17` - `86%` +- `k8s.io/kubernetes/pkg/api/pod`: `2022-06-17` - `66.7%` +- `k8s.io/kubernetes/pkg/apis/core/validation`: `2022-06-17` - `82.1%` ##### Integration tests From 763ae6db6dd30d59f964d1a0ce0742202251d1a7 Mon Sep 17 00:00:00 2001 From: sanposhiho <44139130+sanposhiho@users.noreply.github.com> Date: Sat, 18 Jun 2022 03:45:57 +0900 Subject: [PATCH 14/15] Update the manual test for update/rollback --- .../README.md | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md index 8942e4b1458..40240667679 100644 --- a/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md +++ b/keps/sig-scheduling/3022-min-domains-in-pod-topology-spread/README.md @@ -383,18 +383,23 @@ are missing a bunch of machinery and tooling and can't do that now. Yes. The behavior is changed as expected. Test scenario: -1. start kube-apiserver where `MinDomains` feature is enabled. +1. start kube-apiserver v1.24 where `MinDomains` feature is disabled. 2. create three nodes and pods spread across nodes as 2/2/1 3. create new Pod that has a TopologySpreadConstraints: maxSkew is 1, topologyKey is `kubernetes.io/hostname`, and minDomains is 4 (larger than the number of domains (= 3)). -4. the Pod created in (3) isn't scheduled because of `MinDomain`. +4. the Pod created in (3) is scheduled because `MinDomain` is disabled. 5. delete the Pod created in (3). -6. recreate kube-apiserver where `MinDomains` feature is disabled. +6. recreate kube-apiserver v1.25 where `MinDomains` feature is enabled. 7. create the same Pod as (3). -8. the Pod created in (7) is scheduled because `MinDomain` is disabled. +8. the Pod created in (7) isn't scheduled because `MinDomain` is enabled and minDomains is larger than the number of domains (= 3)). 9. delete the Pod created in (7). -10. recreate kube-apiserver where `MinDomains` feature is enabled. +10. recreate kube-apiserver v1.24 where `MinDomains` feature is disabled. 11. create the same Pod as (3). -12. the Pod created in (11) isn't scheduled because of `MinDomain`. +12. the Pod created in (11) is scheduled because `MinDomain` is disabled. +13. delete the Pod created in (11). +14. recreate kube-apiserver v1.25 where `MinDomains` feature is enabled. +15. create the same Pod as (3). +16. the Pod created in (15) isn't scheduled because `MinDomain` is enabled and minDomains is larger than the number of domains (= 3)). +17. delete the Pod created in (15). ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? From 2c50304b3a2d72a47385f6d0957026f5b96a098b Mon Sep 17 00:00:00 2001 From: Kensei Nakada Date: Tue, 21 Jun 2022 20:51:20 +0900 Subject: [PATCH 15/15] Change PRR approver to johnbelamaric --- keps/prod-readiness/sig-scheduling/3022.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/keps/prod-readiness/sig-scheduling/3022.yaml b/keps/prod-readiness/sig-scheduling/3022.yaml index 270915b245a..5fb9316f93c 100644 --- a/keps/prod-readiness/sig-scheduling/3022.yaml +++ b/keps/prod-readiness/sig-scheduling/3022.yaml @@ -5,4 +5,4 @@ kep-number: 3022 alpha: approver: "@wojtek-t" beta: - approver: "@wojtek-t" \ No newline at end of file + approver: "@johnbelamaric" \ No newline at end of file