From baeea828f3eae53f62a3a5c049a1c7a3291b77b2 Mon Sep 17 00:00:00 2001 From: Aldo Culquicondor Date: Thu, 26 Sep 2019 17:09:43 -0400 Subject: [PATCH 1/8] Add KEP for default Even Pods Spreading Signed-off-by: Aldo Culquicondor --- .../20190926-default-even-pod-spreading.md | 180 ++++++++++++++++++ 1 file changed, 180 insertions(+) create mode 100644 keps/sig-scheduling/20190926-default-even-pod-spreading.md diff --git a/keps/sig-scheduling/20190926-default-even-pod-spreading.md b/keps/sig-scheduling/20190926-default-even-pod-spreading.md new file mode 100644 index 00000000000..21ecce43698 --- /dev/null +++ b/keps/sig-scheduling/20190926-default-even-pod-spreading.md @@ -0,0 +1,180 @@ +--- +title: Default Even Pod Spreading +authors: + - "@alculquicondor" +owning-sig: sig-scheduling +reviewers: + - "@ahg-g" + - "@Huang-Wei" +approvers: + - "@ahg-g" + - "@k82cn" +creation-date: 2019-09-26 +last-updated: 2010-09-26 +status: provisional +see-also: + - "/keps/sig-aaa/20190221-even-pods-spreading.md" +--- + +# Default Even Pod Spreading + +## Table of Contents + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [User Stories](#user-stories) + - [Story 1](#story-1) + - [Story 2](#story-2) + - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [API](#api) + - [Implementation Details](#implementation-details) + - [In the metadata/predicates/priorities flow](#in-the-metadatapredicatespriorities-flow) + - [In the scheduler framework](#in-the-scheduler-framework) + - [Test Plan](#test-plan) + - [Graduation Criteria](#graduation-criteria) +- [Implementation History](#implementation-history) +- [Alternatives [optional]](#alternatives-optional) + + +## Release Signoff Checklist + +- [ ] kubernetes/enhancements issue in release milestone, which links to KEP (this should be a link to the KEP location in kubernetes/enhancements, not the initial KEP PR) +- [ ] KEP approvers have set the KEP status to `implementable` +- [ ] Design details are appropriately documented +- [ ] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input +- [ ] Graduation criteria is in place +- [ ] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + +## Summary + +With `EvenPodsSpreading`, workload authors can define spreading rules for their loads based on the +topology of the clusters. + +By introducing configurable default spreading rules, workloads can be spread in the topology of +the cluster according to opinionated rules set by a cluster operator, without having to explicitly +define them. And workloads with specific needs can still override the rules by defining them in +their PodSpec. + +## Motivation + +In order for a workload (pod) to use `EvenPodsSpreading`: + +1. Authors have to have an idea of the underlying topology. +1. PodSpecs become less portable if their spreading rules are tailored to a specific topology. + +On the other hand, cluster operators know the underlying topology of the cluster, which makes +them suitable to provide default spreading rules for all workloads in their cluster. + +### Goals + +- Cluster operators can define default spreading rules for pods that don't provide any. +- Workloads are spread with the default rules if they belong to the same service, controller, +replica set or stateful set, +and if they don't define `TopologySpreadConstraints`. + +### Non-Goals + +- Removal of `SelectorSpreadPriority`, `ServiceSpreadingPriority` or `ServiceAntiAffinity` priorities. + +## Proposal + +### User Stories + +#### Story 1 + +As a cluster operator, I want to set default spreading rules for workloads in the cluster. +Currently, `SelectorSpreadPriority` provides a canned priority that spreads across nodes +and zones (`failure-domain.beta.kubernetes.io/zone`). However, the nodes in my cluster have +custom topology keys (for physical host, rack, etc.). + +#### Story 2 + +As a workload author, I want to spread the workload in the cluster, but: +(1) I don't know the topology of the cluster I'm running on. +(2) I want to be able to run my PodSpec in different clusters (on-prem and cloud). + +### Implementation Details/Notes/Constraints + +Note that a priority given by default `EvenPodsSpreading` rules could conflict with +`SelectorSpreadingPriority`. +Operators can disable `SelectorSpreadingPriority`. But once default rules for `EvenPodsSpreading` is GA, +we can consider removing `SelectorSpreadingPriority` and replacing it by an equivalent +k8s default to the default rules for `EvenPodsSpreading`. + +### Risks and Mitigations + +`EvenPodsSpreading` has some overhead and we currently ensure that pods that don't use the +feature get minimally affected. After default rules for `EvenPodsSpreading` is rolled out, +all pods will run through the algorithms. +However, we should ensure that the running overhead is not significantly higher than +`SelectorSpreadingPriority` if using the k8s default. + +## Design Details + +### API + +A new structure called `TopologySpreadConstraint` is introduced to `KubeSchedulerConfiguration`. + +```go +type KubeSchedulerConfiguration struct { + .... + // DefaultTopologySpreadConstraints defines spreading constraints to be applied to pods + // that don't define any. + // If not specified, the scheduler applies the following default. + // +optional + DefaultTopologySpreadConstraints []TopologySpreadConstraint + .... +} + +// TopologySpreadConstraint specifies how to spread pods among the given topology. +// Pod selectors are deduced from the resource definitions that the pod belongs to +// (includes services, controllers, replica sets and stateful sets). +type TopologySpreadConstraint struct { + MaxSkew int32 + TopologyKey string + WhenUnsatisfiable corev1.UnsatisfiableConstraintAction +} +``` + +Note that `TopologySpreadConstraint` is similar to `k8s.io/api/core/v1.TopologySpreadConstraint`, +except that it doesn't define selectors. + +### Implementation Details + +#### In the metadata/predicates/priorities flow + +_To be filled after agreement on the API_ + +#### In the scheduler framework + +_To be filled after agreement on the API_ + +### Test Plan + +_To be filled after agreement on the API_ + +### Graduation Criteria + +_To be filled after agreement on the API_ + +## Implementation History + +- 2019-09-26: Initial KEP sent out for review. + +## Alternatives [optional] + +- Make the topology keys used in `SelectorSpreadingPriority` configurable. + + While this moves the scheduler in the right direction, there are two problems: + + 1. We can only support one topology key. + 1. It makes it hard for pods to override the operator-provided spreading rules. \ No newline at end of file From cc1cb5c702f9098fecbcca2b196e367a76f00750 Mon Sep 17 00:00:00 2001 From: Aldo Culquicondor Date: Fri, 27 Sep 2019 13:41:45 -0400 Subject: [PATCH 2/8] Add defaults, example and test plan Signed-off-by: Aldo Culquicondor --- .../20190926-default-even-pod-spreading.md | 89 ++++++++++++++++++- 1 file changed, 88 insertions(+), 1 deletion(-) diff --git a/keps/sig-scheduling/20190926-default-even-pod-spreading.md b/keps/sig-scheduling/20190926-default-even-pod-spreading.md index 21ecce43698..6d9f919f84c 100644 --- a/keps/sig-scheduling/20190926-default-even-pod-spreading.md +++ b/keps/sig-scheduling/20190926-default-even-pod-spreading.md @@ -34,6 +34,8 @@ see-also: - [Risks and Mitigations](#risks-and-mitigations) - [Design Details](#design-details) - [API](#api) + - [Default rules](#default-rules) + - [How user stories are addressed](#how-user-stories-are-addressed) - [Implementation Details](#implementation-details) - [In the metadata/predicates/priorities flow](#in-the-metadatapredicatespriorities-flow) - [In the scheduler framework](#in-the-scheduler-framework) @@ -148,6 +150,86 @@ type TopologySpreadConstraint struct { Note that `TopologySpreadConstraint` is similar to `k8s.io/api/core/v1.TopologySpreadConstraint`, except that it doesn't define selectors. +### Default rules + +These will be the default rules for the cluster when the operator doesn't provide any: + +```yaml +defaultTopologySpreadConstraints: + - + maxSkew: 1 + topologyKey: "kubernetes.io/hostname" + whenUnsatisfiable: ScheduleAnyway + - + maxSkew: 1 + topologyKey: "failure-domain.beta.kubernetes.io/zone" + whenUnsatisfiable: ScheduleAnyway +``` + +### How user stories are addressed + +Let's say we have a cluster that has a topology based on physical hosts and racks. +Then, an operator can set the following scheduler configuration: + +```yaml +apiVersion: componentconfig/v1alpha1 +defaultTopologySpreadConstraints: + - + maxSkew: 5 + topologyKey: "example.com/topology/physical_host" + whenUnsatisfiable: ScheduleAnyway + - + maxSkew: 15 + topologyKey: "example.com/topology/rack" + whenUnsatisfiable: DoNotSchedule +``` + +Then, a workload author could have the following `ReplicaSet`: + +```yaml +apiVersion: apps/v1 +kind: ReplicaSet +metadata: + name: replicated_demo +spec: + replicas: 3 + selector: + matchLabels: + app: demo + template: + metadata: + labels: + app: demo + spec: + containers: + - name: php-redis + image: example.com/registry/demo:latest +``` + +Note that the workload author didn't provide a spreading rules. +The following rules will be applied to the pods of this replica set before running the +algorithms for Even Pods Spreading: + +```yaml +topologySpreadConstraints: + - + maxSkew: 5 + TopologyKey: "example.com/topology/physical_host" + WhenUnsatisfiable: ScheduleAnyway + selector: + matchLabels: + app: demo + - + maxSkew: 15 + TopologyKey: "example.com/topology/rack" + WhenUnsatisfiable: DoNotSchedule + selector: + matchLabels: + app: demo +``` + +These rules are internal to the scheduler and they don't get reflected in the apiserver. + ### Implementation Details #### In the metadata/predicates/priorities flow @@ -160,7 +242,12 @@ _To be filled after agreement on the API_ ### Test Plan -_To be filled after agreement on the API_ +To ensure this feature to be rolled out in high quality. Following tests are mandatory: + +- **Unit Tests**: All core changes must be covered by unit tests. +- **Integration Tests**: One integration test for the default rules and one for custom rules. +- **Benchmark Tests**: A benchmark test that compare the default rules against `SelectorSpreadingPriority`. + The performance should be as close as possible. ### Graduation Criteria From c4ec833b5af626cf8c997d732e2ad34f3ca8bda5 Mon Sep 17 00:00:00 2001 From: Aldo Culquicondor Date: Fri, 27 Sep 2019 16:35:21 -0400 Subject: [PATCH 3/8] Add implemention details and graduation criteria. Signed-off-by: Aldo Culquicondor --- .../20190926-default-even-pod-spreading.md | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/keps/sig-scheduling/20190926-default-even-pod-spreading.md b/keps/sig-scheduling/20190926-default-even-pod-spreading.md index 6d9f919f84c..a55fc5f9d7c 100644 --- a/keps/sig-scheduling/20190926-default-even-pod-spreading.md +++ b/keps/sig-scheduling/20190926-default-even-pod-spreading.md @@ -234,11 +234,16 @@ These rules are internal to the scheduler and they don't get reflected in the ap #### In the metadata/predicates/priorities flow -_To be filled after agreement on the API_ +1. Calculate the spreading constraints for the pod as part of the metadata calculation. + Use the constraints provided by the pod or calculate the default ones if they don't provide any. +1. When running the predicates or priorities, use the constraints stored in the metadata. #### In the scheduler framework -_To be filled after agreement on the API_ +1. Calculate spreading constraints for the pod in the `PreFilter` extension point. Store them + in the `PluginContext`. +1. In the `Filter` and `Score` extension points, use the stored spreading constraints instead of + the ones defined by the pod. ### Test Plan @@ -251,7 +256,13 @@ To ensure this feature to be rolled out in high quality. Following tests are man ### Graduation Criteria -_To be filled after agreement on the API_ +Alpha (v1.17): + +[ ] Scheduler Component Config API changes. +[ ] Default, validation and generated code. +[ ] Priority Implementation. +[ ] Predicate implementation. +[ ] Test cases mentioned in the [Test Plan](#test-plan). ## Implementation History From 30827b1e543640907c6eb4e1a34fbcf00c453219 Mon Sep 17 00:00:00 2001 From: Aldo Culquicondor Date: Mon, 30 Sep 2019 12:07:45 -0400 Subject: [PATCH 4/8] Rewording and YAML formatting Signed-off-by: Aldo Culquicondor --- .../20190926-default-even-pod-spreading.md | 79 +++++++++---------- 1 file changed, 39 insertions(+), 40 deletions(-) diff --git a/keps/sig-scheduling/20190926-default-even-pod-spreading.md b/keps/sig-scheduling/20190926-default-even-pod-spreading.md index a55fc5f9d7c..c34416b7297 100644 --- a/keps/sig-scheduling/20190926-default-even-pod-spreading.md +++ b/keps/sig-scheduling/20190926-default-even-pod-spreading.md @@ -58,17 +58,19 @@ see-also: ## Summary -With `EvenPodsSpreading`, workload authors can define spreading rules for their loads based on the -topology of the clusters. +With [Even Pods Spreading](/keps/sig-scheduling/20190221-even-pods-spreading.md), +workload authors can define spreading rules for their loads based on the topology of the clusters. +The spreading rules are defined in the `PodSpec`, thus they are tied to the pod. -By introducing configurable default spreading rules, workloads can be spread in the topology of -the cluster according to opinionated rules set by a cluster operator, without having to explicitly -define them. And workloads with specific needs can still override the rules by defining them in -their PodSpec. +We propose the introduction of configurable default spreading rules, i.e. rules that +can be defined at the cluster level and are applied to pods that don't explicitly define spreading rules. +This way, all pods can be spread according to (likely better informed) rules set by a cluster operator. +Workload authors don't need to know the topology of the cluster they will be running on to have their pods spread. +But if they do, they can still set their own spreading rules if they have specific needs. ## Motivation -In order for a workload (pod) to use `EvenPodsSpreading`: +In order for a workload (pod) to use `EvenPodsSpreadPriority`: 1. Authors have to have an idea of the underlying topology. 1. PodSpecs become less portable if their spreading rules are tailored to a specific topology. @@ -78,10 +80,10 @@ them suitable to provide default spreading rules for all workloads in their clus ### Goals -- Cluster operators can define default spreading rules for pods that don't provide any. -- Workloads are spread with the default rules if they belong to the same service, controller, -replica set or stateful set, -and if they don't define `TopologySpreadConstraints`. +- Cluster operators can define default spreading rules for pods that don't provide any + `pod.spec.topologySpreadConstraints`. +- Workloads are spread with the default rules if they belong to the same service, replication controller, + replica set or stateful set, and if they don't define `pod.spec.topologySpreadConstraints`. ### Non-Goals @@ -106,16 +108,17 @@ As a workload author, I want to spread the workload in the cluster, but: ### Implementation Details/Notes/Constraints -Note that a priority given by default `EvenPodsSpreading` rules could conflict with -`SelectorSpreadingPriority`. -Operators can disable `SelectorSpreadingPriority`. But once default rules for `EvenPodsSpreading` is GA, -we can consider removing `SelectorSpreadingPriority` and replacing it by an equivalent -k8s default to the default rules for `EvenPodsSpreading`. +Note that a priority given by Default `topologySpreadConstraints` could conflict with the priority +given by `SelectorSpreadingPriority`. + +Operators can disable `SelectorSpreadingPriority` or give more precedence to `EvenPodsSpreadPriority`. +But once this KEP graduates to GA, there should be no need for `SelectorSpreadingPriority` +and it can be replaced by an equivalent k8s default to Default `topologySpreadConstraints`. ### Risks and Mitigations -`EvenPodsSpreading` has some overhead and we currently ensure that pods that don't use the -feature get minimally affected. After default rules for `EvenPodsSpreading` is rolled out, +`EvenPodsSpreadPriority` has some overhead and we currently ensure that pods that don't use the +feature get minimally affected. After Default `topologySpreadConstraints` is rolled out, all pods will run through the algorithms. However, we should ensure that the running overhead is not significantly higher than `SelectorSpreadingPriority` if using the k8s default. @@ -131,7 +134,8 @@ type KubeSchedulerConfiguration struct { .... // DefaultTopologySpreadConstraints defines spreading constraints to be applied to pods // that don't define any. - // If not specified, the scheduler applies the following default. + // If not specified, the scheduler applies the following default: + // // +optional DefaultTopologySpreadConstraints []TopologySpreadConstraint .... @@ -156,12 +160,10 @@ These will be the default rules for the cluster when the operator doesn't provid ```yaml defaultTopologySpreadConstraints: - - - maxSkew: 1 + - maxSkew: 1 topologyKey: "kubernetes.io/hostname" whenUnsatisfiable: ScheduleAnyway - - - maxSkew: 1 + - maxSkew: 1 topologyKey: "failure-domain.beta.kubernetes.io/zone" whenUnsatisfiable: ScheduleAnyway ``` @@ -174,12 +176,10 @@ Then, an operator can set the following scheduler configuration: ```yaml apiVersion: componentconfig/v1alpha1 defaultTopologySpreadConstraints: - - - maxSkew: 5 + - maxSkew: 5 topologyKey: "example.com/topology/physical_host" whenUnsatisfiable: ScheduleAnyway - - - maxSkew: 15 + - maxSkew: 15 topologyKey: "example.com/topology/rack" whenUnsatisfiable: DoNotSchedule ``` @@ -206,21 +206,19 @@ spec: image: example.com/registry/demo:latest ``` -Note that the workload author didn't provide a spreading rules. -The following rules will be applied to the pods of this replica set before running the -algorithms for Even Pods Spreading: +Note that the workload author didn't provide spreading rules in the `pod.spec`. +The following spreading rules will be derived from the rules defined in ComponentConfig +and applied at runtime: ```yaml topologySpreadConstraints: - - - maxSkew: 5 + - maxSkew: 5 TopologyKey: "example.com/topology/physical_host" WhenUnsatisfiable: ScheduleAnyway selector: matchLabels: app: demo - - - maxSkew: 15 + - maxSkew: 15 TopologyKey: "example.com/topology/rack" WhenUnsatisfiable: DoNotSchedule selector: @@ -228,7 +226,8 @@ topologySpreadConstraints: app: demo ``` -These rules are internal to the scheduler and they don't get reflected in the apiserver. +Please note that these rules are honored internally in the scheduler, but they are NOT +persisted into PodSpec via API Server. ### Implementation Details @@ -258,11 +257,11 @@ To ensure this feature to be rolled out in high quality. Following tests are man Alpha (v1.17): -[ ] Scheduler Component Config API changes. -[ ] Default, validation and generated code. -[ ] Priority Implementation. -[ ] Predicate implementation. -[ ] Test cases mentioned in the [Test Plan](#test-plan). +- [ ] Scheduler Component Config API changes. +- [ ] Default, validation and generated code. +- [ ] Priority Implementation. +- [ ] Predicate implementation. +- [ ] Test cases mentioned in the [Test Plan](#test-plan). ## Implementation History From 241e9e2da602f412db1145ca5f622ebb66c2e5a2 Mon Sep 17 00:00:00 2001 From: Aldo Culquicondor Date: Thu, 3 Oct 2019 13:55:07 -0400 Subject: [PATCH 5/8] Clarify relationship with SelectorSpreadingPriority And replace the word `rules` with `constraints`, for consistency. Signed-off-by: Aldo Culquicondor --- .../20190926-default-even-pod-spreading.md | 83 ++++++++++--------- 1 file changed, 45 insertions(+), 38 deletions(-) diff --git a/keps/sig-scheduling/20190926-default-even-pod-spreading.md b/keps/sig-scheduling/20190926-default-even-pod-spreading.md index c34416b7297..caa672ea558 100644 --- a/keps/sig-scheduling/20190926-default-even-pod-spreading.md +++ b/keps/sig-scheduling/20190926-default-even-pod-spreading.md @@ -31,6 +31,7 @@ see-also: - [Story 1](#story-1) - [Story 2](#story-2) - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints) + - [Relationship with SelectorSpreadingPriority](#relationship-with-selectorspreadingpriority) - [Risks and Mitigations](#risks-and-mitigations) - [Design Details](#design-details) - [API](#api) @@ -42,7 +43,7 @@ see-also: - [Test Plan](#test-plan) - [Graduation Criteria](#graduation-criteria) - [Implementation History](#implementation-history) -- [Alternatives [optional]](#alternatives-optional) +- [Alternatives](#alternatives) ## Release Signoff Checklist @@ -62,27 +63,27 @@ With [Even Pods Spreading](/keps/sig-scheduling/20190221-even-pods-spreading.md) workload authors can define spreading rules for their loads based on the topology of the clusters. The spreading rules are defined in the `PodSpec`, thus they are tied to the pod. -We propose the introduction of configurable default spreading rules, i.e. rules that -can be defined at the cluster level and are applied to pods that don't explicitly define spreading rules. -This way, all pods can be spread according to (likely better informed) rules set by a cluster operator. +We propose the introduction of configurable default spreading constraints, i.e. constraints that +can be defined at the cluster level and are applied to pods that don't explicitly define spreading constraints. +This way, all pods can be spread according to (likely better informed) constraints set by a cluster operator. Workload authors don't need to know the topology of the cluster they will be running on to have their pods spread. -But if they do, they can still set their own spreading rules if they have specific needs. +But if they do, they can still set their own spreading constraints if they have specific needs. ## Motivation In order for a workload (pod) to use `EvenPodsSpreadPriority`: 1. Authors have to have an idea of the underlying topology. -1. PodSpecs become less portable if their spreading rules are tailored to a specific topology. +1. PodSpecs become less portable if their spreading constraints are tailored to a specific topology. On the other hand, cluster operators know the underlying topology of the cluster, which makes -them suitable to provide default spreading rules for all workloads in their cluster. +them suitable to provide default spreading constraints for all workloads in their cluster. ### Goals -- Cluster operators can define default spreading rules for pods that don't provide any +- Cluster operators can define default spreading constraints for pods that don't provide any `pod.spec.topologySpreadConstraints`. -- Workloads are spread with the default rules if they belong to the same service, replication controller, +- Workloads are spread with the default constraints if they belong to the same service, replication controller, replica set or stateful set, and if they don't define `pod.spec.topologySpreadConstraints`. ### Non-Goals @@ -95,7 +96,7 @@ them suitable to provide default spreading rules for all workloads in their clus #### Story 1 -As a cluster operator, I want to set default spreading rules for workloads in the cluster. +As a cluster operator, I want to set default spreading constraints for workloads in the cluster. Currently, `SelectorSpreadPriority` provides a canned priority that spreads across nodes and zones (`failure-domain.beta.kubernetes.io/zone`). However, the nodes in my cluster have custom topology keys (for physical host, rack, etc.). @@ -108,12 +109,25 @@ As a workload author, I want to spread the workload in the cluster, but: ### Implementation Details/Notes/Constraints -Note that a priority given by Default `topologySpreadConstraints` could conflict with the priority -given by `SelectorSpreadingPriority`. -Operators can disable `SelectorSpreadingPriority` or give more precedence to `EvenPodsSpreadPriority`. -But once this KEP graduates to GA, there should be no need for `SelectorSpreadingPriority` -and it can be replaced by an equivalent k8s default to Default `topologySpreadConstraints`. +#### Relationship with SelectorSpreadingPriority + +Note that Default `topologySpreadConstraints` has a similar effect to `SelectorSpreadingPriority`. +Given that the latter is not configurable, they could return conflicting priorities, which +may not be the intention of an operator. On the other hand, a proper Default for `topologySpreadConstraints` +could provide the same priority as `SelectorSpreadingPriority`. Thus, there's no need for the +features to co-exist. + +Give that we guard Default `topologySpreadConstraints` behind a feature flag, +these would be its semantics: + +- If the feature is enabled, `SelectorSpreadingPriority` is removed from the default set of priorities. + K8s provides the Default `topologySpreadConstraints` that matches the priority given by + `SelectorSpreading` if the cluster operator doesn't specify one. +- If the cluster operator provides a Policy that includes `SelectorSpreadingPriority` and + `EvenPodsSpreadPriority`, K8s provides an empty Default `topologySpreadConstraints`. + The cluster operator can still specify Default `topologySpreadConstraints`, + in which case both priorities run. ### Risks and Mitigations @@ -121,7 +135,7 @@ and it can be replaced by an equivalent k8s default to Default `topologySpreadCo feature get minimally affected. After Default `topologySpreadConstraints` is rolled out, all pods will run through the algorithms. However, we should ensure that the running overhead is not significantly higher than -`SelectorSpreadingPriority` if using the k8s default. +`SelectorSpreadingPriority` with k8s Default `topologySpreadConstraints`. ## Design Details @@ -132,31 +146,24 @@ A new structure called `TopologySpreadConstraint` is introduced to `KubeSchedule ```go type KubeSchedulerConfiguration struct { .... - // DefaultTopologySpreadConstraints defines spreading constraints to be applied to pods - // that don't define any. - // If not specified, the scheduler applies the following default: + // TopologySpreadConstraints defines topology spread constraints to be applied to pods + // that don't define any in `pod.spec.topologySpreadConstraints`. Pod selectors must + // be empty, as they are deduced from the resources that the pod belongs to + // (includes services, replication controllers, replica sets and stateful sets). + // If not specified, the scheduler applies the following default constraints: // // +optional - DefaultTopologySpreadConstraints []TopologySpreadConstraint + TopologySpreadConstraints []corev1.TopologySpreadConstraint .... } - -// TopologySpreadConstraint specifies how to spread pods among the given topology. -// Pod selectors are deduced from the resource definitions that the pod belongs to -// (includes services, controllers, replica sets and stateful sets). -type TopologySpreadConstraint struct { - MaxSkew int32 - TopologyKey string - WhenUnsatisfiable corev1.UnsatisfiableConstraintAction -} ``` -Note that `TopologySpreadConstraint` is similar to `k8s.io/api/core/v1.TopologySpreadConstraint`, -except that it doesn't define selectors. +Note the use of `k8s.io/api/core/v1.TopologySpreadConstraint`. During validation, +we verify that selectors are not defined. ### Default rules -These will be the default rules for the cluster when the operator doesn't provide any: +These will be the default constraints for the cluster when the operator doesn't provide any: ```yaml defaultTopologySpreadConstraints: @@ -206,9 +213,9 @@ spec: image: example.com/registry/demo:latest ``` -Note that the workload author didn't provide spreading rules in the `pod.spec`. -The following spreading rules will be derived from the rules defined in ComponentConfig -and applied at runtime: +Note that the workload author didn't provide spreading constraints in the `pod.spec`. +The following spreading constraints will be derived from the constraints defined in ComponentConfig, +and will be applied at runtime: ```yaml topologySpreadConstraints: @@ -226,8 +233,8 @@ topologySpreadConstraints: app: demo ``` -Please note that these rules are honored internally in the scheduler, but they are NOT -persisted into PodSpec via API Server. +Please note that these constraints are honored internally in the scheduler, but they are NOT +persisted in the PodSpec via API Server. ### Implementation Details @@ -267,7 +274,7 @@ Alpha (v1.17): - 2019-09-26: Initial KEP sent out for review. -## Alternatives [optional] +## Alternatives - Make the topology keys used in `SelectorSpreadingPriority` configurable. From 4d55c622050b22efeeee604671095c3cc010a71c Mon Sep 17 00:00:00 2001 From: Aldo Culquicondor Date: Fri, 4 Oct 2019 13:39:01 -0400 Subject: [PATCH 6/8] Update goals and alternatives Signed-off-by: Aldo Culquicondor --- .../20190926-default-even-pod-spreading.md | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/keps/sig-scheduling/20190926-default-even-pod-spreading.md b/keps/sig-scheduling/20190926-default-even-pod-spreading.md index caa672ea558..4eef12cae6a 100644 --- a/keps/sig-scheduling/20190926-default-even-pod-spreading.md +++ b/keps/sig-scheduling/20190926-default-even-pod-spreading.md @@ -85,10 +85,13 @@ them suitable to provide default spreading constraints for all workloads in thei `pod.spec.topologySpreadConstraints`. - Workloads are spread with the default constraints if they belong to the same service, replication controller, replica set or stateful set, and if they don't define `pod.spec.topologySpreadConstraints`. +- Provide a k8s default for `topologySpreadConstraints` that produces a priority equivalent to + `SelectorSpreadPriority`, so that this algorithm can be removed from the default algorithms' provider. ### Non-Goals -- Removal of `SelectorSpreadPriority`, `ServiceSpreadingPriority` or `ServiceAntiAffinity` priorities. +- Set defaults for specific namespaces or according to other selectors. +- Removal of `ServiceSpreadingPriority` or `ServiceAntiAffinity` priorities. ## Proposal @@ -281,4 +284,11 @@ Alpha (v1.17): While this moves the scheduler in the right direction, there are two problems: 1. We can only support one topology key. - 1. It makes it hard for pods to override the operator-provided spreading rules. \ No newline at end of file + 1. It makes it hard for pods to override the operator-provided spreading rules. + +- Implement a mutating controller that sets defaults. + + This approach would likely allow us to provide a more flexible interface that + can set defaults for specific namespaces or with other selectors. However, that + wouldn't allow us to replace `SelectorSpreadingPriority` with + `EvenPodsSpreading`. \ No newline at end of file From 6cb66a5fcb328113fbda4cc0dc295c56ed553bf8 Mon Sep 17 00:00:00 2001 From: Aldo Culquicondor Date: Mon, 7 Oct 2019 12:48:23 -0400 Subject: [PATCH 7/8] Specify semantics of SelectorSpreadingPriority Signed-off-by: Aldo Culquicondor --- .../20190926-default-even-pod-spreading.md | 26 ++++++++++++------- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/keps/sig-scheduling/20190926-default-even-pod-spreading.md b/keps/sig-scheduling/20190926-default-even-pod-spreading.md index 4eef12cae6a..ac1a2f3c448 100644 --- a/keps/sig-scheduling/20190926-default-even-pod-spreading.md +++ b/keps/sig-scheduling/20190926-default-even-pod-spreading.md @@ -121,16 +121,22 @@ may not be the intention of an operator. On the other hand, a proper Default for could provide the same priority as `SelectorSpreadingPriority`. Thus, there's no need for the features to co-exist. -Give that we guard Default `topologySpreadConstraints` behind a feature flag, -these would be its semantics: - -- If the feature is enabled, `SelectorSpreadingPriority` is removed from the default set of priorities. - K8s provides the Default `topologySpreadConstraints` that matches the priority given by - `SelectorSpreading` if the cluster operator doesn't specify one. -- If the cluster operator provides a Policy that includes `SelectorSpreadingPriority` and - `EvenPodsSpreadPriority`, K8s provides an empty Default `topologySpreadConstraints`. - The cluster operator can still specify Default `topologySpreadConstraints`, - in which case both priorities run. +K8s will set Default `topologySpreadConstraints` and remove `SelectorSpreadingPriority` +from the k8s providers (`DefaultProvider` and `ClusterAutoscalerProvider`). The set +[default](#default-rules) will have a similar effect. + +If an operator sets a Policy, these are the semantics of the presence of `SelectorSpreadingPriority` +and/or `EvenPodsSpreadPriority`: + +| SelectorSpreading | EvenPodsSpread | Valid | Pod spread constraints | +| :---------------: | :------------: | :---: | :---------------------------------------: | +| N | Y | Yes | provided or [k8s default](#default-rules) | +| Y | Y | Yes | provided or [k8s default](#default-rules) | +| N | N | Yes | None | +| Y | N | No | - | + +Selecting `SelectorSpreadingPriority` but not `EvenPodsSpreadPriority` in a policy is an invalid +configuration, because the latter is a requirement for the former. ### Risks and Mitigations From d36b5633625a90f4d1d173210ec1e8a67228f5f4 Mon Sep 17 00:00:00 2001 From: Aldo Culquicondor Date: Mon, 7 Oct 2019 13:21:29 -0400 Subject: [PATCH 8/8] Specify case when operator provides constraints Signed-off-by: Aldo Culquicondor --- .../20190926-default-even-pod-spreading.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/keps/sig-scheduling/20190926-default-even-pod-spreading.md b/keps/sig-scheduling/20190926-default-even-pod-spreading.md index ac1a2f3c448..9364c651979 100644 --- a/keps/sig-scheduling/20190926-default-even-pod-spreading.md +++ b/keps/sig-scheduling/20190926-default-even-pod-spreading.md @@ -128,12 +128,14 @@ from the k8s providers (`DefaultProvider` and `ClusterAutoscalerProvider`). The If an operator sets a Policy, these are the semantics of the presence of `SelectorSpreadingPriority` and/or `EvenPodsSpreadPriority`: -| SelectorSpreading | EvenPodsSpread | Valid | Pod spread constraints | -| :---------------: | :------------: | :---: | :---------------------------------------: | -| N | Y | Yes | provided or [k8s default](#default-rules) | -| Y | Y | Yes | provided or [k8s default](#default-rules) | -| N | N | Yes | None | -| Y | N | No | - | +| SelectorSpreading | EvenPodsSpread | Operator constraints | Valid | Pod spread constraints | +| :---------------: | :------------: | :------------------: | :---: | :---------------------------: | +| N | Y | Not provided | Yes | [k8s default](#default-rules) | +| N | Y | Provided | Yes | provided by operator | +| Y | Y | Not provided | Yes | [k8s default](#default-rules) | +| Y | Y | Provided | No | - | +| N | N | - | Yes | None | +| Y | N | - | No | - | Selecting `SelectorSpreadingPriority` but not `EvenPodsSpreadPriority` in a policy is an invalid configuration, because the latter is a requirement for the former.