Add KEP for default Even Pods Spreading #1260
Conversation
Signed-off-by: Aldo Culquicondor <acondor@google.com>
/cc @Huang-Wei @ahg-g
It would be easy to implement this with runtime class scheduling; we could add another field to the `Scheduling` struct like this:
```go
type Scheduling struct {
	// nodeSelector lists labels that must be present on nodes that support this
	// RuntimeClass. Pods using this RuntimeClass can only be scheduled to a
	// node matched by this selector. The RuntimeClass nodeSelector is merged
	// with a pod's existing nodeSelector. Any conflicts will cause the pod to
	// be rejected in admission.
	// +optional
	NodeSelector map[string]string

	// tolerations adds tolerations to pods running with this RuntimeClass.
	// +optional
	Tolerations []corev1.Toleration

	// TopologySpreadConstraints describes how a group of pods is spread.
	// If specified, the scheduler will enforce the constraints.
	// +optional
	TopologySpreadConstraints []TopologySpreadConstraint // new field
}
```
A pod with the specified runtime class would use the topology spread constraints in the `Scheduling` struct.
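For illustration, here's a rough sketch of how such a field could be applied at admission time, merging the RuntimeClass constraints into pods that don't set their own (the helper, its placement, and the local `scheduling` type are assumptions, not part of the actual proposal):

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
)

// scheduling mirrors the Scheduling struct sketched above, using the core/v1
// constraint type for simplicity; this is illustrative only.
type scheduling struct {
	NodeSelector              map[string]string
	Tolerations               []corev1.Toleration
	TopologySpreadConstraints []corev1.TopologySpreadConstraint
}

// applyRuntimeClassSpreading is a hypothetical admission-time helper: pods
// that declare no constraints of their own inherit the RuntimeClass ones;
// pods that do declare constraints are left untouched.
func applyRuntimeClassSpreading(pod *corev1.Pod, s *scheduling) {
	if s == nil || len(s.TopologySpreadConstraints) == 0 {
		return
	}
	if len(pod.Spec.TopologySpreadConstraints) > 0 {
		return // pod-level constraints take precedence
	}
	pod.Spec.TopologySpreadConstraints = append([]corev1.TopologySpreadConstraint(nil),
		s.TopologySpreadConstraints...)
}
```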
cc/ @tallclair
Force-pushed from b51b218 to 37e2dfd
Signed-off-by: Aldo Culquicondor <acondor@google.com>
Force-pushed from 37e2dfd to cc1cb5c
Signed-off-by: Aldo Culquicondor <acondor@google.com>
Thanks @alculquicondor! LGTM generally, some comments below.
With `EvenPodsSpreading`, workload authors can define spreading rules for their loads based on the
topology of the clusters.

By introducing configurable default spreading rules, workloads can be spread in the topology of
I think we should reword this section to highlight that EvenPodsSpreading enables users to control pod spreading on a PodSpec basis, while this feature will bring a cluster-level configurable pod spreading policy, and then mention a bit about the differences.
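For reference, the per-PodSpec control that `EvenPodsSpreading` already gives workload authors looks roughly like this with the core/v1 Go types (values are illustrative):

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podLevelSpreading returns an illustrative constraint a workload author could
// put in a PodSpec: spread pods labeled app=web across zones, tolerating a
// skew of at most 1.
func podLevelSpreading() []corev1.TopologySpreadConstraint {
	return []corev1.TopologySpreadConstraint{{
		MaxSkew:           1,
		TopologyKey:       "topology.kubernetes.io/zone",
		WhenUnsatisfiable: corev1.DoNotSchedule,
		LabelSelector: &metav1.LabelSelector{
			MatchLabels: map[string]string{"app": "web"},
		},
	}}
}
```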
Reworded the 2 paragraphs.
#### Story 1

As a cluster operator, I want to set default spreading rules for workloads in the cluster.
Currently, `SelectorSpreadPriority` provides a canned priority that spreads across nodes
"nodes and zones", or just "zones"?
Both. It gives a weight of 1/3 to nodes and 2/3 to zones.
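For a back-of-the-envelope illustration of that weighting (not the actual `SelectorSpreadPriority` code):

```go
package main

// combinedSpreadScore illustrates the 1/3-node, 2/3-zone weighting mentioned
// above; nodeScore and zoneScore would each be a normalized spreading score.
func combinedSpreadScore(nodeScore, zoneScore float64) float64 {
	return nodeScore/3.0 + 2.0*zoneScore/3.0
}
```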
`SelectorSpreadingPriority`.
Operators can disable `SelectorSpreadingPriority`. But once default rules for `EvenPodsSpreading` are GA,
we can consider removing `SelectorSpreadingPriority` and replacing it with an equivalent
k8s default for the `EvenPodsSpreading` default rules.
I think we want to present this KEP as the replacement for the SelectorSpreadingPriority, they shouldn't exist together. To make the transition frictionless, we can set a default value for what we are proposing here such that the scheduler's default behavior isn't changed (i.e., we set a default configuration for "default even pods spreading" to spread across zones).
This is what this paragraph is trying to say.
What I meant is that we shouldn't allow configuring both at the same time at all. The paragraph is still talking about the possibility of that being the case.
Currently, SelectorSpreadingPriority is used as one of the priority functions when no Policy is specified, and it is a value that we accept in the Policy file. Those two aspects are our guarantees to the users. We can solve them by doing the following if this feature is enabled:
- SelectorSpreadingPriority is removed from the default set of priorities. The default configuration for cluster-wide EvenPodsSpreading is set such that we maintain the same current scheduler default behavior that SelectorSpreadingPriority offers (or as close as possible).
- During the deprecation period of Policy (and by association SelectorSpreadingPriority), we say that, when set, SelectorSpreadingPriority overrides cluster-wide EvenPodsSpreading.
WDYT?
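To make the first bullet concrete, here is a minimal sketch of what such a built-in default could look like, assuming it reuses the core/v1 constraint type (names and values are hypothetical, chosen only to approximate today's soft node/zone spreading):

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
)

// defaultSpreadConstraints is a hypothetical default the scheduler could fall
// back to when the operator configures nothing, approximating the soft
// node/zone spreading that SelectorSpreadingPriority provides today.
// LabelSelector is intentionally left nil: it would be derived per pod.
var defaultSpreadConstraints = []corev1.TopologySpreadConstraint{
	{
		MaxSkew:           1,
		TopologyKey:       "kubernetes.io/hostname",
		WhenUnsatisfiable: corev1.ScheduleAnyway,
	},
	{
		MaxSkew:           1,
		TopologyKey:       "topology.kubernetes.io/zone",
		WhenUnsatisfiable: corev1.ScheduleAnyway,
	},
}
```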
I agree that they are conflicting. See updated section.
// that don't define any.
// If not specified, the scheduler applies the following default.
// +optional
DefaultTopologySpreadConstraints []TopologySpreadConstraint
I would drop the prefix Default, I think it is not necessary. It will be confusing when we start talking about a default value for DefaultTopologySpreadConstraints.
I agree that it's confusing. But we need to convey the idea that it's still a cluster default that is applied if the pods don't specify any rules.
Component config is a cluster-wide configuration by definition; adding a "Default" prefix does not really clarify the semantics of this configuration variable as you state it. At the end of the day, it has to be explained in the docs/comments that this value can be overridden at the pod level.
See updated name and comment.
Note that `TopologySpreadConstraint` is similar to `k8s.io/api/core/v1.TopologySpreadConstraint`,
As the current `TopologySpreadConstraint.LabelSelector` is optional, you can reuse that structure. (Validation logic can help detect illegal presence of labelSelector.)
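A minimal sketch of the validation idea just mentioned, rejecting a labelSelector on cluster-level defaults (the function name is hypothetical):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// validateDefaultConstraints rejects default constraints that set a
// labelSelector, since for cluster-level defaults the selector is only known
// per pod at scheduling time.
func validateDefaultConstraints(constraints []corev1.TopologySpreadConstraint) error {
	for i, c := range constraints {
		if c.LabelSelector != nil {
			return fmt.Errorf("constraint %d: labelSelector must not be set in default constraints", i)
		}
	}
	return nil
}
```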
@Huang-Wei What happens currently in EvenPodsSpreading when the label selector is not specified?
Signed-off-by: Aldo Culquicondor <acondor@google.com>
I agree with @alculquicondor, the RuntimeClass scheduling constraints are only intended to address compatibility. A more relevant proposal would be the SchedulingPolicy, which unfortunately hasn't gotten off the ground yet.
I updated the PR with the provided feedback. However, I'm taking a look at the progress of SchedulingPolicy. If there are no major blockers, I might merge the efforts.
+1 to considering SchedulingPolicy again, it could allow us to deprecate most of the scheduler's "custom predicates/priorities".
Force-pushed from 0ff98ee to 47bdd56
And replace the word `rules` with `constraints`, for consistency. Signed-off-by: Aldo Culquicondor <acondor@google.com>
Force-pushed from 47bdd56 to 241e9e2
I talked to Yassine and took the job, and I'll send the KEP soon.
Note that I misunderstood the current proposal for SchedulingPolicy; in its current form it will not help with deprecating custom predicates/priorities. In any case, we discussed this during the sig meeting today, and Aldo will be following up with a few people who initially proposed it to understand if this is still worth pursuing, especially since what SchedulingPolicy provides can be implemented as a user admission webhook.
Signed-off-by: Aldo Culquicondor <acondor@google.com>
After looking at what the Scheduling Policy proposes and the older discussions, I suggest that we proceed with this KEP as is. The rationale being:
I updated the Goals and Alternatives sections to reflect this suggestion.
- If the cluster operator provides a Policy that includes `SelectorSpreadingPriority` and
  `EvenPodsSpreadPriority`, K8s provides an empty Default `topologySpreadConstraints`.
  The cluster operator can still specify Default `topologySpreadConstraints`,
  in which case both priorities run.
I don't think we should run both priorities, I think we should clarify that if both are specified, then one overrides the other.
So your take would be that `EvenPodsSpreadPriority` overrides and disables `SelectorSpreadingPriority`?
We should have a single implementation, a plugin. This plugin will either be configured using the priority-to-plugin translation logic we have if `SelectorSpreadingPriority` is specified, or otherwise it is configured using the new ComponentConfig parameters you will be adding.
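A rough sketch of that single-plugin idea, deciding the source of its configuration up front (all names here are hypothetical, not the actual plugin API):

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
)

// spreadPluginArgs is a hypothetical argument struct for the single spreading
// plugin discussed above.
type spreadPluginArgs struct {
	Constraints []corev1.TopologySpreadConstraint
}

// legacySelectorSpreadingAsConstraints stands in for the priority-to-plugin
// translation; here it just returns a zone-spreading placeholder.
func legacySelectorSpreadingAsConstraints() []corev1.TopologySpreadConstraint {
	return []corev1.TopologySpreadConstraint{{
		MaxSkew:           1,
		TopologyKey:       "topology.kubernetes.io/zone",
		WhenUnsatisfiable: corev1.ScheduleAnyway,
	}}
}

// resolveSpreadArgs picks where the plugin's constraints come from: the legacy
// Policy translation when SelectorSpreadingPriority is requested, otherwise
// the new ComponentConfig value.
func resolveSpreadArgs(policyHasSelectorSpreading bool, fromComponentConfig []corev1.TopologySpreadConstraint) spreadPluginArgs {
	if policyHasSelectorSpreading {
		return spreadPluginArgs{Constraints: legacySelectorSpreadingAsConstraints()}
	}
	return spreadPluginArgs{Constraints: fromComponentConfig}
}
```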
See updated clarifications
Force-pushed from 18a12eb to 3af5a33
Signed-off-by: Aldo Culquicondor <acondor@google.com>
Force-pushed from 3af5a33 to 6cb66a5
| SelectorSpreading | EvenPodsSpread | Valid | Pod spread constraints                    |
| :---------------: | :------------: | :---: | :---------------------------------------: |
| N                 | Y              | Yes   | provided or [k8s default](#default-rules) |
| Y                 | Y              | Yes   | provided or [k8s default](#default-rules) |
The constraint should always be the k8s default, never the provided one in this case, because SelectorSpreading takes precedence.
Or you can consider specifying both (topologySpreadConstraint and SelectorSpreading) an invalid configuration, which I think is better since we don't have to make a call on which one takes precedence.
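A minimal sketch of treating that combination as an invalid configuration, per the suggestion above (names are hypothetical):

```go
package main

import "errors"

// validateSpreadConfig rejects the ambiguous case where the Policy enables
// SelectorSpreading and the operator also sets default topology spread
// constraints, so no precedence rule is needed.
func validateSpreadConfig(policyHasSelectorSpreading, hasDefaultConstraints bool) error {
	if policyHasSelectorSpreading && hasDefaultConstraints {
		return errors.New("SelectorSpreading and default topologySpreadConstraints cannot both be configured")
	}
	return nil
}
```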
I went with the second option
Signed-off-by: Aldo Culquicondor <acondor@google.com>
### Graduation Criteria

Alpha (v1.17):
We should say that this feature is tied to EvenPodsSpread, so they will graduate together.
/lgtm This is great, thanks Aldo for the detailed explanation of how this will work. I will approve once @Huang-Wei LGTMs.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ahg-g, alculquicondor
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/hold
Sorry for the premature merge :( Aldo, can you please clarify the graduation dependency between this and EvenPodsSpreading in a followup PR?
Sure thing. There is a new PR in place.
- **Unit Tests**: All core changes must be covered by unit tests.
- **Integration Tests**: One integration test for the default rules and one for custom rules.
- **Benchmark Tests**: A benchmark test that compares the default rules against `SelectorSpreadingPriority`.
  The performance should be as close as possible.
Sorry for jumping late into this discussion - but do we believe we can do that efficiently?
I'm asking in the context of large clusters - i.e. 5k nodes.
I want to ensure that we won't introduce a visible regression, because this may become a blocker.
I think we should be able to do this as efficiently as the current logic, I don't see why we can't. I agree that it is a blocker if this will be causing performance regressions though.
`SelectorSpreadingPriority` is currently part of the default algorithms that run. Theoretically, we are talking about the same problem, so it should be possible to devise an algorithm for `EvenPodsSpreadPriority` with the same complexity.
We will be backing this with performance measurements, as seen in the Test Plan.
@alculquicondor perhaps we should explicitly call that out in the KEP?
The Test Plan is part of the graduation criteria.
Done: Summary, Motivation, Proposal, Design Details (Alpha Graduation Criteria), Alternatives.
Feature: #1258
Main repo issue: kubernetes/kubernetes#80639