Add a parameter for making the thresholds of the LowNodeUtilization strategy relative to average values #473

Conversation

@AmoVanB AmoVanB commented Dec 16, 2020

Reason for this PR

Currently, the thresholds that can be set for the LowNodeUtilization strategy only make it possible to ensure that nodes are not over- or underutilized relative to thresholds defined as fractions of their capacities. However, it is sometimes desirable to ensure that all nodes are "similarly" utilized so that the cluster stays properly balanced. While this can be implemented with the current thresholds, the values needed to achieve such balancing would depend on the number of nodes, the number of pods, and their memory/CPU requests, which is not very practical to configure, especially for dynamic clusters.

Feature of this PR

This PR enables the descheduler to achieve such a goal. We define an additional useDeviationThresholds boolean parameter (the name is maybe not the best; it can change). If false, the descheduler behaves as it does today. If true, the thresholds are interpreted as percentage deviations relative to the average utilization across all nodes.

For example, considering only the memory metric: if the nodes have utilizations of [10%, 20%, 30%, 20%] and both thresholds and targetThresholds are set to 5%, the average node utilization is (10% + 20% + 30% + 20%) / 4 = 20%. The first node has a utilization of 10%, which is lower than 20% (average) - 5% (thresholds); it is hence considered underutilized. The third node has a utilization of 30%, which is greater than 20% (average) + 5% (targetThresholds); it is hence considered overutilized. The other two nodes have utilizations within the [20% - 5%, 20% + 5%] window and are hence considered appropriately utilized.

Once nodes are labeled as under- or overutilized, the strategy behaves exactly the same as before.
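
As an illustration, here is a minimal, self-contained Go sketch of the deviation-based categorization described above. It is a simplification (plain percentages, a single resource), not the PR's actual implementation:

package main

import "fmt"

// classify prints, for each node utilization (in percent), whether the node is
// under-, over-, or appropriately utilized when thresholds are interpreted as
// deviations around the average utilization.
func classify(utilizations []float64, thresholds, targetThresholds float64) {
	var sum float64
	for _, u := range utilizations {
		sum += u
	}
	avg := sum / float64(len(utilizations))

	for i, u := range utilizations {
		switch {
		case u < avg-thresholds:
			fmt.Printf("node %d (%.0f%%): underutilized\n", i, u)
		case u > avg+targetThresholds:
			fmt.Printf("node %d (%.0f%%): overutilized\n", i, u)
		default:
			fmt.Printf("node %d (%.0f%%): appropriately utilized\n", i, u)
		}
	}
}

func main() {
	// The example from above: average is 20%, acceptable window is [15%, 25%].
	classify([]float64{10, 20, 30, 20}, 5, 5)
}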

Code style comment

Since the only change compared to the original strategy is the way nodes are categorized as over- and underutilized, I thought it best to include this in the existing LowNodeUtilization strategy rather than create a new one. Another option would be to create two distinct strategies that share methods. I went for the first option because it was the easiest to implement, but I would definitely not be against the second one.

Test

We had this need/problem in our 200+ pod cluster. We deployed the descheduler from this PR in our test clusters with success: it passes our tests (which verify that the correct pods are evicted when the cluster is unbalanced) and works as expected, with no issues detected so far.

Vision

In general, the proposed feature goes in the direction of a more general intention we have for the descheduler: take the state of the cluster, smartly compute a new schedule for all the pods (not one by one, but all at once, to leverage the complete knowledge we have) according to a given strategy (ours: balanced utilization), and then implement the transition from the current state to the desired state. This PR is our first (small) step towards that goal, but is there such a plan for the descheduler? Overall, that would turn the descheduler into a "cluster scheduler", in contrast to the current "per-pod scheduler" of Kubernetes. I know there is a plan to eventually incorporate scheduling into the tool rather than relying on the existing scheduler, but I'm not sure if you're looking into such optimization-driven strategies.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 16, 2020
@k8s-ci-robot

Hi @AmoVanB. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Dec 16, 2020

// Logging specifies the options of logging.
// Refer [Logs Options](https://github.com/kubernetes/component-base/blob/master/logs/options.go) for more information.
Logging componentbaseconfig.LoggingConfiguration `json:"logging,omitempty"`
@AmoVanB (author):

I was getting a complaint from update-generated-conversions.sh without that. Not sure if it's really needed though.

} else {
lowResourceThreshold = map[v1.ResourceName]*resource.Quantity{
v1.ResourceCPU: resource.NewMilliQuantity(int64(float64(strategyConfig.Thresholds[v1.ResourceCPU])*float64(nodeCapacity.Cpu().MilliValue())*0.01), resource.DecimalSI),
v1.ResourceMemory: resource.NewQuantity(int64(float64(strategyConfig.Thresholds[v1.ResourceMemory])*float64(nodeCapacity.Memory().Value())*0.01), resource.BinarySI),
@AmoVanB (author):

Is resource.BinarySI correct here? I took it from the existing code, but I'm surprised that it differs from the resource.DecimalSI used for CPU and pods. I'm not sure about the purpose/meaning of that parameter though.

Reply (Contributor):

Yes, they're different because for CPU, 5 cores = 5 * 1000m, but for Memory, 5GiB = 5 * 1024 MiB.
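
For anyone wondering about the Format argument: it only controls how a Quantity is serialized, not its numeric value. A small sketch using the standard k8s.io/apimachinery resource package (the output comments reflect the expected canonical forms):

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// CPU is a decimal resource; DecimalSI uses suffixes like m, k, M, G.
	cpu := resource.NewMilliQuantity(1500, resource.DecimalSI)
	// Memory is a binary resource; BinarySI uses suffixes like Ki, Mi, Gi.
	mem := resource.NewQuantity(5*1024*1024*1024, resource.BinarySI)

	fmt.Println(cpu.String()) // 1500m (i.e. 1.5 cores)
	fmt.Println(mem.String()) // 5Gi
}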

@@ -103,7 +104,7 @@ func TestLowNodeUtilization(t *testing.T) {
},
n2NodeName: {
Items: []v1.Pod{
*test.BuildTestPod("p9", 400, 0, n1NodeName, test.SetRSOwnerRef),
*test.BuildTestPod("p9", 400, 0, n2NodeName, test.SetRSOwnerRef),
@AmoVanB (author):

Here and in the 3 following changes, I thought this was a typo, as the pod is actually supposed to be on node 2. I'm not sure whether that is important, or even correct.

Reply (Contributor):

Please put these changes into a separate PR.

Already fixed on master by this commit.

continue
}

// 2. for each topologySpreadConstraint in that namespace
klog.V(1).InfoS("Processing "+strconv.Itoa(len(namespaceTopologySpreadConstraints))+" constraints", "namespace", namespace.Name)
@AmoVanB (author):

I just added some logs which I think are very helpful for simple debugging.

Reply (Contributor):

I don't think these debug logs should be at this level; you can set them to level 3.
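
For reference, a sketch of how that log line could look at verbosity 3, with the constraint count passed as a structured key/value pair instead of being concatenated into the message (this assumes k8s.io/klog/v2; the variable names are illustrative):

package main

import "k8s.io/klog/v2"

func main() {
	namespaceName := "default"
	constraintCount := 3

	// Emitted only when the descheduler runs with verbosity 3 or higher.
	klog.V(3).InfoS("Processing topology spread constraints",
		"count", constraintCount,
		"namespace", namespaceName)
}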

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 21, 2021
@AmoVanB AmoVanB force-pushed the feature/relative-node-utilization branch from 946d17b to 326d175 Compare January 22, 2021 09:20
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 22, 2021
@lixiang233 lixiang233 left a comment

Hi @AmoVanB. Thanks for your PR.

If we run lowNodeUtilization periodically in a dynamic cluster, we may need to change Thresholds or TargetThresholds whenever the total utilization of the cluster changes. So it makes sense to make Thresholds and TargetThresholds follow the average resource usage of the cluster.

/ok-to-test
/kind feature

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. kind/feature Categorizes issue or PR as related to a new feature. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 28, 2021
Comment on lines 189 to 190
|`thresholds`|map(string:int)|
|`targetThresholds`|map(string:int)|
Comment (Contributor):

I think we should consider changing the names of these thresholds to low and high to make them easier to understand; that's what @ingvagabund suggested here.

@ingvagabund ingvagabund Feb 22, 2021

Yeah, we discussed renaming the params a few times in the past, as well as bumping the version of the strategy config, e.g. reorganizing the data type so the same strategy can be configured multiple times with different configurations.

Though, we can still add new params, deprecate the current ones, and remove them after 3 releases.


@ingvagabund ingvagabund left a comment
Some suggestions before a proper review of this PR. I like the idea overall.

targetThresholds := strategy.Params.NodeResourceUtilizationThresholds.TargetThresholds
if err := validateStrategyConfig(thresholds, targetThresholds); err != nil {
strategyConfig := strategy.Params.NodeResourceUtilizationThresholds
if err := validateStrategyConfig(strategyConfig); err != nil {
Comment (Contributor):

Please put the changes that generalize thresholds and targetThresholds into strategyConfig.thresholds and strategyConfig.targetThresholds into a separate commit. It will make the changes easier to follow.
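
As an illustration of the generalization being discussed, here is a minimal self-contained sketch of what a consolidated strategy config and its validation could look like (the type and field names are assumptions for illustration, not necessarily the exact types in this PR):

package api

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// Percentage and ResourceThresholds mirror the shape of the descheduler's
// threshold configuration: one percentage per resource name.
type Percentage float64
type ResourceThresholds map[v1.ResourceName]Percentage

// NodeResourceUtilizationThresholds groups the strategy parameters so that
// validation and the strategy itself can take a single config value.
type NodeResourceUtilizationThresholds struct {
	UseDeviationThresholds bool
	Thresholds             ResourceThresholds
	TargetThresholds       ResourceThresholds
}

func validateStrategyConfig(cfg *NodeResourceUtilizationThresholds) error {
	if cfg == nil {
		return fmt.Errorf("no NodeResourceUtilizationThresholds set")
	}
	for _, thresholds := range []ResourceThresholds{cfg.Thresholds, cfg.TargetThresholds} {
		for name, pct := range thresholds {
			if pct < 0 || pct > 100 {
				return fmt.Errorf("%v threshold %v is outside the [0, 100] range", name, pct)
			}
		}
	}
	return nil
}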

@@ -114,9 +115,10 @@ func RemovePodsViolatingTopologySpreadConstraint(
(len(excludedNamespaces) > 0 && excludedNamespaces.Has(namespace.Name)) {
continue
}
klog.V(1).InfoS("Processing namespace", "namespace", namespace.Name)
Comment (Contributor):

Please put the TSC changes into their own PR, as they're not related to the deviation feature.


@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: AmoVanB
To complete the pull request process, please assign damemi after the PR has been reviewed.
You can assign the PR to them by writing /assign @damemi in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 14, 2021
@k8s-ci-robot

@AmoVanB: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ingvagabund

We should get #434 merged first though.

@ingvagabund

@AmoVanB #434 merged

@AmoVanB

AmoVanB commented May 8, 2021

I'm not working on that anymore. I tried to rebase but there are a couple of non-obvious conflicts. I unfortunately won't have time to work on rebasing and fixing that. Hopefully someone can take over! :)

@matthieu-eck

Thanks @AmoVanB, I will take over from here ;) @ingvagabund, shall I open another PR with the required changes?

@ingvagabund

@matthieu-eck by all means. Thank you for continuing the work @AmoVanB started!!!

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 10, 2022
@k8s-ci-robot

@AmoVanB: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

All failed tests ran against commit 326d175 ("required" marks mandatory tests):

  • pull-descheduler-test-e2e-k8s-master-1-21: /test pull-descheduler-test-e2e-k8s-master-1-21
  • pull-descheduler-helm-test: /test pull-descheduler-helm-test
  • pull-descheduler-test-e2e-k8s-master-1-22: /test pull-descheduler-test-e2e-k8s-master-1-22
  • pull-descheduler-test-e2e-k8s-1-21-1-21: /test pull-descheduler-test-e2e-k8s-1-21-1-21
  • pull-descheduler-unit-test-master-master (required): /test pull-descheduler-unit-test-master-master
  • pull-descheduler-test-e2e-k8s-master-1-23 (required): /test pull-descheduler-test-e2e-k8s-master-1-23

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

HelmutLety pushed a commit to HelmutLety/descheduler that referenced this pull request Mar 1, 2022
HelmutLety pushed a commit to HelmutLety/descheduler that referenced this pull request Mar 1, 2022
HelmutLety pushed a commit to HelmutLety/descheduler that referenced this pull request Mar 1, 2022
HelmutLety pushed a commit to HelmutLety/descheduler that referenced this pull request Mar 2, 2022
HelmutLety pushed a commit to HelmutLety/descheduler that referenced this pull request Mar 2, 2022
feat(LowNodeUtilization): useDeviationThresholds, redo of kubernetes-sigs#473
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 5, 2022
HelmutLety pushed a commit to HelmutLety/descheduler that referenced this pull request Mar 17, 2022
HelmutLety added a commit to HelmutLety/descheduler that referenced this pull request Mar 17, 2022
HelmutLety added a commit to HelmutLety/descheduler that referenced this pull request Mar 28, 2022
…sigs#473

[751]: normalize Percentage in nodeutilization and clean the tests
k8s-ci-robot added a commit that referenced this pull request Mar 28, 2022
feat: Add DeviationThreshold Paramter for LowNodeUtilization, (Previous attempt - #473 )
@ingvagabund

Closing in favor of #751
/close

@k8s-ci-robot

@ingvagabund: Closed this PR.

In response to this:

Closing in favor of #751
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels

  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • kind/feature (Categorizes issue or PR as related to a new feature.)
  • lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed.)
  • needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.)
  • ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.)
  • size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files.)