
Add RemoveTooManyRestarts policy #254

Merged
merged 2 commits into kubernetes-sigs:master from the toomanyrestarts branch on Apr 27, 2020

Conversation

@damemi (Contributor) commented Mar 30, 2020

This rebases the changes from #89 to the current master branch

See issue #62
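
For context, a minimal, illustrative sketch of what the new policy does, distilled from the code quoted later in this review (the function name, parameters, and the evict callback here are simplified assumptions, not the PR's exact API):

package strategies

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// evictPodsWithTooManyRestarts mirrors the strategy's core decision: sum a
// pod's container restart counts (optionally including init containers) and
// evict the pod once the sum exceeds the configured threshold.
func evictPodsWithTooManyRestarts(pods []*v1.Pod, threshold int32, includingInitContainers bool, evict func(*v1.Pod) error) {
	for _, pod := range pods {
		var restarts, initRestarts int32
		for _, cs := range pod.Status.ContainerStatuses {
			restarts += cs.RestartCount
		}
		for _, cs := range pod.Status.InitContainerStatuses {
			initRestarts += cs.RestartCount
		}
		total := restarts
		if includingInitContainers {
			total += initRestarts
		}
		if total <= threshold {
			continue // the PR's condition: evict only when restarts exceed the threshold
		}
		if err := evict(pod); err != nil {
			fmt.Printf("error evicting pod %q: %v\n", pod.Name, err)
		}
	}
}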

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 30, 2020
@damemi (Contributor, Author) commented Mar 30, 2020

/cc @ingvagabund
ptal

@seanmalloy (Member) commented:
/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 31, 2020
Resolved review threads on outdated diffs: pkg/descheduler/strategies/toomanyrestarts.go, README.md, examples/policy.yaml, pkg/api/types.go, pkg/api/v1alpha1/types.go
@ingvagabund (Contributor) left a comment:

Also, the test descriptions need to be improved.

continue
}

glog.V(1).Infof("RemovePodsHavingTooManyRestarts will evicted pod: %#v, container restarts: %d, initContainer restarts: %d", pod.Name, restarts, initRestarts)
Contributor:
s/evicted/evict

glog.V(1).Infof("RemovePodsHavingTooManyRestarts will evicted pod: %#v, container restarts: %d, initContainer restarts: %d", pod.Name, restarts, initRestarts)
success, err := evictions.EvictPod(ds.Client, pod, policyGroupVersion, ds.DryRun)
if !success {
glog.Infof("RemovePodsHavingTooManyRestarts Error when evicting pod: %#v (%#v)", pod.Name, err)
Contributor:
s/(%#v)/(%v) as error is a simple string at the end

Contributor Author:
I don't disagree, but all of our other policies format the errors with (%#v) so for consistency in error reporting we should probably keep it. (Personally I prefer %+v for errors)

Contributor:
Good point. I'm no longer against keeping %#v; it's better to change all the occurrences at once.
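
For reference, a standalone example (not part of the PR) showing what the three verbs print for a plain error value:

package main

import (
	"errors"
	"fmt"
)

func main() {
	err := errors.New("pod eviction refused")
	fmt.Printf("%v\n", err)  // pod eviction refused
	fmt.Printf("%+v\n", err) // pod eviction refused (same as %v for a plain error)
	fmt.Printf("%#v\n", err) // &errors.errorString{s:"pod eviction refused"}
}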

glog.Infof("RemovePodsHavingTooManyRestarts Error when evicting pod: %#v (%#v)", pod.Name, err)
} else {
nodePodCount[node]++
glog.V(1).Infof("RemovePodsHavingTooManyRestarts Evicted pod: %#v (%#v)", pod.Name, err)
Contributor:
s/(%#v)/(%v) as error is a simple string at the end

Contributor Author:
same as #254 (comment)

}

for _, node := range nodes {
glog.V(1).Infof("RemovePodsHavingTooManyRestarts Processing node: %#v", node.Name)
Contributor:
No need to prefix the messages with RemovePodsHavingTooManyRestarts as glog/klog prints the file name which uniquely identifies the strategy. Holds for other lines as well.

Contributor:
Also node.Name is a string, so s/%#v/%v or %q

Contributor Author:
Is %s fine?

This is also something that's copied in all the other strategies (using %#v for the node name here).

Contributor:
I'm no longer against keeping %#v; it's better to change all the occurrences at once.
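
And the same comparison for a plain string such as node.Name (standalone example, not from the PR):

package main

import "fmt"

func main() {
	name := "node-1"
	fmt.Printf("%s\n", name)  // node-1
	fmt.Printf("%v\n", name)  // node-1
	fmt.Printf("%q\n", name)  // "node-1"
	fmt.Printf("%#v\n", name) // "node-1" (the Go-syntax form of a string is the quoted string)
}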

i++
}

// The following 4 pods won't get evicted.
Contributor:
The following 3, not 4

pod := test.BuildTestPod(fmt.Sprintf("pod-%d", i), 100, 0, node.Name)
pod.ObjectMeta.OwnerReferences = test.GetNormalPodOwnerRefList()

// pod i will has 25 * i restarts.
Contributor:
pod at index i has ...

maxPodsToEvict: 0,
},
{
description: "One pod have total restarts bigger than threshold, 6 pod evictions",
Contributor:
"Some pods have total restarts bigger than threshold".

// calcContainerRestarts get container restarts and init container restarts.
func calcContainerRestarts(pod *v1.Pod) (int32, int32) {
var (
restarts int32 = 0
Contributor:
No need for 0 as it's the default value:

var restarts, initRestarts int32
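
A sketch of how the helper might look with that suggestion applied, summing RestartCount over the pod's container and init-container statuses (assumes v1 is k8s.io/api/core/v1; the PR's actual body may differ):

// calcContainerRestarts gets container restarts and init container restarts.
func calcContainerRestarts(pod *v1.Pod) (int32, int32) {
	var restarts, initRestarts int32

	for _, cs := range pod.Status.ContainerStatuses {
		restarts += cs.RestartCount
	}
	for _, cs := range pod.Status.InitContainerStatuses {
		initRestarts += cs.RestartCount
	}
	return restarts, initRestarts
}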

maxPodsToEvict: 0,
},
{
description: "Nine pods have total restarts equals threshold(includingInitContainers=true), 5 pods evictions",
Contributor:
The strategy has the following condition for deciding if the threshold is met:

if strategy.Params.PodsHavingTooManyRestarts.IncludingInitContainers {
	if restarts+initRestarts <= strategy.Params.PodsHavingTooManyRestarts.PodeRestartThresholds {
		continue
	}
} else if restarts <= strategy.Params.PodsHavingTooManyRestarts.PodeRestartThresholds {
	continue
}

This means a pod is evicted only when the number of restarts is strictly greater than the threshold. Is this the intention? I would expect a pod to be evicted as soon as the threshold is met.
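
For comparison, a sketch of the condition flipped so that a pod is evicted as soon as the threshold is met (field names copied from the snippet above; whether this is the desired semantics is exactly the open question):

totalRestarts := restarts
if strategy.Params.PodsHavingTooManyRestarts.IncludingInitContainers {
	totalRestarts += initRestarts
}
// Skip only pods strictly below the threshold, so reaching it triggers eviction.
if totalRestarts < strategy.Params.PodsHavingTooManyRestarts.PodeRestartThresholds {
	continue
}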

@damemi (Contributor, Author) commented Mar 31, 2020

@seanmalloy @ingvagabund thanks, I've addressed your feedback

@damemi (Contributor, Author) commented Apr 1, 2020

@seanmalloy thanks, updated. I'll squash down to 1 commit later today if there's no more feedback

@seanmalloy (Member) commented:
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 1, 2020
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 1, 2020
@damemi (Contributor, Author) commented Apr 1, 2020

squashed to 1 commit

@seanmalloy (Member) commented:
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 1, 2020
@seanmalloy (Member) commented:
/assign @aveshagarwal @ravisantoshgudimetla

@aveshagarwal (Contributor) commented Apr 1, 2020

Just curious why this includes so many vendor changes. Did it update some existing dependencies?
Also, please split the commit into two commits, one with the main changes and the other with the vendor changes, as that helps with review.

@damemi force-pushed the toomanyrestarts branch 3 times, most recently from 2bd5bf5 to 7511ac9 on April 24, 2020 at 14:22
@damemi (Contributor, Author) commented Apr 24, 2020

@seanmalloy thanks, rebased and got Travis passing. Good for another look now.

@seanmalloy (Member) commented:
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 27, 2020
@seanmalloy (Member) commented:
@aveshagarwal and @ravisantoshgudimetla please take a look when you have some time. This PR is ready for you to review again.

@aveshagarwal (Contributor) commented:
Sorry if this has been discussed before. Is the goal of this strategy that the evicted pods that had too many restarts will be placed on other nodes? I am concerned that if that does not happen, the descheduler might go into a loop.

@ingvagabund (Contributor) commented Apr 27, 2020

> Is the goal of this strategy that the evicted pods that had too many restarts will be placed on other nodes?

With some likelihood, yes.

> I am concerned that if that does not happen, the descheduler might go into a loop.

As long as the strategy is run once in a while, it will not.

@damemi (Contributor, Author) commented Apr 27, 2020

@aveshagarwal the strategy itself doesn't explicitly place failed pods onto other nodes; that is still up to the owning controller to recreate the pod and to the scheduler to place it. But the goal is that when the scheduler picks up the "stuck" pod again, it will make a better decision this time.

If the pod does land on the same node again, it could cause a loop. But as @ingvagabund said, this depends on how frequently the descheduler is run. Ultimately, though, the pod is already in a CrashLoop anyway, so this at least makes an attempt to un-stick it.

@aveshagarwal (Contributor) commented:
> Is the goal of this strategy that the evicted pods that had too many restarts will be placed on other nodes?
>
> With some likelihood, yes.

Actually, that likelihood is one of the main goals behind each descheduler strategy and is, among other things, what makes the descheduler useful. The idea is to have the likelihood as high as possible so that the descheduler really helps with its balancing act. If the likelihood is low, then the strategy is not going to be very helpful. And that is my main concern with this strategy.

Could we do something to increase the likelihood of this strategy succeeding? Right now it seems to me that the likelihood is low. Please let me know if you think otherwise.

> I am concerned that if that does not happen, the descheduler might go into a loop.
>
> As long as the strategy is run once in a while, it will not.

@aveshagarwal (Contributor) commented Apr 27, 2020

> @aveshagarwal the strategy itself doesn't explicitly place failed pods onto other nodes; that is still up to the owning controller to recreate the pod and to the scheduler to place it.

I am aware of that. I am just wondering, as I mentioned in my previous comment to Jan, whether there is any way to increase the likelihood of success for this strategy. Because if the likelihood is low, then this strategy might not be very helpful.

> But the goal is that when the scheduler picks up the "stuck" pod again, it will make a better decision this time.
>
> If the pod does land on the same node again, it could cause a loop. But as @ingvagabund said, this depends on how frequently the descheduler is run. Ultimately, though, the pod is already in a CrashLoop anyway, so this at least makes an attempt to un-stick it.

@ingvagabund (Contributor) commented Apr 27, 2020

> Actually, that likelihood is one of the main goals behind each descheduler strategy and is, among other things, what makes the descheduler useful. The idea is to have the likelihood as high as possible so that the descheduler really helps with its balancing act. If the likelihood is low, then the strategy is not going to be very helpful. And that is my main concern with this strategy.

Some strategies test whether there's at least one additional node that can schedule the evicted pod (e.g. its taints are tolerated, the node set pointed to by a node selector is not empty, etc.). However, strategies do not keep state. We might build a cache of pods per strategy, check whether a pod was evicted from the same node again, and "try" to make a better decision. Though, as long as the scheduler is not aware of a pod's re-incarnation history, placement will still be random.

@aveshagarwal you could use the same argument for any of the already existing strategies. As long as re-incarnated pods do not carry additional information for the scheduler, I don't think we can do anything about the likelihood.
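
A rough illustration of the kind of schedulability pre-check mentioned above: does at least one node other than the pod's current one match its node selector and tolerate its taints? This is not code from this PR; podFitsAnyOtherNode and toleratesAllTaints are hypothetical helpers, and the check ignores resources, affinity, and the other scheduler predicates:

package sketch

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// podFitsAnyOtherNode returns true if some node other than the pod's current
// node matches the pod's node selector and has all of its taints tolerated.
func podFitsAnyOtherNode(pod *v1.Pod, nodes []*v1.Node) bool {
	selector := labels.SelectorFromSet(labels.Set(pod.Spec.NodeSelector))
	for _, node := range nodes {
		if node.Name == pod.Spec.NodeName {
			continue // skip the node the pod would be evicted from
		}
		if !selector.Matches(labels.Set(node.Labels)) {
			continue
		}
		if !toleratesAllTaints(pod, node.Spec.Taints) {
			continue
		}
		return true
	}
	return false
}

// toleratesAllTaints reports whether the pod tolerates every NoSchedule and
// NoExecute taint on the node; PreferNoSchedule does not block scheduling.
func toleratesAllTaints(pod *v1.Pod, taints []v1.Taint) bool {
	for i := range taints {
		if taints[i].Effect == v1.TaintEffectPreferNoSchedule {
			continue
		}
		tolerated := false
		for _, tol := range pod.Spec.Tolerations {
			if tol.ToleratesTaint(&taints[i]) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}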

@aveshagarwal (Contributor) commented:
> @aveshagarwal you could use the same argument for any of the already existing strategies. As long as re-incarnated pods do not carry additional information for the scheduler, I don't think we can do anything about the likelihood.

I agree. But I'd say that with other strategies the likelihood is still very high, given the way they function.

If you look at the reasons for too-many-restarts errors in issue #62, which this PR is trying to address, you will notice that there are cases, like an application bug, where this blanket strategy might not be very helpful.

I can approve it if you would like to go ahead with this PR as it is. I really believe it would be more helpful to target the cases where the success of this strategy could be higher. Please let me know.

@ingvagabund (Contributor) commented:
> If you look at the reasons for too-many-restarts errors in issue #62, which this PR is trying to address, you will notice that there are cases, like an application bug, where this blanket strategy might not be very helpful.

I agree. The container restart count does not say anything about the cause. Also, having consumers decide when the strategy is helpful is not the right justification; we still need to provide a stable solution that has more positive than negative effects in a cluster, minimizes disturbance, etc. What can happen in the worst case? Assuming an application has a bug that causes the pod to panic (e.g. right away), the pod will enter a crash loop. IINM, the kubelet has a mechanism to back off crashing containers. So, in the worst case the scheduler will schedule the pod again and again. However, the scheduler will not enter an active loop since it backs off as well. The only drawback I can see now is that the pod never gets to run properly. Still, neither the kubelet nor the scheduler will enter a downward spiral, even if the descheduler period is set to 1s (in which case every strategy will be causing havoc in the cluster anyway).

@ingvagabund (Contributor) commented:
> I really believe it would be more helpful to target the cases where the success of this strategy could be higher. Please let me know.

I don't think any pod instance carries information that could help increase the likelihood. Only the kubelet knows why a given container failed. Also, we don't consume any third-party diagnostics (e.g. the node problem detector). All the strategies are quite naive and simple in their decision making.

@aveshagarwal (Contributor) commented:
> > I really believe it would be more helpful to target the cases where the success of this strategy could be higher. Please let me know.
>
> I don't think any pod instance carries information that could help increase the likelihood. Only the kubelet knows why a given container failed. Also, we don't consume any third-party diagnostics (e.g. the node problem detector). All the strategies are quite naive and simple in their decision making.

Thanks @ingvagabund for your explanation.
/approve

@k8s-ci-robot (Contributor) commented:
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aveshagarwal, damemi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 27, 2020
@k8s-ci-robot k8s-ci-robot merged commit 91de471 into kubernetes-sigs:master Apr 27, 2020
@seanmalloy mentioned this pull request on May 20, 2020
ingvagabund added a commit to ingvagabund/descheduler that referenced this pull request Jun 4, 2020
Based on https://github.com/kubernetes/community/blob/master/community-membership.md#requirements-1:

The following apply to the part of codebase for which one would be a reviewer in an OWNERS file (for repos using the bot).

> member for at least 3 months

For a couple of years now

> Primary reviewer for at least 5 PRs to the codebase

kubernetes-sigs#285
kubernetes-sigs#275
kubernetes-sigs#267
kubernetes-sigs#254
kubernetes-sigs#181

> Reviewed or merged at least 20 substantial PRs to the codebase

https://github.com/kubernetes-sigs/descheduler/pulls?q=is%3Apr+is%3Aclosed+assignee%3Aingvagabund

> Knowledgeable about the codebase

yes

> Sponsored by a subproject approver
> With no objections from other approvers
> Done through PR to update the OWNERS file

this PR

> May either self-nominate, be nominated by an approver in this subproject, or be nominated by a robot

self-nominating
briend pushed a commit to briend/descheduler that referenced this pull request Feb 11, 2022
briend pushed a commit to briend/descheduler that referenced this pull request Feb 11, 2022