add remove too many restarts policy #89
Conversation
Can one of the admins verify this patch?
Force-pushed from 4d08495 to d8ccf67
@liubin thanks for this PR. I will review it soon.
/ok-to-test
}
} else if restarts <= strategy.Params.PodsHavingTooManyRestarts.PodeRestartThresholds {
	continue
}
Init container is a special case of a container, so the calcContainerRestarts
could be defined as:
func calcContainerRestarts(pod *v1.Pod, countInitContainers bool) int32 {
var restarts int32 = 0
for _, cs := range pod.Status.ContainerStatuses {
restarts += cs.RestartCount
}
if countInitContainers {
for _, cs := range pod.Status.InitContainerStatuses {
restarts += cs.RestartCount
}
}
return restarts
}
which simplifies the call site to:
params := strategy.Params.PodsHavingTooManyRestarts
restarts := calcContainerRestarts(pod, params.IncludingInitContainers)
if restarts <= params.PodeRestartThresholds {
	continue
}
glog.V(1).Infof("RemovePodsHavingTooManyRestarts will evict pod: %#v, container restarts: %d, initContainer restarts: %d", pod.Name, restarts, initRestarts)
@aveshagarwal do we need to distinguish between ordinary and init containers? We could just report the total number of containers that got restarted. WDYT?
Yeah, that's fine; it's helpful to distinguish them in the log message.
pod := test.BuildTestPod(fmt.Sprintf("pod-%d", i), 100, 0, node.Name)
pod.ObjectMeta.OwnerReferences = test.GetNormalPodOwnerRefList()
// pod i will has 25 * i restarts.
s/has/have
@liubin In general it looks good to me, but I still need to review the code more thoroughly. One question I have: what is this strategy relying on to make sure that evicted pods get scheduled to some other node by the default scheduler, and not back onto the same node?
@aveshagarwal Good question. I have the same question, and some others: the restart count is cumulative, so 1000 restarts in one year and 1000 restarts in one hour have different meanings. I will do more research and try to find the most practical answer.
@ingvagabund I tried:
Does the CI have some different settings? I cannot find the reason for the failure.
@liubin meanwhile, could you split it into 2 commits during your next update: one commit for the auto-generated files and one commit for the main code changes, for easier review.
@aveshagarwal @ravisantoshgudimetla we should bump the Go version to 1.10.*.
$ go version
go version go1.10.2 linux/amd64
$ ./hack/verify-gofmt.sh
$ echo $?
0
/retest
/retest
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@liubin would you be interested in picking this up again, as I see there are some use-cases around it?
/reopen |
@ravisantoshgudimetla: Reopened this PR. In response to this:
@liubin: PR needs rebase.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: liubin The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closed this PR. In response to this:
/reopen |
@damemi: Reopened this PR. In response to this:
@liubin sorry to resurrect an old thread, but are you still interested in rebasing this PR? If not, we can still pick your commits into a new PR.
/kind feature |
@damemi: Closed this PR. In response to this:
This PR evicts pods that have too many restarts due to host-originated problems (network/storage/firewall). Usually, these pods should be re-scheduled to other nodes.
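For context, enabling such a strategy would look something like the policy fragment below. The field names follow the descheduler's later v1alpha1 policy format and may differ from this PR's revision (which used `PodeRestartThresholds`); treat it as an illustrative sketch, not the exact schema added here:

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsHavingTooManyRestarts":
    enabled: true
    params:
      podsHavingTooManyRestarts:
        # Evict a pod once its total restart count exceeds this threshold.
        podRestartThreshold: 100
        # Also count init container restarts toward the total.
        includingInitContainers: true
```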