Do not evict pods which tolerate all NoExecute taints #93722
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: liggitt
klog.V(4).Infof("New tolerations for %v tolerate forever. Scheduled deletion won't be cancelled if already scheduled.", podNamespacedName.String())
klog.V(4).Infof("Current tolerations for %v tolerate forever, cancelling any scheduled deletion.", podNamespacedName.String())
tc.cancelWorkWithEvent(podNamespacedName)
I'd like special attention on this change... the logged message explicitly called out the problematic behavior, so it seems like it was known, but it still seems incorrect to me.
If a pod infinitely tolerates all remaining taints on the node, I don't see why that should have different behavior than a node with no taints (for which we call cancelWorkWithEvent on line 348)
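The intended behavior can be sketched roughly as follows. This is a minimal, self-contained illustration, not the controller's code: the `Taint`, `Toleration`, and `Action` types and the `decide` and `findToleration` functions are hypothetical simplifications (tolerations are matched on key only, and a nil `Seconds` stands for "tolerate forever").

```go
package main

import "fmt"

// Taint and Toleration are simplified, hypothetical stand-ins for the
// Kubernetes API types; only the fields needed for this sketch are modeled.
type Taint struct{ Key string }

type Toleration struct {
	Key string
	// Seconds is nil when the taint is tolerated forever.
	Seconds *int64
}

// Action is the (hypothetical) outcome for a pod on a node.
type Action string

const (
	CancelEviction   Action = "cancel"
	EvictNow         Action = "evict-now"
	ScheduleEviction Action = "schedule"
)

// decide returns the action for a pod given the node's NoExecute taints,
// plus the delay in seconds when an eviction is scheduled.
func decide(taints []Taint, tolerations []Toleration) (Action, int64) {
	if len(taints) == 0 {
		// No taints left on the node: cancel any pending eviction.
		return CancelEviction, 0
	}
	finite := false // set once a toleration with a time limit is seen
	var minSeconds int64
	for _, taint := range taints {
		tol, ok := findToleration(tolerations, taint.Key)
		if !ok {
			// At least one taint is not tolerated at all.
			return EvictNow, 0
		}
		if tol.Seconds == nil {
			continue // tolerated forever; does not constrain the delay
		}
		s := *tol.Seconds
		if s < 0 {
			s = 0 // simplification: negative values mean "evict immediately"
		}
		if !finite || s < minSeconds {
			finite = true
			minSeconds = s
		}
	}
	if !finite {
		// Every remaining taint is tolerated forever: treat this the same
		// as a node with no taints.
		return CancelEviction, 0
	}
	return ScheduleEviction, minSeconds
}

// findToleration returns the first toleration whose key matches.
func findToleration(tolerations []Toleration, key string) (Toleration, bool) {
	for _, t := range tolerations {
		if t.Key == key {
			return t, true
		}
	}
	return Toleration{}, false
}

func main() {
	limit := int64(300)
	taints := []Taint{{Key: "node.kubernetes.io/unreachable"}}

	fmt.Println(decide(nil, nil))
	fmt.Println(decide(taints, []Toleration{{Key: "node.kubernetes.io/unreachable"}}))
	fmt.Println(decide(taints, []Toleration{{Key: "node.kubernetes.io/unreachable", Seconds: &limit}}))
	fmt.Println(decide(taints, nil))
}
```

In this sketch the first two calls both yield `cancel`, which is the consistency argued for above; the third schedules an eviction after 300 seconds and the fourth evicts immediately.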
Original commit: https://github.com/kubernetes/kubernetes/pull/40355/files#diff-12d91ddb84ebd95c1ccaa6b5980108f6R337-R341. Comments by @davidopp at https://github.com/kubernetes/kubernetes/pull/40355/files#r100682445:
"eviction that has not yet started gets moved later or canceled (e.g. eviction is added to the eviction queue, then pod is updated with larger tolerationSeconds for its soonest toleration, or all taints are removed from node) -- IIUC we do want to move the eviction later (or cancel it if removing all taints) in that case?"
I think "removing all taints" behavior should be equivalent to "removing all taints except ones that are tolerated infinitely"
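To make the "move later or cancel" cases concrete, here is a rough sketch of a pending-eviction table. The `evictionQueue` type and its `Reconcile` and `Pending` methods are hypothetical and are not the controller's actual timed worker queue; delays are plain second offsets from when the taint was observed, and a nil `tolerationSeconds` is assumed to mean that every remaining taint is tolerated forever (or that no taints remain).

```go
package main

import "fmt"

// evictionQueue is a hypothetical stand-in for the controller's timed work
// queue. It maps a pod name to the number of seconds, counted from when the
// taint was observed, after which the pod is evicted.
type evictionQueue struct {
	evictAfter map[string]int64
}

func newEvictionQueue() *evictionQueue {
	return &evictionQueue{evictAfter: make(map[string]int64)}
}

// Reconcile applies the pod's current tolerations to any pending eviction.
// tolerationSeconds is the smallest tolerationSeconds among the tolerations
// that match the node's taints; nil means nothing limits the pod's stay.
func (q *evictionQueue) Reconcile(pod string, tolerationSeconds *int64) {
	if tolerationSeconds == nil {
		// Same handling whether all taints were removed or the remaining
		// ones are tolerated forever: drop the pending eviction.
		delete(q.evictAfter, pod)
		return
	}
	// Otherwise (re)schedule, which moves an existing eviction earlier or later.
	q.evictAfter[pod] = *tolerationSeconds
}

// Pending reports the scheduled delay for a pod, if any.
func (q *evictionQueue) Pending(pod string) (int64, bool) {
	after, ok := q.evictAfter[pod]
	return after, ok
}

func main() {
	q := newEvictionQueue()
	short, long := int64(30), int64(300)

	q.Reconcile("default/web-0", &short) // eviction queued
	q.Reconcile("default/web-0", &long)  // pod updated with a larger tolerationSeconds
	after, ok := q.Pending("default/web-0")
	fmt.Println(after, ok) // 300 true

	q.Reconcile("default/web-0", nil) // pod now tolerates the taint forever
	after, ok = q.Pending("default/web-0")
	fmt.Println(after, ok) // 0 false
}
```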
/retest
/lgtm
/retest
/cc @karan
/cc @dashpole
I've been trying to think of why the previous behavior would be desirable, but can't come up with anything. +1 to being consistent.
My inclination would be to include this in 1.19 and pick it back to supported releases, because:
I would like feedback from https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/nodelifecycle/OWNERS (though I'm not sure that full list is up to date)
maybe @k82cn has thoughts?
@ravisantoshgudimetla you worked on the taint-based eviction controller back in the day, right (like... a year ago)? Do you see anything wrong with this? lgtm though
/test pull-kubernetes-e2e-kind-ipv6
Opened picks to 1.16-1.18 to start exercising CI.
Just went through the diff, LGTM overall, thanks very much 👍 Also agree to cherry-pick this PR :)
Sounds good, thanks for reviewing
/milestone v1.19
…2-upstream-release-1.16 Automated cherry pick of #93722: Do not evict pods which tolerate all NoExecute taints
…2-upstream-release-1.18 Automated cherry pick of #93722: Do not evict pods which tolerate all NoExecute taints
…2-upstream-release-1.17 Automated cherry pick of #93722: Do not evict pods which tolerate all NoExecute taints
What type of PR is this?
/kind bug
Which issue(s) this PR fixes:
Fixes #90794
Does this PR introduce a user-facing change?:
/sig node
/priority important-soon