recommend draining the node before updating kubelet #43053
Conversation
/cc @micahhausler
/assign @derekwaynecarr @SergeyKanzhelev @mrunalp
/cc @liggitt
```diff
@@ -178,8 +178,7 @@ Pre-requisites:
 Optionally upgrade `kubelet` instances to **{{< skew currentVersion >}}** (or they can be left at **{{< skew currentVersionAddMinor -1 >}}**, **{{< skew currentVersionAddMinor -2 >}}**, or **{{< skew currentVersionAddMinor -3 >}}**)

 {{< note >}}
-Before performing a minor version `kubelet` upgrade, [drain](/docs/tasks/administer-cluster/safely-drain-node/) pods from that node.
-In-place minor version `kubelet` upgrades are not supported.
+Before performing any `kubelet` upgrade, [drain](/docs/tasks/administer-cluster/safely-drain-node/) pods from that node.
```
I don't think we should say this here or require this. There are two aspects:
- kubelet compatibility
- effectively fixing bugs

This document is focused on kubelet compatibility. Kubelet should work properly when restarted against state persisted by another patch version of the kubelet.
Fixing some bugs requires restarting containers. That seems like a useful thing to note in releases where a bugfix requiring a container restart is released, but I would not impose that requirement on all patch updates.
My concern is that, for this example, there are theoretically pods in the wild that are running with no seccomp enforcement but with a known "good" kubelet version, even though users followed the current documentation, which doesn't suggest that a node drain is necessary.

> That seems like a useful thing to note in releases where a bugfix requiring a container restart is released, but I would not impose that requirement on all patch updates.

I think a careful evaluation of the question "Will running containers require a restart?" for every backported fix is possible, but it's a much larger burden than recommending draining the node in all cases. For example, it was missed in this case.
Users can still choose not to drain the node, but it's a risk assessment they make at that point.
How do you feel about suggesting draining the node, something like:
- Before performing a minor version `kubelet` upgrade, [drain](/docs/tasks/administer-cluster/safely-drain-node/) pods from that node.
- In-place minor version `kubelet` upgrades are not supported.
- For in-place patch `kubelet` upgrades, draining the node is the safest approach but not strictly necessary.
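The recommended sequence could be carried out roughly as follows (a sketch only; the node name and the upgrade mechanism are placeholders and depend on how kubelet was installed):

```shell
# Sketch of a drain-then-upgrade workflow. "node-1" and the package
# command are illustrative placeholders, not part of the PR itself.

# 1. Cordon and drain the node so running pods are evicted elsewhere.
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

# 2. Upgrade the kubelet binary (mechanism varies by distro/installer).
sudo systemctl stop kubelet
sudo apt-get install -y kubelet   # or replace the binary directly
sudo systemctl start kubelet

# 3. Return the node to service.
kubectl uncordon node-1
```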
I think we should keep this document focused on the kubelet compatibility requirements, and make it unambiguous that kubelet is compatible with runtime state persisted by other patch versions within the same minor.
@tzneal what are your plans for this PR, given the feedback so far?
I'm going to find a different place in the docs to put this wording. I don't think that the danger of in-place updates is expressed anywhere in the documentation currently, which leaves users vulnerable to CVEs they think they've mitigated.
Force-pushed from 2064dfa to 2282d06
Added a warning in the upgrade guide that draining the node may be necessary to resolve CVEs or bugs.
/lgtm
You could also propose a change to https://kubernetes.io/docs/concepts/security/security-checklist/ (maybe another PR, I'm not sure what works best).
I'm thinking of adding an item to say [words to the effect] that there are no nodes where:
- some Pod x is running
- the version of the kubelet that started Pod x is more than one minor version less than the cluster's version of Kubernetes
Hard to express though, isn't it? I initially assumed that `systemctl stop kubelet` would shut down workloads, but that's not the case, and this is moderately surprising. So it's worth documenting a bit more IMO.
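The observation above can be checked on a test node (a sketch; it assumes root access on a disposable node with a CRI runtime and `crictl` installed):

```shell
# Sketch: verify that stopping the kubelet does NOT stop running containers.
# Run only on a test node.
sudo systemctl stop kubelet
sudo crictl ps        # previously started containers still show as Running
sudo systemctl start kubelet

# Each node's *current* kubelet version (not the version that started any
# given pod) is reported in nodeInfo and can be compared to the control plane:
kubectl get nodes -o custom-columns=NAME:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion
```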
LGTM label has been added. Git tree hash: d44750628ce2b4b7ec89104a489485252a3d02a4
This change clarifies that the node should be drained even when performing patch upgrades of kubelet.
Force-pushed from 2282d06 to 20cfb80
Agree regarding documenting more. Is there a good way to determine which version of kubelet started the pod? For the security page, maybe a new section on upgrading? The page I modified was what was linked in the CVE, so I figured it was the best place to start.
/lgtm
LGTM label has been added. Git tree hash: 994a79b40398adcd1216fe62de0a94ca665114a4
I don't think it's generally possible. The kubelet may have restarted since, and the other end of the CRI socket is a black box as far as Kubernetes is concerned.
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: sftim. The full list of commands accepted by this bot can be found here; the pull request process is described here.
The Kubernetes docs state that minor upgrades of kubelet require draining the node:
It's also been stated, but not in documentation, that patch upgrades don't require draining the node:
My contention is that in-place patch upgrades of kubelet are currently unsafe, and we should clarify the Kubernetes documentation to explicitly state that you must drain the node in all circumstances when updating kubelet.
Why?
I suspect there are other examples of when in-place kubelet upgrades fail, but the most recent one I've found is a failure to resolve a CVE for running pods. A recent bug in kubelet allowed pods to bypass seccomp enforcement. A CVE (CVE-2023-2431) was published and the announcement was sent to the community. A fix for this CVE was backported into the 1.27.2 patch release of kubelet.
Without this docs change, users may assume that in-place updating of kubelet will resolve this CVE. It will, but only for newly admitted pods. An in-place update will not terminate any running pods, allowing them to continue to bypass seccomp enforcement.
To demonstrate the failure, I launched a pod with an empty string `localhostProfile` on a node with kubelet 1.27.0. I then stopped kubelet and replaced its binary with the 1.27.6 patch version before restarting it. The pod is still running on the node with no seccomp enforcement, but any vulnerability scan that looks at the kubelet version will indicate that it's patched and not susceptible to the CVE.
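A reproduction along these lines might look like the following sketch (the pod name and image are illustrative, not from the PR; the key detail is a `Localhost` seccomp profile with an empty `localhostProfile`, the condition behind CVE-2023-2431):

```shell
# Hypothetical reproduction sketch; requires a cluster with a node
# running an affected kubelet (e.g. 1.27.0).
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-demo
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: ""   # empty string: the CVE-2023-2431 condition
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
EOF

# After swapping the kubelet binary in place and restarting the service,
# the pod is still running without seccomp enforcement:
kubectl get pod seccomp-demo -o wide
```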