
Configure Graceful Node Shutdown and lengthen max inhibitor delay #1222

Merged
dghubble merged 1 commit into main from graceful-shutdown on Aug 28, 2022

Conversation

dghubble
Member

  • Configure Kubelet Graceful Node Shutdown to detect system shutdown events and stop running containers gracefully when possible (see the configuration sketch below)
  • Allow up to 30s for critical pods to shut down gracefully
  • Allow up to 15s for regular pods to shut down gracefully
  • Node is marked NotReady promptly, instead of waiting for health checks to fail
  • Kubelet uses systemd inhibitor locks to delay shutdown for a limited number of seconds
  • Raise the default max inhibitor delay from 5s to 45s (via a logind drop-in, sketched after the verification step below)
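
The grace periods map to Kubelet configuration fields. A minimal sketch of the relevant KubeletConfiguration (v1beta1) fields matching the values above; the exact config file Typhoon renders may differ:

```
# KubeletConfiguration fields for Graceful Node Shutdown.
# 45s total: the final 30s is reserved for critical pods,
# leaving 15s for regular pods.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Total time the node delays shutdown via its inhibitor lock.
shutdownGracePeriod: 45s
# Portion of shutdownGracePeriod reserved for critical pods.
shutdownGracePeriodCriticalPods: 30s
```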

Verify systemd inhibitor locks are present:

```
sudo systemd-inhibit --list
WHO     UID USER PID  COMM    WHAT     WHY                                        MODE
kubelet 0   root 4581 kubelet shutdown Kubelet needs time to handle node shutdown delay
```
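
The lock can only delay shutdown up to systemd-logind's InhibitDelayMaxSec, which defaults to 5s. A sketch of a logind drop-in that raises it to cover the Kubelet's 45s grace period (the drop-in path is illustrative):

```
# /etc/systemd/logind.conf.d/inhibitors.conf (illustrative path)
[Login]
# Allow inhibitor locks (like the Kubelet's) to delay shutdown
# for up to 45s instead of the 5s default.
InhibitDelayMaxSec=45
```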

Tail journal logs and then shut down a node via `systemctl reboot`
or via the cloud console to watch containers shut down:
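
For example (standard commands; run the tail on the node being rebooted):

```
# On the node: follow Kubelet logs to watch pods being stopped.
journalctl -u kubelet.service -f
# From another terminal: trigger a shutdown event.
sudo systemctl reboot
```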

Rel:

* https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/
* https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
* kubernetes/kubernetes#107043
* coreos/fedora-coreos-tracker#821
* https://www.freedesktop.org/software/systemd/man/systemd-inhibit.html
* https://github.com/kubernetes/kubernetes/blob/release-1.24/pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go
* https://github.com/godbus/dbus/blob/master/conn.go
@dghubble dghubble merged commit 393a38d into main Aug 28, 2022
@dghubble dghubble deleted the graceful-shutdown branch August 28, 2022 17:43
dghubble added a commit that referenced this pull request Sep 10, 2022
* Disable Kubelet Graceful Node Shutdown on worker nodes (enabled in
Kubernetes v1.25.0 #1222)
* Graceful node shutdown allows 30s for critical pods and 15s for
regular pods to shut down before releasing the inhibitor lock to
allow the host to shut down
* Unfortunately, there are no further configuration options: regular
pods and the node are shut down at the same time at the end of the
45s period. In practice, enabling this feature leaves Error or
Completed pods in kube-apiserver state until manually cleaned up.
This feature is not ready for general use
* Fix issue where Error/Completed pods accumulate whenever any
node restarts (or auto-updates), visible in kubectl get pods
* This issue wasn't apparent in initial testing and seems to only
affect non-critical pods (critical pods are killed earlier), but
it's very apparent on our real clusters

Rel: kubernetes/kubernetes#110755
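
A hedged illustration of the symptom and the manual cleanup, using standard kubectl field selectors (treating these leftovers as pods in the Failed phase is an assumption about how they surface):

```
# List pods left behind after node restarts/auto-updates.
kubectl get pods --all-namespaces --field-selector=status.phase=Failed
# Manually clean them up.
kubectl delete pods --all-namespaces --field-selector=status.phase=Failed
```
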
dghubble-robot pushed a commit to poseidon/terraform-azure-kubernetes that referenced this pull request Sep 10, 2022
dghubble-robot pushed a commit to poseidon/terraform-onprem-kubernetes that referenced this pull request Sep 10, 2022
dghubble-robot pushed a commit to poseidon/terraform-digitalocean-kubernetes that referenced this pull request Sep 10, 2022
dghubble-robot pushed a commit to poseidon/terraform-aws-kubernetes that referenced this pull request Sep 10, 2022
dghubble-robot pushed a commit to poseidon/terraform-google-kubernetes that referenced this pull request Sep 10, 2022
Snaipe pushed a commit to aristanetworks/monsoon that referenced this pull request Apr 13, 2023