
Add a default systemReserved configuration to the kubelet-config.json #1490

Open
reegnz opened this issue Oct 26, 2023 · 0 comments · May be fixed by #1808

reegnz (Contributor) commented Oct 26, 2023

What would you like to be added:

The default kubelet configuration should reserve dedicated CPU and memory for system daemons via systemReserved.

Something like the following should be added to the kubelet-config.json:

"systemReserved": {
  "cpu": "50m",
  "memory": "128Mi"
}
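
For context on the impact: systemReserved is subtracted from the node's capacity when the kubelet computes Node Allocatable, so reserving 50m CPU and 128Mi memory shrinks what the scheduler can place on the node by exactly that amount. A rough sketch of the arithmetic (the 8 GiB capacity and zero kubeReserved below are illustrative assumptions, not values from this issue; 100Mi is the upstream kubelet default for the memory.available hard eviction threshold, and EKS AMIs typically set kubeReserved based on instance size):

Allocatable = Capacity - kubeReserved - systemReserved - evictionHard

# memory on a hypothetical 8 GiB node:
# 8192Mi - 0Mi (kubeReserved) - 128Mi (systemReserved) - 100Mi (eviction-hard)
#   = 7964Mi allocatable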

Why is this needed:

We are experiencing nodes going from Ready to NotReady when a node comes under high memory pressure caused by pods with unset memory limits.
The expectation is that the kubelet kills pods when they over-allocate, or that when other pods arrive with requests, the over-committed pods get evicted by the kubelet.
Instead, in these high-memory-pressure cases the entire node seems to die when using the default EKS AMI configuration: the kubelet stops reporting back to the API server, and we also cannot connect to the nodes with SSM or EC2 Instance Connect.

Adding a systemReserved configuration through --kubelet-extra-args might be an acceptable workaround, but this seems like something that should be configured by default on the nodes, so that even if the kubelet becomes unresponsive, services in system.slice keep running and one can still troubleshoot the node. A sketch of that workaround is shown below.
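
For reference, a sketch of the workaround in the node's user data, assuming the stock Amazon Linux EKS AMI bootstrap.sh (the cluster name my-cluster is a placeholder; --system-reserved is the upstream kubelet flag equivalent of the JSON above):

#!/bin/bash
# Pass systemReserved to the kubelet at bootstrap time (placeholder cluster name).
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args '--system-reserved=cpu=50m,memory=128Mi'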
