v1.1.6 regression: adding misc controller to cgroup v1 makes kubelet sad #3849
Comments
Could this rather be a regression on the kernel side?
Conceivably, although:
I am happy to help provide additional logs or dig deeper, since I can reproduce this on demand. What's puzzling to me is why this apparently fixed the Kubernetes issue for some users, but reliably triggers it here, where we never saw the problem before. It's possible it comes down to kernel config differences or systemd versions or something like that.
I suspect that the runc 1.1.6 binary creates the misc cgroup, and then kubelet uses runc's libcontainer (of an older version) to remove it. That older libcontainer version doesn't know about misc (and systemd doesn't know about it either), so it's not removed. Thus, bumping the runc/libcontainer dependency to 1.1.6 should fix this.
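To make that failure mode concrete, here is a minimal sketch, not libcontainer's actual code, assuming a hypothetical hard-coded list of known controllers: a destroy routine that only walks the hierarchies it knows about never visits a directory created under an unknown one such as misc.

```go
// Sketch only: an older cgroup library compiled with a fixed list of
// controllers removes just those hierarchies on Destroy, so a "misc"
// directory created by a newer runc binary is left behind.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// knownSubsystems stands in for the list an older library knows about;
// "misc" is deliberately absent.
var knownSubsystems = []string{"cpu", "memory", "pids"}

func destroy(cgroupRoot, containerPath string) {
	for _, sub := range knownSubsystems {
		dir := filepath.Join(cgroupRoot, sub, containerPath)
		// Errors ignored for brevity; real code would retry on EBUSY etc.
		_ = os.Remove(dir)
	}
	// /sys/fs/cgroup/misc/<containerPath> is never touched here, so the
	// leftover cgroup keeps showing up to whoever manages the node.
}

func main() {
	destroy("/sys/fs/cgroup", "kubepods/pod123/ctr456")
	fmt.Println("removed only the hierarchies this library knows about")
}
```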
@dims pointed me at #3028, which was helpful background. Let me try to step through the timeline as I understand it:
One way to frame it is:
Another way might be:
I don't intend to criticize; I'm just trying to understand the path forward. If the only safe path is to update runc and kubelet's runc dependency in lockstep, I can work with that. If they're expected to be independent, I can file bug reports if it ever turns out that they're not. Right now I'm not sure. (Would it have been possible to add awareness of the "misc" controller to the cgroup library, so kubelet could handle its existence, without also changing the runc binary's default behavior to join that controller?)
@bcressey right now, for the record, the sequence of events is:
Also, once a release of k8s is made, we usually don't go updating the vendored dependency (this current snafu could be an exception), but we may update the binary we test against. So currently what we tell folks is that a distro should probably follow the lead of k8s (and not jump ahead of the signals we get from the dozens of CI jobs across containerd and k8s).
@dims how does this work in a situation where the runc update also contains security fixes? That's not the case with runc 1.1.6, but it did fix at least one concerning bug around adding containers to the proper cgroup, which is why we pulled it into Bottlerocket. So this is not quite that situation, but it's very much at the top of my mind, and not just a theoretical exercise.
To clarify:
In other words, there is no need for the runc binary and runc/libcontainer to be totally in sync. However, if the runc binary knows about misc but k8s doesn't, it's a problem (provided we have cgroup v1 and a sufficiently new kernel). One way to fix this would be for runc/libcontainer's Destroy method to remove all controllers, known and unknown. The problem here is that the cgroup v1 hierarchy is a forest, not a tree, i.e. we have a bunch of per-controller paths. There may be a way to fix this issue. Let me see...
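For illustration only, here is a rough sketch of that "remove everything" idea; it is an assumption about one possible shape of such a fix, not what runc actually implements. It enumerates whatever hierarchies are mounted under /sys/fs/cgroup and tries to remove the container's directory in each, known or unknown.

```go
// Sketch: instead of iterating a compiled-in list of controllers,
// enumerate every entry under the cgroup v1 mount root and attempt to
// remove the container's directory in each hierarchy.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func destroyAll(cgroupRoot, containerPath string) error {
	entries, err := os.ReadDir(cgroupRoot)
	if err != nil {
		return err
	}
	for _, e := range entries {
		if !e.IsDir() {
			// Skip anything that is not a per-controller directory
			// (symlinked aliases like cpu -> cpu,cpuacct are also
			// skipped here for simplicity).
			continue
		}
		dir := filepath.Join(cgroupRoot, e.Name(), containerPath)
		if err := os.Remove(dir); err != nil && !os.IsNotExist(err) {
			fmt.Fprintf(os.Stderr, "could not remove %s: %v\n", dir, err)
		}
	}
	return nil
}

func main() {
	_ = destroyAll("/sys/fs/cgroup", "kubepods/pod123/ctr456")
}
```

Note that removing a cgroup v1 directory only succeeds once it has no member processes and no child groups, which is part of what makes a fully general fix trickier than this sketch suggests.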
Such a way would still require updating runc/libcontainer in kubernetes, so it would not help the current problem, only future ones that are similar. I am trying to decide whether we should try to fix it or not. Arguments for:
Arguments against:
I spent more than half a day last week working on this, and it's not very easy. So, let's hope the kernel will not add more controllers any time soon. In the meantime, here are the fixes for kubernetes:
🤞🏾 🤞🏾 🤞🏾 🤞🏾 🤞🏾 🤞🏾 🤞🏾
@kolyshkin To help others bumping into this issue, it would be helpful to update the release notes for v1.1.6 to make it clear that it may be (or is) a breaking change. Having "cgroup v1 drivers are now aware of misc controller" as a quick mention doesn't fully express the "sadness" it brings. 🥲
@hakman thanks, I've added a "Known issues" section to https://github.com/opencontainers/runc/releases/tag/v1.1.6
I believe this issue is addressed (as much as we can), so closing. |
Description
The most recent Bottlerocket release included an update to runc 1.1.6. Shortly after the release, we received reports of a regression where nodes would fall over after kubelet, systemd, and dbus-broker consumed excessive CPU and memory resources.
In bottlerocket-os/bottlerocket#3057 I narrowed this down via git bisect to e4ce94e, which was meant to fix this issue, but instead now causes it to happen consistently. I've confirmed that reverting that specific patch fixes the regression.
Steps to reproduce the issue
On an EKS 1.26 cluster with a single worker node, apply a consistent load via this spec:
(repro credit to @yeazelm)
After a short time, the "Path does not exist" and "Failed to delete cgroup paths" errors appear and continue even after the spec is deleted and the load is removed.
Describe the results you received and expected
systemd, kubelet, and dbus-broker all showed high CPU usage. journalctl -f and busctl monitor showed these messages repeatedly:
What version of runc are you using?
Host OS information
Bottlerocket
Host kernel information
Bottlerocket releases cover a variety of kernels, so to break it down a bit: