Fix systemd cgroup driver's Apply #3782

kolyshkin · 2023-03-23T18:22:32Z

This fixes the wrong logic of systemd cgroup manager's Apply method. For more details, see commit messages and #3780.

Fixes: #3780
Fixes: #3760

kolyshkin · 2023-03-23T20:41:25Z

The failure with "runc ps" on CentOS 9 is gone as expected.

This failure:

not ok 168 update cgroup cpu.idle

is a separate issue not addressed by this PR.

AkihiroSuda · 2023-03-28T15:40:13Z

tests/integration/cgroups.bats

@@ -356,7 +356,7 @@ function setup() {
 	[ "$output" = "ok" ]
 }

-@test "runc run/create should warn about a non-empty cgroup" {
+@test "runc run/create should error for a non-empty cgroup" {


Is this a breaking change?
Probably fine for v1.2, but I am not sure it is backportable to v1.1.

To begin with, it never worked anyway, because if the systemd unit exists, the error is ignored and the new container is never added to the unit (and to the cgroup). To repeat, it never worked anyway.

We can try making it work in 1.1 though. The fix would be very different from this one, something like "if startUnit returned UnitExists error, call setUnitProperties with properties of PIDs=[new pid]".

I am not sure that this will work (maybe, maybe not -- it's complicated). I am also unsure if we want to go that route at all -- I mean trying to fix something that never worked anyway.

In this version (1.2.x), I think this is the way it should be done.

In 1.1, we can discuss it later (for 1.1.6, I guess).

To reiterate -- let's concentrate on how we can fix it in the main branch for now, and think about a 1.1 backport later.

Thought about this a bit -- I think we can still allow a shared cgroup when the fs cgroup driver is used, keeping the warning, and error out in case of the systemd cgroup driver. This is sort of a breaking change, but since

- the functionality never worked correctly (the UnitExists error was ignored, and the container was not placed into the proper systemd unit and/or cgroup),
- it will be deprecated in runc 1.2, and
- implementing such a feature (adding a container to an existing systemd unit) is not very easy,

it makes little sense to try to do that.

In particular, this test can be changed to look for a warning in case of the fs cgroup driver, and for an error in case of the systemd cgroup driver.

@kolyshkin That sounds good!

Implemented as described above in #3806

Move error handling earlier, removing "if err == nil" block. No change of logic.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Commit d223e2a ("Ignore error when starting transient unit that already exists") modified the code handling errors from startUnit to ignore the UnitExists error. Apparently this was done so that kubelet can create the same pod slice over and over without hitting an error (see [1]).

While ignoring the error works for a pod slice, where the goal is only to ensure that the slice exists, it is a gross bug to ignore UnitExists when creating a container. In this case, the container init PID won't be added to the systemd unit (and to the required cgroup), and as a result the container will successfully run in the current user cgroup, without any cgroup limits applied.

So, fix the code to only ignore UnitExists if we're not adding a process to the systemd unit. This way, kubelet will keep working as is, but runc will refuse to create containers which are not placed into the requested cgroup.

[1] opencontainers#1124

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

In case a systemd unit fails (for example, timed out or OOM-killed), systemd keeps the unit. This prevents starting a new container with the same systemd unit name.

The fix is to call reset-failed in case a UnitExists error is returned, and retry once.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Commit d08bc0c ("runc run: warn on non-empty cgroup") introduced a warning when a container is started in a non-empty cgroup. Such a configuration has lots of issues. In addition, it is not possible at all when using the systemd cgroup driver.

As planned, let's promote this warning to an error, and fix the test case accordingly.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

kolyshkin · 2023-03-31T03:02:25Z

Rebased. The only CI failure to expect here is the centos-stream-9 one from #3786 (not ok NN update cgroup cpu.idle), which is being fixed/mitigated by #3788.

kolyshkin · 2023-04-03T22:09:55Z

1.1 backport, more conservative: #3806.

As done in the runc PR opencontainers/runc#3782, UnitExists errors are no longer ignored when starting units. If the unit already exists, we attempt to reset the failed unit and retry once. Otherwise, we fail. See containerd#279 for more context.

Signed-off-by: Matt Merkes <matt.merkes@gmail.com>

As done in opencontainers/runc#3782, UnitExists errors are no longer ignored when starting units. This change makes the logic more robust like in runc:

1. Attempts to reset a failed unit if it already exists
2. Verifies via the unit status that it successfully starts
3. Waits longer for unit to start
4. Continues to ignore unit existing when pid is -1 to accommodate kubelet use case
5. Otherwise, returns an error if it already exists

Signed-off-by: Matt Merkes <matt.merkes@gmail.com>

As done in opencontainers/runc#3782, UnitExists errors are no longer ignored when starting units. This change makes the logic more robust like in runc:

1. Attempts to reset a failed unit if it already exists
2. Verifies via the unit status that it successfully starts
3. Waits longer for unit to start
4. Continues to ignore unit existing when pid is -1 to accommodate kubelet use case
5. Otherwise, returns an error if it already exists

Signed-off-by: Matt Merkes <matt.merkes@gmail.com>

Update .gitignore

Signed-off-by: Matt <matt.merkes@gmail.com>

Before this commit, creating a cgroup would silently ignore timeouts and carry on. Concretely, this caused cases where a cgroup failed to be created, but the caller did not realize it and ended up looking for files that should exist (e.g. cgroup.controllers), only to find they don't. It is very difficult for a caller to deal with this case, where NewSystemd succeeds but the cgroup doesn't exist.

The origins of this code seem to trace back to an initial implementation written 5+ years ago: containerd@5efa14e#diff-3331981e4ac06a8d9b06e91842b7f2759c7af3b65287e489a88385948d311ebdR672

runc added roughly the same logic to deal with the same issue: opencontainers/runc#3782

Now, containerd will also error if a cgroup cannot be created within the timeout window.

Signed-off-by: Josh Chorlton <jchorlton@gmail.com>

kolyshkin requested a review from a team March 23, 2023 19:16

kolyshkin added the backport/1.1-todo label ("A PR in main branch which needs to be backported to release-1.1") Mar 23, 2023

kolyshkin marked this pull request as ready for review March 23, 2023 20:43

This was referenced Mar 27, 2023

libct/cg/dev: skip flaky test of CentOS 7 #3778

Merged

ci/cirrus: fix failures on Centos 9 #3762

Closed

centos-stream-9 CI is failing for the main branch #3760

Closed

AkihiroSuda reviewed Mar 28, 2023

kolyshkin added this to the 1.1.6 milestone Mar 28, 2023

kolyshkin requested a review from AkihiroSuda March 28, 2023 17:50

AkihiroSuda previously approved these changes Mar 28, 2023

kolyshkin requested a review from cyphar March 30, 2023 21:52

kolyshkin mentioned this pull request Mar 31, 2023

[1.1] CHANGELOG: fixes for 1.1.5 #3796

Merged

kolyshkin added 4 commits March 30, 2023 19:55

libct/cg/sd: refactor startUnit

c6e8cb7

kolyshkin force-pushed the fix-sd-start branch from 0213ad7 to 82bc89c Compare March 31, 2023 02:55

kolyshkin requested a review from hqhq March 31, 2023 16:52

kolyshkin dismissed AkihiroSuda’s stale review via 82bc89c March 31, 2023 20:01

AkihiroSuda approved these changes Apr 3, 2023

mrunalp approved these changes Apr 3, 2023

kolyshkin added the impact/changelog label Apr 3, 2023

kolyshkin modified the milestones: 1.1.6, 1.2.0 Apr 3, 2023

kolyshkin merged commit 9f24513 into opencontainers:main Apr 3, 2023

This was referenced Apr 3, 2023

runc systemd cgroup driver logic is wrong #3780

Closed

NewSystemd must not ignore UnitExists containerd/cgroups#279

Closed

kolyshkin mentioned this pull request Apr 3, 2023

[1.1] Fix systemd cgroup driver's Apply (and make CI green again) #3806

Merged

kolyshkin added the backport/1.1-done label ("A PR in main branch which has been backported to release-1.1") and removed the backport/1.1-todo label ("A PR in main branch which needs to be backported to release-1.1") Apr 3, 2023

kolyshkin mentioned this pull request Apr 4, 2023

ci: reset failed systemd unit on OOM test #3802

Closed

kolyshkin mentioned this pull request Apr 13, 2023

update runc binary to v1.1.6 containerd/containerd#8384

Merged

mmerkes mentioned this pull request Apr 23, 2023

NewSystemd handles UnitExists when starting units containerd/cgroups#290

Merged

kolyshkin mentioned this pull request May 15, 2023

v1.1.6 regression (rootless, cgroup v2): container's cgroup is not empty: 5 process(es) found #3828

Closed

kolyshkin mentioned this pull request Nov 5, 2023

Fix runc kill and runc delete for containers with no init and no private PID namespace #4102

Merged

jchorl mentioned this pull request Sep 18, 2024

dont ignore failure to create cgroup after timeout containerd/cgroups#349

Merged
