Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.1] Fix systemd cgroup driver's Apply (and make CI green again) #3806

Merged
merged 4 commits into from
Apr 5, 2023

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Apr 3, 2023

This is a backport of #3782 for 1.1 branch. The difference is, here we still allow a container to share a cgroup with another container, but only when systemd cgroup driver is NOT used.

In case of systemd cgroup driver, which can't add a PID into an existing cgroup, we error out early.

The reasons behind this approach is described in #3782 (comment)

Fixes: #3780

Move error handling earlier, removing "if err == nil" block.

No change of logic.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit c6e8cb7)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit d223e2a ("Ignore error when starting transient unit
that already exists" modified the code handling errors from startUnit
to ignore UnitExists error.

Apparently it was done so that kubelet can create the same pod slice
over and over without hitting an error (see [1]).

While it works for a pod slice to ensure it exists, it is a gross bug
to ignore UnitExists when creating a container. In this case, the
container init PID won't be added to the systemd unit (and to the
required cgroup), and as a result the container will successfully
run in a current user cgroup, without any cgroup limits applied.

So, fix the code to only ignore UnitExists if we're not adding a process
to the systemd unit. This way, kubelet will keep working as is, but
runc will refuse to create containers which are not placed into a
requested cgroup.

[1] opencontainers#1124

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit c253342)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case a systemd unit fails (for example, timed out or OOM-killed),
systemd keeps the unit. This prevents starting a new container with
the same systemd unit name.

The fix is to call reset-failed in case UnitExists error is returned,
and retry once.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 1d18743)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
As this is currently not possible to add a PID into an existing systemd
unit, plus this feature will be deprected in runc 1.2 (see commit
d08bc0c ("runc run: warn on non-empty cgroup"), let's reject
sharing a systemd unit between two containers, and fix the test case
accordingly.

We still allow this to happen in case cgroupfs driver is used, to
minimize the potential compatibility issues in a stable branch.

This is an adaptation of main branch commit 82bc89c for 1.1.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin kolyshkin added this to the 1.1.6 milestone Apr 3, 2023
@kolyshkin kolyshkin changed the base branch from main to release-1.1 April 3, 2023 22:04
@kolyshkin
Copy link
Contributor Author

close/reopen to re-kick CI

@kolyshkin
Copy link
Contributor Author

CI failure in CS9 is a flake being fixed by commit 4d0a60c which we need to backport.

CI restarted

@kolyshkin kolyshkin marked this pull request as ready for review April 4, 2023 00:54
@kolyshkin
Copy link
Contributor Author

This makes CI green again, yay 👯

@kolyshkin kolyshkin requested review from cyphar, thaJeztah, hqhq and AkihiroSuda and removed request for cyphar April 4, 2023 00:54
@kolyshkin kolyshkin changed the title [1.1] Fix systemd cgroup driver's Apply [1.1] Fix systemd cgroup driver's Apply (and make CI green again) Apr 4, 2023
@kolyshkin kolyshkin mentioned this pull request Apr 4, 2023
@kolyshkin
Copy link
Contributor Author

@AkihiroSuda @mrunalp PTAL

1 similar comment
@kolyshkin

This comment was marked as duplicate.

@kolyshkin kolyshkin enabled auto-merge April 5, 2023 02:00
@kolyshkin kolyshkin merged commit 55b2dbf into opencontainers:release-1.1 Apr 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants