Fix systemd cgroup driver's Apply #3782
The failure with "runc ps" on CentOS 9 is gone, as expected. This failure:
is a separate issue not addressed by this PR.
@@ -356,7 +356,7 @@ function setup() {
 	[ "$output" = "ok" ]
 }
 
-@test "runc run/create should warn about a non-empty cgroup" {
+@test "runc run/create should error for a non-empty cgroup" {
Is this a breaking change?
Probably fine for v1.2, but I'm not sure it is backportable to v1.1.
To begin with, it never worked anyway: if the systemd unit already exists, the error is ignored and the new container is never added to the unit (or to its cgroup). To repeat, it never worked anyway.
We can try making it work in 1.1, though. The fix would be very different from this one, something like "if startUnit returned a UnitExists error, call setUnitProperties with properties of PIDs=[new pid]".
I am not sure that this will work (maybe, maybe not; it's complicated). I am also unsure whether we want to go that route at all, i.e. trying to fix something that never worked anyway.
In this version (1.2.x), I think this is the way it should be done.
In 1.1, we can discuss it later (for 1.1.6, I guess).
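For illustration only, the 1.1 idea floated above (on UnitExists, add the new PID to the existing unit instead of ignoring the error) could look roughly like the Go sketch below. It is not runc code: the unitManager interface, its StartUnit and SetUnitPIDs methods, the errUnitExists sentinel and the in-memory fakeManager are all hypothetical stand-ins for the real systemd D-Bus calls. Whether systemd would actually accept such a PID change on an existing unit is exactly the open question raised in the comment.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnitExists is a hypothetical sentinel standing in for systemd's
// UnitExists D-Bus error.
var errUnitExists = errors.New("unit already exists")

// unitManager is a hypothetical abstraction over the systemd calls
// mentioned in the discussion (startUnit, setUnitProperties).
type unitManager interface {
	StartUnit(name string, pid int) error
	SetUnitPIDs(name string, pids []int) error
}

// startOrJoinUnit tries to start the unit with pid; if the unit already
// exists, it asks the manager to add pid to it instead of ignoring the error.
func startOrJoinUnit(m unitManager, name string, pid int) error {
	err := m.StartUnit(name, pid)
	if err == nil {
		return nil
	}
	if !errors.Is(err, errUnitExists) {
		return fmt.Errorf("starting unit %s: %w", name, err)
	}
	// The unit exists: try to place the new PID into it.
	if err := m.SetUnitPIDs(name, []int{pid}); err != nil {
		return fmt.Errorf("adding pid %d to existing unit %s: %w", pid, name, err)
	}
	return nil
}

// fakeManager is an in-memory stand-in used to exercise the logic.
type fakeManager struct {
	units map[string][]int
}

func (f *fakeManager) StartUnit(name string, pid int) error {
	if _, ok := f.units[name]; ok {
		return errUnitExists
	}
	f.units[name] = []int{pid}
	return nil
}

func (f *fakeManager) SetUnitPIDs(name string, pids []int) error {
	f.units[name] = append(f.units[name], pids...)
	return nil
}
```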
To reiterate: let's concentrate on how we can fix it in the main branch for now, and think about a 1.1 backport later.
I thought about this a bit. I think we can still allow a shared cgroup when the fs cgroup driver is used, keeping the warning, and error out in the case of the systemd cgroup driver. This is sort of a breaking change, but since
- the functionality never worked correctly (the UnitExists error was ignored, and the container was not placed into the proper systemd unit and/or cgroup),
- it will be deprecated in runc 1.2, and
- implementing such a feature (adding a container to an existing systemd unit) is not very easy,
it makes little sense to try.
In particular, this test can be changed to expect a warning with the fs cgroup driver and an error with the systemd cgroup driver.
@kolyshkin That sounds good!
Implemented as described above in #3806
Move error handling earlier, removing the "if err == nil" block. No change in logic. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit d223e2a ("Ignore error when starting transient unit that already exists") modified the code handling errors from startUnit to ignore the UnitExists error. Apparently it was done so that kubelet can create the same pod slice over and over without hitting an error (see [1]). While this works for a pod slice, to ensure it exists, it is a gross bug to ignore UnitExists when creating a container. In this case, the container init PID won't be added to the systemd unit (and to the required cgroup), and as a result the container will successfully run in the current user's cgroup, without any cgroup limits applied. So, fix the code to only ignore UnitExists if we're not adding a process to the systemd unit. This way, kubelet will keep working as is, but runc will refuse to create containers which are not placed into the requested cgroup. [1] opencontainers#1124 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case a systemd unit fails (for example, timed out or OOM-killed), systemd keeps the unit. This prevents starting a new container with the same systemd unit name. The fix is to call reset-failed in case UnitExists error is returned, and retry once. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit d08bc0c ("runc run: warn on non-empty cgroup") introduced a warning when a container is started in a non-empty cgroup. Such a configuration has lots of issues. In addition, such a configuration is not possible at all when using the systemd cgroup driver. As planned, let's promote this warning to an error, and fix the test case accordingly. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1.1 backport, more conservative: #3806.
As done in this runc PR opencontainers/runc#3782, UnitExists errors are no longer ignored when starting units. If it already exists, we attempt to reset the failed unit and retry once. Otherwise, we fail. See containerd#279 for more context. Signed-off-by: Matt Merkes <matt.merkes@gmail.com>
As done in opencontainers/runc#3782, UnitExists errors are no longer ignored when starting units. This change makes the logic more robust like in runc: 1. Attempts to reset a failed unit if it already exists 2. Verifies via the unit status that it successfully starts 3. Waits longer for unit to start 4. Continues to ignore unit existing when pid is -1 to accommodate kubelet use case 5. Otherwise, returns an error if it already exists Signed-off-by: Matt Merkes <matt.merkes@gmail.com>
Before this commit, creating a cgroup would silently ignore timeouts and carry on. Concretely, this caused cases where a cgroup failed to be created, but the caller did not realize it and ended up looking for files that should exist (e.g. cgroup.controllers), only to find they don't exist. It's very difficult as a caller to deal with this case, where NewSystemd succeeds but the group doesn't exist. The origins of this code seem to trace back to an initial implementation written 5+ years ago: containerd@5efa14e#diff-3331981e4ac06a8d9b06e91842b7f2759c7af3b65287e489a88385948d311ebdR672 runc added roughly the same logic here to deal with the same issue: opencontainers/runc#3782 Now, containerd will also error if a cgroup cannot be created within the timeout window. Signed-off-by: Josh Chorlton <jchorlton@gmail.com>
This fixes the wrong logic of systemd cgroup manager's Apply method. For more details, see commit messages and #3780.
Fixes: #3780
Fixes: #3760