Support running podman under a root v2 cgroup #14308
Conversation
45d6068 to e88861a
Signed-off-by: Jason T. Greene <jason.greene@redhat.com>
PTAL @giuseppe
Schweet! You rock, @n1hility 🎸 So I'm guessing: did the original implementation assume Podman was running as a child cgroup spawned by … or something like that?
@giuseppe PTAL
would it be enough if we skip the check when running from the root cgroup?
    if own == "/" {
        // If running under the root cgroup try to create or reuse a "probe" cgroup to read memory values
        own = "podman_probe"
        _ = os.MkdirAll(filepath.Join("/sys/fs/cgroup", own), 0o755)
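To illustrate the fallback being discussed, here is a rough, self-contained sketch: determine the process's own cgroup from the v2 entry in /proc/self/cgroup, and substitute a probe group when it is the root. The helper names (`ownCgroupV2`, `probePath`) are illustrative, not Podman's actual functions:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// ownCgroupV2 parses /proc/self/cgroup content and returns the process's
// cgroup path. On cgroup v2 the relevant entry has an empty controller
// list: "0::<path>". Illustrative helper, not Podman's real implementation.
func ownCgroupV2(procSelfCgroup string) string {
	for _, line := range strings.Split(procSelfCgroup, "\n") {
		if strings.HasPrefix(line, "0::") {
			return strings.TrimPrefix(line, "0::")
		}
	}
	return ""
}

// probePath returns the cgroup directory to read memory values from,
// substituting a probe group when running directly under the root cgroup,
// since the root cgroup exposes no memory limit files.
func probePath(own string) string {
	if own == "/" {
		own = "podman_probe"
	}
	return filepath.Join("/sys/fs/cgroup", own)
}

func main() {
	fmt.Println(ownCgroupV2("0::/"))  // "/"
	fmt.Println(probePath("/"))       // "/sys/fs/cgroup/podman_probe"
	fmt.Println(probePath("/user.slice/session-1.scope"))
}
```

In the real PR the probe group is then created with `os.MkdirAll` and deliberately left in place, as the comments below discuss.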
is it fine to leak the cgroup here?
It seemed reasonable to me to leak it, since it's one global value for the system, the memory overhead is tiny, and it avoids the race of a cleanup/recreate. IIRC we leak in other cases (libpod_parent), and I assume for similar reasons. Additionally, in the case where it is used, there is likely no systemd running, which would otherwise have created many times more cgroups than the single probe group used here. If it's a concern, we could overload libpod_parent by pre-creating it, but keeping it isolated eliminates any future conflict.
At first that's what I thought we should do, but in looking into the history, #6365 seems to indicate non-swap accounting setups are still common. So if the check is skipped, a process running in the root cgroup could still fail. When it does fail, it will be confusing as to why, since it's not immediately clear that the process cgroup is relevant (since in truth it's not). Granted, we could add some debug logging to point it out. The cost of creating an ad hoc group seems small, so IMO it's worth the tradeoff for added robustness. It could fail, but if it does, the container cgroup creation is also likely to fail.
LGTM
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: giuseppe, n1hility
/lgtm
Handles the case where Podman is launched under the root cgroup on cgroups v2.
Since the swap validation check requires memory controller limit files (not present on the root cgroup), attempt to create a probe cgroup to read from.
Fixes #14236