runc/crun, cgroups and CRIU #1793

Open
adrianreber opened this issue Mar 28, 2022 · 3 comments
@adrianreber
Member

I am currently looking at a problem concerning CRIU and OCI containers. My understanding so far is the following:

I am creating a checkpoint without setting manage_cgroups. This means opts.manage_cgroups should be CG_MODE_DEFAULT, which is defined as #define CG_MODE_DEFAULT (CG_MODE_SOFT).

When creating a checkpoint, CRIU still records the cgroup information of the process in the container.

My understanding is that this should not be necessary, because the container runtime (crun at least) moves the process into the new cgroup it created once the restore is done. I think this is the only right approach: for OCI containers, CRIU should not touch the cgroup settings. A restored container ends up in a cgroup newly created by the container runtime (crun/runc).

Even after setting #define CG_MODE_DEFAULT (CG_MODE_IGNORE), I still get a cgroup.img, and core-1.img still references cgroups via "cg_set": 2.
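One way to check which images still carry a cgroup reference is to decode them with CRIT (for example crit show core-1.img) and search the resulting JSON for cg_set. The sketch below assumes the image has already been decoded to JSON; the helper name find_key and the trimmed sample document are hypothetical, and the real layout of a decoded core image may differ.

```python
import json


def find_key(node, key, path=""):
    """Recursively yield (path, value) for every occurrence of `key`."""
    if isinstance(node, dict):
        for k, v in node.items():
            child = f"{path}/{k}"
            if k == key:
                yield child, v
            yield from find_key(v, key, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_key(item, key, f"{path}[{i}]")


# Hypothetical, heavily trimmed stand-in for a decoded core-1.img.
sample = json.loads('{"entries": [{"tc": {"cg_set": 2}}]}')
hits = list(find_key(sample, "cg_set"))
print(hits)  # [('/entries[0]/tc/cg_set', 2)]
```

Because the search is recursive, it does not depend on where exactly cg_set sits in the decoded image.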

The restore fails with:

(00.003375)      1: cg: Move into 2
(00.003391)      1: cg: setting cgns prefix to /machine.slice/libpod-dd47c09e12569883f67d88a5da89cbd2e1c450b2f3803087ee72e3a062a05186.scope/container
(00.003415)      1: Error (criu/cgroup.c:1092): cg: Can't move 1 into unifie//machine.slice/libpod-dd47c09e12569883f67d88a5da89cbd2e1c450b2f3803087ee72e3a062a05186.scope/container/cgroup.procs (-1/-1): Bad file descriptor
(00.003427)      1: Error (criu/cgroup.c:1148): cg: couldn't set cgns prefix unifie//machine.slice/libpod-dd47c09e12569883f67d88a5da89cbd2e1c450b2f3803087ee72e3a062a05186.scope/container/cgroup.procs: Bad file descriptor
(00.003431)      1: Error (criu/cgroup.c:1171): cg: failed preparing cgns

So there is still a bug somewhere in the code because unifie//machine.slice does not look correct.
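To pull only the failing lines out of a longer restore log, something like the following could help. It is a minimal sketch that assumes every error line has the shape shown above, "(timestamp) pid: Error (file:line): message"; the regular expression and the name extract_errors are my own, not part of CRIU.

```python
import re

# Matches lines such as:
# (00.003431)      1: Error (criu/cgroup.c:1171): cg: failed preparing cgns
ERROR_RE = re.compile(
    r"^\((?P<ts>\d+\.\d+)\)\s+(?:(?P<pid>\d+):\s+)?"
    r"Error \((?P<src>[^:]+):(?P<line>\d+)\):\s+(?P<msg>.*)$"
)


def extract_errors(log_text):
    """Return one dict per 'Error (...)' line found in a CRIU log."""
    errors = []
    for raw in log_text.splitlines():
        m = ERROR_RE.match(raw.strip())
        if m:
            entry = m.groupdict()
            # Flag the doubled slash seen in "unifie//machine.slice".
            entry["double_slash"] = "//" in entry["msg"]
            errors.append(entry)
    return errors


# Three lines copied from the restore log above.
log = """\
(00.003375)      1: cg: Move into 2
(00.003415)      1: Error (criu/cgroup.c:1092): cg: Can't move 1 into unifie//machine.slice/libpod-dd47c09e12569883f67d88a5da89cbd2e1c450b2f3803087ee72e3a062a05186.scope/container/cgroup.procs (-1/-1): Bad file descriptor
(00.003431)      1: Error (criu/cgroup.c:1171): cg: failed preparing cgns
"""

errors = extract_errors(log)
for e in errors:
    print(e["src"], e["line"], "double slash" if e["double_slash"] else "")
```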

Using CRIU's manage_cgroups mode results in CG_MODE_SOFT and the restore works, but the restore does strange things. First of all, I see this in the logs:

(00.001357) cg: Preparing cgroups yard (cgroups restore mode 0x4)
(00.001593) cg: Opening .criu.cgyard.cifCa8 as cg yard
(00.001613) cg:         Making controller dir .criu.cgyard.cifCa8/unifie ()
(00.001707) cg: Determined cgroup dir unifie/machine.slice/libpod-30325b748276c463e9f5e8db0f98662915f7372f7585287dcae81c8cd4d75636.scope/container already exist
(00.001713) cg: Skip restoring properties on cgroup dir unifie/machine.slice/libpod-30325b748276c463e9f5e8db0f98662915f7372f7585287dcae81c8cd4d75636.scope/container

The paths again look wrong, and the log still references the old cgroup path although the restored container has a different ID, one newly created by the container runtime.
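The "cgroups restore mode 0x4" value in the first log line can be mapped back to a mode name. The sketch below assumes the CG_MODE_* values are bit flags as I understand them from CRIU's cr_options.h (none = 0x1, props = 0x2, soft = 0x4, full = 0x8, strict = 0x10, ignore = 0); please verify them against the CRIU version in use. The function name decode_restore_mode is hypothetical.

```python
import re

# Assumed values mirroring CRIU's CG_MODE_* macros; verify before relying on them.
CG_MODES = {
    0x01: "none",
    0x02: "props",
    0x04: "soft",
    0x08: "full",
    0x10: "strict",
}


def decode_restore_mode(log_line):
    """Return the mode names encoded in a 'cgroups restore mode' log line."""
    m = re.search(r"cgroups restore mode ((?:0x)?[0-9a-fA-F]+)", log_line)
    if m is None:
        return None
    value = int(m.group(1), 16)
    if value == 0:
        return ["ignore"]
    return [name for bit, name in CG_MODES.items() if value & bit]


line = "(00.001357) cg: Preparing cgroups yard (cgroups restore mode 0x4)"
mode = decode_restore_mode(line)
print(mode)  # ['soft']
```

Under these assumptions, 0x4 matches CG_MODE_SOFT, which is consistent with the default described earlier.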

To reproduce:

podman run -d quay.io/adrianreber/counter
podman container checkpoint --latest --export /tmp/dump.tar -R -k
podman container restore -i /tmp/dump.tar -n new -k

The restore log of the container new shows the messages from above. The log can be found with podman inspect -l --format "{{.State.RestoreLog}}".
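For repeated testing, the steps above can be wrapped in a small script. This is only a sketch: the commands are exactly the ones listed above, while the function names repro_commands and run_repro are hypothetical. It needs Podman and CRIU installed, and it keeps going after a failing step so that the restore log can still be located.

```python
import subprocess

DUMP = "/tmp/dump.tar"


def repro_commands(dump=DUMP, name="new"):
    """Return the reproduction steps as argument lists."""
    return [
        ["podman", "run", "-d", "quay.io/adrianreber/counter"],
        ["podman", "container", "checkpoint", "--latest",
         "--export", dump, "-R", "-k"],
        ["podman", "container", "restore", "-i", dump, "-n", name, "-k"],
        ["podman", "inspect", "-l", "--format", "{{.State.RestoreLog}}"],
    ]


def run_repro():
    """Run every step in order and return the list of exit codes."""
    codes = []
    for cmd in repro_commands():
        result = subprocess.run(cmd, capture_output=True, text=True)
        print("$", " ".join(cmd))
        print(result.stdout.strip() or result.stderr.strip())
        codes.append(result.returncode)
    return codes
```

Calling run_repro() prints the output of each step; the last step prints whatever podman inspect reports for .State.RestoreLog.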

So this is both a bug report, because CRIU's cgroup handling is not correct, and a question: should CRIU completely ignore the cgroup settings when used in combination with crun/runc? crun/runc will create a new cgroup for a new container and move the processes into it. Currently it does not seem possible to tell CRIU to completely ignore the cgroup, even with CG_MODE_IGNORE.

@mihalicyn @avagin any ideas, suggestions or comments?

@avagin
Member

avagin commented Apr 2, 2022

@adrianreber have you looked at the runsc code? I think we have the --cgroup-root option, and it has to be set to the container cgroup root.
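For illustration, this is roughly how --cgroup-root arguments could be assembled. The sketch assumes the [controller:]/newroot form described in the CRIU man page, where a value without a controller acts as the default for controllers not listed explicitly. The helper name cgroup_root_args and the example path are hypothetical, and this is not how any runtime actually builds its CRIU options.

```python
def cgroup_root_args(roots):
    """Build --cgroup-root arguments from a {controller: path} mapping.

    An empty controller name yields the bare /path form.
    """
    args = []
    for ctrl, path in roots.items():
        value = f"{ctrl}:{path}" if ctrl else path
        args += ["--cgroup-root", value]
    return args


# Hypothetical cgroup path for the restored container.
new_root = "/machine.slice/libpod-<new-id>.scope/container"
print(cgroup_root_args({"": new_root}))
```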

@adrianreber
Member Author

I have a possible fix in #1800 (works for me and Podman)

@github-actions

github-actions bot commented May 5, 2022

A friendly reminder that this issue had no activity for 30 days.

kolyshkin added a commit to kolyshkin/runc that referenced this issue Aug 2, 2022
When manage-cgroups-mode: ignore is used, criu still needs to know the
cgroup path to work properly (see [1]).

Revert "libct/criuApplyCgroups: don't set cgroup paths for v2"

This reverts commit d5c57dc.

[1]: checkpoint-restore/criu#1793 (comment)

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this issue Aug 4, 2022
kolyshkin added a commit to kolyshkin/runc that referenced this issue Sep 2, 2022
kolyshkin added a commit to kolyshkin/runc that referenced this issue Oct 27, 2022
kolyshkin added a commit to kolyshkin/runc that referenced this issue Nov 2, 2022
kolyshkin added a commit to kolyshkin/runc that referenced this issue Dec 15, 2022