Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Enable systemd cgroups management #540

Merged
merged 1 commit into from
Feb 9, 2022
Merged

Conversation

dtrudg
Copy link
Member

@dtrudg dtrudg commented Feb 1, 2022

Enable cgroups management via systemd.

--apply-cgroups for shell/exec/run etc. will create a singularity-<pid>.scope in the system.slice.

The OCI runtime will create a singularity-oci-<name>.scope in the system.slice.

This is a stepping stone toward allowing unpriv cgroups limits, via asking systemd to create and delegate scope in the user slice, on cgroups v2 systems.

Set systemd cgroups management as the default. It will be needed as we move toward rootless, and we are not directly supporting any non-systemd distributions at this time. It can be disabled, to fall back to direct manipulation of cgroupfs, by setting systemd cgroups = no in singularity.conf.

NOTE...

There is reasonable unit test coverage of the cgroups functionality with both cgroupfs and systemd management.

There is currently NO e2e coverage of systemd cgroups management. Our e2e tests are run in a new PID namespace. We can't ask the host systemd to manage cgroups for PIDs in this namespace, as they are completely separate from the host PIDs that system sees.

I am going to capture a tech debt issue for the e2e to address separately, as it won't be a trivial change (#563).

Fixes #299

@dtrudg dtrudg added the enhancement New feature or request label Feb 1, 2022
@dtrudg dtrudg added this to the SingularityCE 3.10 milestone Feb 1, 2022
@dtrudg dtrudg self-assigned this Feb 1, 2022
@dtrudg dtrudg marked this pull request as draft February 1, 2022 20:23
@dtrudg dtrudg force-pushed the issue299 branch 9 times, most recently from 06bca64 to e963973 Compare February 7, 2022 20:19
@dtrudg dtrudg changed the title WIP: feat: Enable systemd cgroups management feat: Enable systemd cgroups management Feb 7, 2022
@dtrudg dtrudg force-pushed the issue299 branch 2 times, most recently from 553b660 to af15171 Compare February 7, 2022 21:29
@dtrudg dtrudg marked this pull request as ready for review February 7, 2022 21:57
Copy link
Member

@tri-adam tri-adam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one comment.

internal/pkg/cgroups/util.go Outdated Show resolved Hide resolved
@dtrudg dtrudg merged commit 145b7ed into sylabs:master Feb 9, 2022
@dtrudg dtrudg deleted the issue299 branch February 9, 2022 15:37
dtrudg added a commit to dtrudg/singularity that referenced this pull request Mar 4, 2022
The Singularity e2e-tests were previously all run in a mount and PID
namespace, to avoid polluting the host filesystem (critical), and
process tree (less critical). These namespaces are set up in some CGO
init code.

The use of the PID namespace prevents testing with systemd as the
cgroups manager, as systemd is aware of the host process tree, not the
one in the new e2e PID namespace. This is a major omission since sylabs#540
switched to using systemd for cgroups management by default.

This PR:

 * Modifies the e2e init CGO code, so that an env var
   `SINGULARITYE_E2E_NO_PID_NS` will cause it *not* to create a new PID
   namespace.

 * Modifies the Makefile so that the e2e suite is called twice, once
   with PID ns, once without.

 * Moves the INSTANCE cgroups test into a new package/topic
   e2e/cgroups, run in the e2e call without PID ns.

 * Moves the OCI tests into the e2e call without PID ns.

 * Modifies the CGROUPS and OCI tests so that they test with both
   systemd and cgroupfs management, using a convenience wrapper function.

Fixes: 563

Note - this breaks the e2e CLI coverage, as it only supports
collecting / analyzing one e2e run... where we now have two. The CLI
coverage metrics are unused in practice, and flawed (don't consider
combinations of flags required or possible), so I'm going to propose
removing them in a separate issue.
dtrudg added a commit to dtrudg/singularity that referenced this pull request Mar 4, 2022
The Singularity e2e-tests were previously all run in a mount and PID
namespace, to avoid polluting the host filesystem (critical), and
process tree (less critical). These namespaces are set up in some CGO
init code.

The use of the PID namespace prevents testing with systemd as the
cgroups manager, as systemd is aware of the host process tree, not the
one in the new e2e PID namespace. This is a major omission since sylabs#540
switched to using systemd for cgroups management by default.

This PR:

 * Modifies the e2e init CGO code, so that an env var
   `SINGULARITYE_E2E_NO_PID_NS` will cause it *not* to create a new PID
   namespace.

 * Modifies the Makefile so that the e2e suite is called twice, once
   with PID ns, once without.

 * Moves the INSTANCE cgroups test into a new package/topic
   e2e/cgroups, run in the e2e call without PID ns.

 * Moves the OCI tests into the e2e call without PID ns.

 * Modifies the CGROUPS and OCI tests so that they test with both
   systemd and cgroupfs management, using a convenience wrapper function.

Fixes: 563

Note - this breaks the e2e CLI coverage, as it only supports
collecting / analyzing one e2e run... where we now have two. The CLI
coverage metrics are unused in practice, and flawed (don't consider
combinations of flags required or possible), so I'm going to propose
removing them in a separate issue.
dtrudg added a commit to dtrudg/singularity that referenced this pull request Mar 4, 2022
The Singularity e2e-tests were previously all run in a mount and PID
namespace, to avoid polluting the host filesystem (critical), and
process tree (less critical). These namespaces are set up in some CGO
init code.

The use of the PID namespace prevents testing with systemd as the
cgroups manager, as systemd is aware of the host process tree, not the
one in the new e2e PID namespace. This is a major omission since sylabs#540
switched to using systemd for cgroups management by default.

This PR:

 * Modifies the e2e init CGO code, so that an env var
   `SINGULARITYE_E2E_NO_PID_NS` will cause it *not* to create a new PID
   namespace.

 * Modifies the Makefile so that the e2e suite is called twice, once
   with PID ns, once without.

 * Moves the INSTANCE cgroups test into a new package/topic
   e2e/cgroups, run in the e2e call without PID ns.

 * Moves the OCI tests into the e2e call without PID ns.

 * Modifies the CGROUPS and OCI tests so that they test with both
   systemd and cgroupfs management, using a convenience wrapper function.

Fixes: 563

Note - this breaks the e2e CLI coverage, as it only supports
collecting / analyzing one e2e run... where we now have two. The CLI
coverage metrics are unused in practice, and flawed (don't consider
combinations of flags required or possible), so I'm going to propose
removing them in a separate issue.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Systemd as cgroups manager
2 participants