Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Cgroup Parent and Name fields by CgroupsPath #497

Merged
merged 2 commits into from
Feb 10, 2016

Conversation

mlaventure
Copy link
Contributor

Fixes #396

Closes #397
Closes #399

Signed-off-by: Kenfe-Mickael Laventure mickael.laventure@gmail.com


Since #397 and #399 haven't been touched in a month, I've made this PR.

This hopefully includes all decisions following the discussions in the aforementioned PRs.

However, since #466 was merged, we may need a better name than CgroupsPath now that we have a "Paths" member, but I couldn't come up with a better name. Open to suggestions.

// path resulting from prepending another path will always resolve to lexically
// be a subdirectory of the prefixed path. This is all done lexically, so paths
// that include symlinks won't be safe as a result of using pathClean.
func pathClean(path string) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NACK on removing this. It fixes a security vulnerability. Please use this function to clean paths that need to be scoped to a mountpoint, not filepath.Clean.

EDIT: Ah, you've just moved it. Still NACK, moving it means that Docker will have a security regression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have noted that, yes.

What about making it a public method within the libcontainer/utils package?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM. There are probably some other places where we aren't cleaning paths properly.

@mlaventure mlaventure force-pushed the cgroups-path branch 2 times, most recently from bb9396d to b4e1930 Compare January 22, 2016 01:23
@hqhq
Copy link
Contributor

hqhq commented Jan 22, 2016

I think we are waiting for opencontainers/runtime-spec#270
It's gonna be a bit tough for systemd cgroup, do you have any idea how we should handle cgroupName for systemd cgroup? Just split the path seems not enough.

@mlaventure
Copy link
Contributor Author

Ah, I missed opencontainers/runtime-spec#270 somehow, sorry.

For systemd, I tried to get the same result as before. I'll double check on a vm tomorrow in case I missed something (I basically runc a busybox with cat /proc/self/cgroup and got the expected result with my solution).

Do we have a systemd expert in the contributors?

config: c,
pid: pid,
root: root,
cgroupspath: libcontainerUtils.CleanPath(c.CgroupsPath),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why isn't this just called path? After all, the struct is called cgroupData.

@cyphar
Copy link
Member

cyphar commented Jan 22, 2016

@mlaventure I'm not a systemd expert (@LK4D4 is the closest we have AFAIK), but I've dealt with my fair share of systemd bugs recently. TransientUnits require a slice and a unit name. They are then concatenated. AFAICS filepath.Split is fine as long as the path has been Cleand properly.

@hqhq
Copy link
Contributor

hqhq commented Jan 22, 2016

It's more like a strategy problem, something like, if we demand cgroupsPath to be absolute (as specs required), if user use cgroupsPath=/a.slice/b.slice/mycgroup.scope, filepath.Split won't work because /a.slice/b.slice is not a valid slice name, and what if some subsystems don't have the path /a.slice/b.slice should we fail or create this path? And how should systemd cgroupsPath work with --cgroup-parent on Docker side?
We need to think through these questions before process.

@cyphar
Copy link
Member

cyphar commented Jan 22, 2016

@hqhq Well, the slice names you've given aren't valid. It would be more like a.slice and a-b.slice (systemd scopes slices to /, unless you do something like add a - in your slice name, in which case the path is basically created using the - hierarchy). The "must be absolute" rules are because of systemd interactions with cgroupfs (something I know about quite viscerally).

We'd need to do some parsing on our side... Unless I'm mistaken and systemd understands paths as slice names.

@mlaventure
Copy link
Contributor Author

On docker side, we can just concatenate Parent and Name and use the result to fill up cgroupsPat h.

For systemd, I just realize that runc always uses fs regardless of whether systemd is installed on the localhost or not. We only do the detection when doing the test. I'll have to redo my testing.

@mlaventure
Copy link
Contributor Author

So, even master does not currently play nice with systemd (at least in my ubuntu 15.10 vm). It fails with the following

FATA[0000] Container start failed: [9] System error: Invalid slice name /user.slice/user-1000.slice/session-c1.scope

createCgroupConfig will use the path as read from within /proc/self/group which result in an invalid path name format as far as systemd is concerned (e.g. /user.slice/user-1000.slice/session-c1.scope which should be AFAIC something like user-1000.slice). The scope itself is another issue, I guess it's what we would consider to be Name currently. But systemd as the concept of ScopePrefix (session in the previous example) that we never fill. I guess we should probably take the default from /proc/self/cgroup (that would be session if we stick to my example).

Since this is 100% systemd specific, GetThisCgroupDir should probably moved within the Cgroup Managers.

@mrunalp
Copy link
Contributor

mrunalp commented Jan 22, 2016

For systemd, we create the scope using the dbus APIs and then manipulation is done by fs. Creating the scope using dbus is important because systemd does more than just create the paths.

@mlaventure
Copy link
Contributor Author

@mrunalp , yes we do. But by default we use the path as retrieved from /proc/self/cgroup when calling the dbu API.

The issue is that this path is in an invalid format. It would need to either be converted to an appropriate format or we should instead use system.slice if nothing was provided by the user.

@cyphar
Copy link
Member

cyphar commented Jan 23, 2016

@mlaventure venture Can you please share the command you used on master? All of these work for me:

$ ./bundles/latest/binary/docker run alpine cat /proc/self/cgroup
11:cpuset:/system.slice/docker-f34a95b1920a7160c1656f4651ebde7a2e3b14147f67c3249f55e37117bf6204.scope
10:memory:/system.slice/docker-f34a95b1920a7160c1656f4651ebde7a2e3b14147f67c3249f55e37117bf6204.scope
9:perf_event:/system.slice/docker-f34a95b1920a7160c1656f4651ebde7a2e3b14147f67c3249f55e37117bf6204.scope
8:freezer:/system.slice/docker-f34a95b1920a7160c1656f4651ebde7a2e3b14147f67c3249f55e37117bf6204.scope
7:blkio:/system.slice/docker-f34a95b1920a7160c1656f4651ebde7a2e3b14147f67c3249f55e37117bf6204.scope
6:devices:/system.slice/docker-f34a95b1920a7160c1656f4651ebde7a2e3b14147f67c3249f55e37117bf6204.scope
5:net_cls,net_prio:/system.slice/docker-f34a95b1920a7160c1656f4651ebde7a2e3b14147f67c3249f55e37117bf6204.scope
4:hugetlb:/system.slice/docker-f34a95b1920a7160c1656f4651ebde7a2e3b14147f67c3249f55e37117bf6204.scope
3:pids:/
2:cpu,cpuacct:/system.slice/docker-f34a95b1920a7160c1656f4651ebde7a2e3b14147f67c3249f55e37117bf6204.scope
1:name=systemd:/system.slice/docker-f34a95b1920a7160c1656f4651ebde7a2e3b14147f67c3249f55e37117bf6204.scope
$ ./bundles/latest/binary/docker run --cgroup-parent=test.slice alpine cat /proc/self/cgroup
11:cpuset:/test.slice/docker-c3d52f73a3b86908c3dc57674ca3143a2274ba3d4e1e2647afaa115e2aacaf60.scope
10:memory:/test.slice/docker-c3d52f73a3b86908c3dc57674ca3143a2274ba3d4e1e2647afaa115e2aacaf60.scope
9:perf_event:/test.slice/docker-c3d52f73a3b86908c3dc57674ca3143a2274ba3d4e1e2647afaa115e2aacaf60.scope
8:freezer:/test.slice/docker-c3d52f73a3b86908c3dc57674ca3143a2274ba3d4e1e2647afaa115e2aacaf60.scope
7:blkio:/test.slice/docker-c3d52f73a3b86908c3dc57674ca3143a2274ba3d4e1e2647afaa115e2aacaf60.scope
6:devices:/test.slice
5:net_cls,net_prio:/test.slice/docker-c3d52f73a3b86908c3dc57674ca3143a2274ba3d4e1e2647afaa115e2aacaf60.scope
4:hugetlb:/test.slice/docker-c3d52f73a3b86908c3dc57674ca3143a2274ba3d4e1e2647afaa115e2aacaf60.scope
3:pids:/
2:cpu,cpuacct:/test.slice/docker-c3d52f73a3b86908c3dc57674ca3143a2274ba3d4e1e2647afaa115e2aacaf60.scope
1:name=systemd:/test.slice/docker-c3d52f73a3b86908c3dc57674ca3143a2274ba3d4e1e2647afaa115e2aacaf60.scope
$ ./bundles/latest/binary/docker run --cgroup-parent=test-a-b-c.slice alpine cat /proc/self/cgroup
11:cpuset:/test-a-b-c.slice/docker-e5fc84ab93f9cd3c2a5308d7072c91474b3e01abceb8524746bd6223964af059.scope
10:memory:/test.slice/test-a.slice/test-a-b.slice/test-a-b-c.slice/docker-e5fc84ab93f9cd3c2a5308d7072c91474b3e01abceb8524746bd6223964af059.scope
9:perf_event:/test-a-b-c.slice/docker-e5fc84ab93f9cd3c2a5308d7072c91474b3e01abceb8524746bd6223964af059.scope
8:freezer:/test-a-b-c.slice/docker-e5fc84ab93f9cd3c2a5308d7072c91474b3e01abceb8524746bd6223964af059.scope
7:blkio:/test.slice/test-a.slice/test-a-b.slice/test-a-b-c.slice/docker-e5fc84ab93f9cd3c2a5308d7072c91474b3e01abceb8524746bd6223964af059.scope
6:devices:/test.slice
5:net_cls,net_prio:/test-a-b-c.slice/docker-e5fc84ab93f9cd3c2a5308d7072c91474b3e01abceb8524746bd6223964af059.scope
4:hugetlb:/test-a-b-c.slice/docker-e5fc84ab93f9cd3c2a5308d7072c91474b3e01abceb8524746bd6223964af059.scope
3:pids:/
2:cpu,cpuacct:/test.slice/test-a.slice/test-a-b.slice/test-a-b-c.slice/docker-e5fc84ab93f9cd3c2a5308d7072c91474b3e01abceb8524746bd6223964af059.scope
1:name=systemd:/test.slice/test-a.slice/test-a-b.slice/test-a-b-c.slice/docker-e5fc84ab93f9cd3c2a5308d7072c91474b3e01abceb8524746bd6223964af059.scope

Which all work (although the last one causes warnings about missing kernel features because we aren't expanding paths the way that systemd does with - becoming sub-slices -- which is definitely a bug). What wouldn't work is something with /s inside (like --cgroup-parent=a.slice/b.slice), and it shouldn't IMO.

Really, we should rethink how we're dealing with systemd's transient units IMHO. I can take a look at this later.

@cyphar
Copy link
Member

cyphar commented Jan 23, 2016

Actually, the output of --cgroup-parent=test-a-b-c.slice is blatantly incorrect for cgroups that systemd doesn't support (which I should work on fixing).

@mlaventure
Copy link
Contributor Author

@cyphar docker works because it calls libcontainer.New() directly. I am referring to the runc binary itself. It does not provide a way to use anything but cgroup fs. When adding the check for systemd you end up with the error I mentioned above.

@cyphar
Copy link
Member

cyphar commented Jan 25, 2016

#511 actually solves the above issue I describe about incorrect cgroup paths.

@mlaventure Can you share the code you used to add the systemd check?

@mlaventure
Copy link
Contributor Author

@cyphar I've used this:

diff --git a/libcontainer/factory_linux.go b/libcontainer/factory_linux.go
index ae5db9c..0700355 100644
--- a/libcontainer/factory_linux.go
+++ b/libcontainer/factory_linux.go
@@ -114,7 +114,9 @@ func New(root string, options ...func(*LinuxFactory) error) (Factory, error) {
                CriuPath:  "criu",
        }
        InitArgs(os.Args[0], "init")(l)
-       Cgroupfs(l)
+       if l.NewCgroupsManager != nil {
+               Cgroupfs(l)
+       }
        for _, opt := range options {
                if err := opt(l); err != nil {
                        return nil, err
diff --git a/utils.go b/utils.go
index baffef1..8015d39 100644
--- a/utils.go
+++ b/utils.go
@@ -9,6 +9,7 @@ import (

        "github.com/codegangsta/cli"
        "github.com/opencontainers/runc/libcontainer"
+       "github.com/opencontainers/runc/libcontainer/cgroups/systemd"
        "github.com/opencontainers/runc/libcontainer/configs"
        "github.com/opencontainers/specs"
 )
@@ -104,7 +105,14 @@ func loadFactory(context *cli.Context) (libcontainer.Factory, error) {
        if err != nil {
                return nil, err
        }
-       return libcontainer.New(abs, libcontainer.Cgroupfs, func(l *libcontainer.LinuxFactory) error {
+       var newCgroupFunc func(l *libcontainer.LinuxFactory) error
+       if systemd.UseSystemd() {
+               newCgroupFunc = libcontainer.SystemdCgroups
+       } else {
+               newCgroupFunc = libcontainer.Cgroupfs
+       }
+
+       return libcontainer.New(abs, newCgroupFunc, func(l *libcontainer.LinuxFactory) error {
                l.CriuPath = context.GlobalString("criu")
                return nil
        })

@hqhq
Copy link
Contributor

hqhq commented Jan 26, 2016

@mlaventure Yeah I've done this before #325
Haven't dig into this but seems we don't have plan to support systemd cgroup in runc yet.

Parent string `json:"parent"`
// CgroupsPath specifies the path to cgroups that are created and/or joined by the container.
// The path is assumed to be relative to the host system cgroup mountpoint.
CgroupsPath string `json:"cgroupsPath"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would just call this Path

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
@mlaventure
Copy link
Contributor Author

  • Rebased
  • Reverted the changes to systemd, since it is not exposed via runc anyway.
  • Remove "cgroups" from the field names

c := &configs.Cgroup{
Name: name,
Parent: myCgroupPath,
Path: filepath.Join(myCgroupPath, name),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to do a join here with name right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it depends, if I have a bundle named busybox and set my cgroup path to be foo/bar

Do we want to use <cgroup-mount-point>/foo/bar/busybox or <cgroup-mount-point>/foo/bar when setting the rules?

Atm the code produce the former; if we want the later, I'm not sure what is the use for name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Fixes opencontainers#396

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
@mlaventure
Copy link
Contributor Author

  • Added deprecation comment to Name and Path along with json omitempty tag
  • Removed name and parent fields from cgroupfs cgroupData type

@crosbymichael
Copy link
Member

LGTM

1 similar comment
@LK4D4
Copy link
Contributor

LK4D4 commented Feb 10, 2016

LGTM

LK4D4 added a commit that referenced this pull request Feb 10, 2016
Replace Cgroup Parent and Name fields by CgroupsPath
@LK4D4 LK4D4 merged commit 4678b01 into opencontainers:master Feb 10, 2016
@cyphar
Copy link
Member

cyphar commented Feb 11, 2016

It turns out that 256f3a8ebc114b01c1630571c33d813a0612ea5c (Add support for CgroupsPath field) breaks cgroups. I'm going to see what I can do to fix this ...

EDIT: It also makes cgroup paths unsafe (even though I'm sure the earlier version of this patch didn't cause this issue). Thank god I added a regression test for invalid cgroup paths in Docker, this could've ended badly. o.O

I'm not sure this should've been merged. #552 fixes the issues I could find.

mlaventure added a commit to mlaventure/runc that referenced this pull request Feb 17, 2016
When CgroupsPath code was introduced with opencontainers#497 it was mistakenly made
to act as the equivalent of docker CgroupsParent. This ensure that it
is taken as the final cgroup path.

A couple of unit tests have been added to prevent future regression.

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
@mlaventure mlaventure deleted the cgroups-path branch February 19, 2016 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants