Support running runc as non/less privileged user #38

discordianfish · 2015-06-26T16:19:11Z

Right now runc has to be run as root, although technically it should be possible to run containers as an unprivileged user (at least if user namespaces are used).

cgwalters · 2015-06-26T20:10:50Z

See also https://git.gnome.org/browse/linux-user-chroot/tree/README
which supports use by unprivileged users even without user namespaces.

See discussion about use of PR_SET_NO_NEW_PRIVS (which is enforced by seccomp) and additional note about local DoS attacks.
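As a rough illustration of the PR_SET_NO_NEW_PRIVS flag mentioned above, here is a minimal Python sketch that sets and reads the flag through prctl(2) via ctypes. The constant values come from <linux/prctl.h>; the helper names are made up for this example. One detail worth keeping in mind: an unprivileged process has to set no_new_privs before it is allowed to install a seccomp filter, which is probably the relationship the comment is hinting at.

```python
import ctypes
import os

# Constants from <linux/prctl.h> (the flag exists since Linux 3.5).
PR_SET_NO_NEW_PRIVS = 38
PR_GET_NO_NEW_PRIVS = 39

_libc = ctypes.CDLL(None, use_errno=True)


def _prctl(option: int, arg2: int) -> int:
    """Call prctl(2) with the unused arguments set to zero."""
    result = _libc.prctl(
        ctypes.c_int(option),
        ctypes.c_ulong(arg2),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
    )
    if result < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


def set_no_new_privs() -> None:
    """Set no_new_privs on the calling thread. This cannot be undone."""
    _prctl(PR_SET_NO_NEW_PRIVS, 1)


def get_no_new_privs() -> bool:
    """Return True if no_new_privs is set on the calling thread."""
    return _prctl(PR_GET_NO_NEW_PRIVS, 0) == 1
```

Once the flag is set, execve() of a setuid or file-capability binary no longer grants extra privileges, and the flag is inherited by children.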

LK4D4 · 2015-06-26T21:25:32Z

I think we just need to reconsider some default mounts for this to work, and maybe drop the /proc reads in init.

chrisgorgo · 2015-07-10T16:24:02Z

This would allow containers to be used in shared computing environments such as HPC clusters. Very exciting!

zeneofa · 2015-10-01T17:59:36Z

I would be very interested to find out how/when this is implemented, especially as it may help me create a transferable environment to use in HPC environments, as there I have no sudo and no chance to install docker.

discordianfish · 2015-10-02T09:17:32Z

So science is interested. Now we need enterprise so somebody will actually start working on this ;)

davidlt · 2015-10-07T19:27:21Z

+1 from the HEP (High Energy Physics) community. You can have your hundreds of thousands of cores even with a common operating system like RHEL/CentOS/Scientific Linux, but you still end up with Android-like fragmentation because all computing centres do updates on their own schedules. When you send your job to various computing centres you also want to provide your container as the environment, preferably one that runs as an unprivileged container. The container protects you from the fragmentation, and you don't get magic differences due to, for example, an update of libm.

The HTCondor batch system already has some support for Docker: https://research.cs.wisc.edu/htcondor/HTCondorWeek2015/presentations/ThainG_Docker.pdf

wking · 2015-10-07T19:47:32Z

On Wed, Oct 07, 2015 at 12:27:24PM -0700, davidlt wrote:

HTCondor batch system already has some support for Docker:
https://research.cs.wisc.edu/htcondor/HTCondorWeek2015/presentations/ThainG_Docker.pdf

Page 29 of those slides shows the host's sysadmin starting a Docker
service.

More generally, I'm not sure how this is going to work for
unprivileged users. namespaces(7) [1] has:

Creation of new namespaces using clone(2) and unshare(2) in most
cases requires the CAP_SYS_ADMIN capability. User namespaces are
the exception: since Linux 3.8, no privilege is required to create a
user namespace.

So an unprivileged user should be able to create a user namespace,
and have some flexibility inside it. However, you don't have complete
flexibility (to avoid things like [2]). I'm not sure if you'd have
enough flexibility to run a useful bundle, but I guess we'll see ;).

Doing something like making runc setuid-root would be a bad idea,
because the caller could use pre-start hooks (for example) to perform
any action they wished with the elevated permissions.
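To make the namespaces(7) excerpt quoted in this comment a little more concrete, here is a hedged Python sketch of what an unprivileged process can do: create a user namespace and map its own UID/GID to root inside it. It assumes Python 3.12 or newer (for os.unshare), a kernel that permits unprivileged user namespaces, and a single-threaded caller; the function names are invented for this example. Without extra privileges only a single-line mapping of the caller's own ID can be written, and "deny" must be written to setgroups before gid_map (Linux 3.19 and later).

```python
import os


def single_id_map(inside_id: int, outside_id: int) -> str:
    """Format one uid_map/gid_map line: <id inside ns> <id outside ns> <length>."""
    return f"{inside_id} {outside_id} 1\n"


def enter_user_namespace_as_root() -> None:
    """Move the calling process into a new user namespace, mapped to UID/GID 0.

    Sketch only: needs Python 3.12+ and fails with OSError if the kernel
    refuses unprivileged user namespaces.
    """
    uid, gid = os.geteuid(), os.getegid()
    os.unshare(os.CLONE_NEWUSER)
    # An unprivileged writer must disable setgroups(2) before writing gid_map.
    with open("/proc/self/setgroups", "w") as f:
        f.write("deny")
    with open("/proc/self/uid_map", "w") as f:
        f.write(single_id_map(0, uid))
    with open("/proc/self/gid_map", "w") as f:
        f.write(single_id_map(0, gid))
```

After enter_user_namespace_as_root() returns, os.getuid() reports 0, but that "root" only has capabilities over resources owned by the new namespace, which is the limited flexibility described above.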

davidlt · 2015-10-09T07:23:30Z

We disallow software to be setuid-root or installed as root. I love the way runc is now: it seems to be a single capable binary, with no need for special accounts and no need for a daemon. The only thing that's missing is the ability to use it without a root account.

wking · 2015-10-09T16:10:16Z

On Fri, Oct 09, 2015 at 12:23:32AM -0700, davidlt wrote:

We disallow software to be setuid-root or installed as root…

Most (and hopefully all ;) setuid-root programs are that way because
they need those elevated permissions to accomplish their task. The
question is whether runC can launch all OCI-compliant bundles, or a
useful subset of those, or nothing useful at all without needing those
elevated permissions.

davidlt · 2015-10-11T10:37:02Z

Then maybe the question is: what do we lose if we take away root permissions from runC on RHEL6, RHEL7 and mainline kernels?

IIRC LXC supports unprivileged containers on 3.12 and above kernels. Docker should have support for user namespaces in 1.9 according to a PR I managed to find.

We have ~170 computing centres connected, and that's how you achieve a high number of cores to process big data. Currently they are running RHEL 6.X/CentOS 6.X/Scientific Linux 6.X. They will be moved to 7.X soonish, I believe. In a few cases people have migrated to 7.X and just use a full-system container with LXC and a RHEL 6.X rootfs.

Distributing the image (rootfs) and the runC binaries to all computing centres is now an easy task. Up to this point I don't need to involve administrators from all computing sites (no need for special users, no daemons, etc.). But I still cannot use it, because I don't have root permissions.

The preference would be to have everything centralised, so that you don't have to involve ~170 people to do the job, which would then take weeks to months to set up.

dqminh · 2015-10-14T16:53:24Z

To be able to run this as a non-privileged user, user namespaces are just one of the problems. I think we also need to look at some improvements to the way we handle cgroups right now, as that requires root permission.

AFAIK, unprivileged lxc used a privileged cgmanager daemon to handle its own cgroup assignment.
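To see the cgroup problem from an unprivileged account, one can check which cgroups the current process belongs to and whether any of the corresponding directories are writable. The Python sketch below parses the documented /proc/[pid]/cgroup format (hierarchy-ID:controller-list:path). The directory layout under /sys/fs/cgroup is an assumption about a typical cgroup v1 mount, and the function names are made up for this example.

```python
import os


def parse_proc_cgroup(text: str) -> list:
    """Parse /proc/<pid>/cgroup into (hierarchy_id, controllers, path) tuples."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        names = controllers.split(",") if controllers else []
        entries.append((int(hierarchy_id), names, path))
    return entries


def writable_cgroup_dirs(entries: list, mount_root: str = "/sys/fs/cgroup") -> list:
    """Return the cgroup directories the current user can write to.

    Assumes each v1 hierarchy is mounted at <mount_root>/<controller-list>,
    which is common but not guaranteed.
    """
    writable = []
    for _, controllers, path in entries:
        directory = os.path.join(mount_root, ",".join(controllers), path.lstrip("/"))
        if os.access(directory, os.W_OK):
            writable.append(directory)
    return writable
```

A typical use would be writable_cgroup_dirs(parse_proc_cgroup(open("/proc/self/cgroup").read())). An empty result for a regular user is consistent with the point made above: someone privileged has to create or delegate the cgroup first.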

davidlt · 2015-10-15T11:42:30Z

That's correct.
https://linuxcontainers.org/cgmanager/introduction/

What's CGManager? 
CGManager is a central privileged daemon that manages all your cgroups for you 
through a simple D-Bus API. It's designed to work with nested LXC containers 
as well as accepting unprivileged requests including resolving user namespaces UIDs/GIDs.

mr-c · 2016-02-15T17:16:53Z

Hello,

For scientific computing (where one is running relatively "normal" POSIXy applications) the Common Workflow Language is trying out a solution for rootless containers: https://github.com/common-workflow-language/common-workflow-language/wiki/Userspace-Container-Review#getting-userspace-containers-working-on-ancient-rhel

A bit of a hack, but no root, weird kernel, or setuid binary is needed. Obviously one should use a more mature approach, but for the many academic clusters running older kernels this should suffice until they can upgrade.

[idea by @mr-c, proof of concept by @kdmurray91]

The CWL anxiously awaits a mature and well adopted open containers standard so please steal this idea and run with it :-)

chrisgorgo · 2016-02-15T18:23:09Z

Couple more interesting projects trying to solve this problem:

http://gmkurtzer.github.io/singularity/ - capturing binary dependencies in portable manner
https://bitbucket.org/berkeleylab/shifter - running Docker images on HPCs in a secure manner

mr-c · 2016-02-15T19:31:50Z

Note that shifter uses (real) chroot and thus requires root.

mr-c · 2016-02-15T19:35:10Z

Though Shifter could be adapted to use proot/fakechroot. I quite like their
Python code for taking a docker hub container and producing a tarball or
unpacked tree.

chrisgorgo · 2016-02-15T19:41:16Z

I have not used shifter, but their documentation (see
https://www.nersc.gov/research-and-development/user-defined-images/)
suggests that any user can run shifterimg to convert a docker image to a
safe shifter image and subsequently run it without elevated privileges.

mr-c · 2016-02-15T19:55:39Z

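@chrisfilo Yeah, we thought the same thing, then dug further:
udiRoot/src/shifter.c: fprintf(stderr, "%s\n", "Not running with root privileges, will fail.");
https://bitbucket.org/berkeleylab/shifter/src/dae758dd5f57b55c1574fb6f295f38a6c481139e/udiRoot/src/shifter.c?at=master&fileviewer=file-view-default#shifter.c-184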

chrisgorgo · 2016-02-15T20:00:16Z

If it requires root what's the point of shifter then?

mr-c · 2016-02-15T20:20:45Z

from what I see (without running it): scheduler integration (slurm, others), ability to run same image simultaneously across a cluster, caching and management of images

mr-c · 2016-02-15T20:22:20Z

To return this to @discordianfish's original question: proot allows root-free running of "normal" containers (but possibly not some exotic containers). However, I wouldn't rely on it for security, but would use it for ease-of-use scenarios.

kdm9 · 2016-02-15T20:26:01Z

I'll make clear something that @mr-c has implied: We need an unprivileged user to be able to do all operations including installation, setup and image management solely within $HOME (or some other unrestricted path), without being root. In other words, this should all be possible without any admin intervention whatsoever.

jessfraz · 2016-04-17T23:12:40Z

I have started a thread on the mailing list here https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/yutVaSLcqWI with my proposed actions to make this a reality

cgwalters · 2016-04-18T00:37:25Z

Above I linked linux-user-chroot; this code has now migrated to https://github.com/projectatomic/bubblewrap

mr-c · 2016-04-18T09:23:41Z

FYI, bubblewrap is setuid & requires non-privileged user namespaces; which are great when you have them. RHEL6 does not.

cgwalters · 2016-04-18T13:42:17Z

bubblewrap does not require user namespaces - allowing container features to be safely exposed to userspace on kernels which don't have CONFIG_USERNS is a large part of the point.

It might be interesting to have runc support mapping JSON configuration to bubblewrap, but over time user namespaces will hopefully become secure enough that it'll be a legacy thing. In the meantime though, if anyone is targeting non-userns kernels, bubblewrap might be interesting.

mr-c · 2016-04-18T13:45:00Z

Hello @cgwalters ,

Here is my experience trying the bubblewrap demo:

mcrusoe@mrcdev:~/src/bubblewrap$ PATH=$PWD:$PATH ./demos/bubblewrap-shell.sh 
No permissions to creating new namespace, likely because the kernel does not allow non-privileged user namespaces. On e.g. debian this can be enabled with 'sysctl kernel.unprivileged_userns_clone=1'.
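For anyone hitting the same message, a quick way to see whether a sysctl is what blocks unprivileged user namespaces is to read the relevant files under /proc/sys. The Python sketch below checks kernel.unprivileged_userns_clone (the distribution-specific knob named in the error message above) and user.max_user_namespaces (present on Linux 4.9 and later). The function names and the wording of the returned hints are made up for this example, and a result of "not disabled" does not prove that namespace creation will succeed.

```python
from pathlib import Path
from typing import Optional


def read_int(path: Path) -> Optional[int]:
    """Return the integer stored in a sysctl file, or None if unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def unprivileged_userns_hint(proc_sys: str = "/proc/sys") -> str:
    """Summarise what the two sysctls say about unprivileged user namespaces."""
    base = Path(proc_sys)
    clone = read_int(base / "kernel" / "unprivileged_userns_clone")
    max_ns = read_int(base / "user" / "max_user_namespaces")
    if clone == 0:
        return "disabled by kernel.unprivileged_userns_clone=0"
    if max_ns == 0:
        return "disabled by user.max_user_namespaces=0"
    if clone is None and max_ns is None:
        return "unknown (neither sysctl is present)"
    return "not disabled by these sysctls"
```

The proc_sys parameter exists only so the check can be pointed at a test directory; on a real system, calling unprivileged_userns_hint() with no arguments is enough.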

rhatdan · 2016-04-18T13:52:37Z

@mr-c Did you make bubblewrap setuid?

alexlarsson · 2016-04-18T14:08:58Z

@mr-c You need to have either user namespaces, or have bwrap setuid/setcaps. There is no other way with the current kernel to use namespaces.
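Whether a given binary is installed setuid can be checked from its mode bits. The short Python sketch below does that with os.stat; the function names are made up for this example, and it does not detect file capabilities (the setcaps case), which are stored in an extended attribute instead.

```python
import os
import stat


def has_setuid_bit(path: str) -> bool:
    """Return True if the file at path has the setuid mode bit set."""
    return bool(os.stat(path).st_mode & stat.S_ISUID)


def is_setuid_root(path: str) -> bool:
    """Return True if the file is setuid and owned by root (UID 0)."""
    st = os.stat(path)
    return bool(st.st_mode & stat.S_ISUID) and st.st_uid == 0
```

For instance, is_setuid_root(shutil.which("bwrap")) would answer the question for whichever bwrap is first on PATH, assuming one is installed.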

alexlarsson · 2016-04-18T14:09:40Z

@mr-c What distro/kernel are you running on?

mr-c · 2016-04-18T14:11:11Z

@alexlarsson I understand, that is why I was advocating for proot style fallback support in #38 (comment). This particular cluster is running RHEL 6.6.

alexlarsson · 2016-04-18T14:33:26Z

@mr-c I do want to note that I believe bubblewrap shipped as setuid is safe. It's a very minimal C app with zero dependencies (only libc) that is written with security/setuid in mind.

alexlarsson · 2016-04-18T14:33:49Z

It's not like shipping a setuid runc, which would let you own the system.

mr-c · 2016-04-18T14:36:37Z

Hey @alexlarsson, I'm not at all saying it isn't safe, just that I'm looking for other approaches as setuid binaries aren't acceptable on basically all of the academic/research computing clusters I have run into.

alexlarsson · 2016-04-18T15:22:57Z

@mr-c Even if say bubblewrap was in rhel 6.x?

mr-c · 2016-04-18T15:42:03Z

@alexlarsson Of course, if runc/opencontainers support ships with the OS that they installed then there is no fight :-)

mr-c · 2016-04-20T13:33:45Z

Oh, I just learned that there is a thread on the mailing list intersecting this conversation: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/yutVaSLcqWI

mr-c · 2016-04-20T13:34:17Z

Correct URL is https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/yutVaSLcqWI

rhatdan · 2016-04-20T13:38:02Z

If you are running on RHEL6, how do you get User Namespace support?

mr-c · 2016-04-20T13:44:38Z

Hello @rhatdan , Is that question directed at me?

I'm not personally running RHEL6 on any of my systems, but a sub-thread was about finding a way to run containerized software on academic computing clusters, where RHEL 6 is very common. A proposed solution is in #38 (comment) which does not rely on capabilities, setuid binaries, or user namespace support.

Since that post there have been other proposals to use some combination of capabilities, setuid binaries, or user namespace support to enable running runc as a non/less privileged user. These won't be usable on academic computing clusters for a year or two.

I think it would be great to see both proposals developed and incorporated.

rhatdan · 2016-04-20T13:52:44Z

OK, I have not reviewed the list of proposals. But my bottom line would be to get to RHEL 7 if at all possible, to work with the latest container technologies.

cyphar · 2016-04-20T13:56:15Z

@mr-c IMO, it wouldn't make sense to incorporate proot into runc. If you can already have proot on clusters, I'm confused why you also want that to be a part of runc. You would get very few of the features of runc on kernels as old as the ones in RHEL 6, and it certainly wouldn't be OCI compliant. Since runc is a container runtime, I don't see why adding support for another runtime that isn't container-based makes sense.

Most notably, AFAICS proot doesn't have the same security properties as Linux containers (which are fairly secure, with some caveats). As you already need a rootfs for runc, why not just use proot directly?

Am I missing something?

mr-c · 2016-04-20T14:02:09Z

@cyphar In the scientific software domain we are primarily using containers to solve software portability concerns, not security.

We anticipate, and support, runc becoming the standard interface for container management.

It would be great if there was a built in fallback to support running an otherwise trusted program inside of a runc container on older systems such as RHEL6 where there is effectively zero container support on academic/research computing clusters.

wking · 2016-04-20T18:56:00Z

On Wed, Apr 20, 2016 at 07:02:13AM -0700, Michael R. Crusoe wrote:
“It would be great if there was a built in fallback to support
running an otherwise trusted program inside of a runc container on
older systems such as RHEL6 where there is effectively zero
container support on academic/research computing clusters.”

I think this may be conflating images and running containers. With
shared tooling like [1], publishers can push images and users can
unpack them into local bundles [2]. Some users will launch those
bundles using Linux namespaces / cgroups via runC. But others would
launch those same bundles using a proot wrapper that ignored
namespacing and just setup the mounts (or whatever).

Obviously, not all runtime-spec configs would work with a
proot-wrapper approach (e.g. if the image required a network namespace
or some such), but not all runtime-spec configs will work for
unprivileged users regardless of the runtime they're using. And folks
pushing images with maximum portability in mind can try and stick to
settings like root.path and process.args that are likely supported by
all runtimes (even if they aren't fully compliant).
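As a rough sketch of the proot-wrapper idea described in this comment, the Python function below reads only the portable parts of a runtime-spec config (root.path, process.args and process.cwd) and turns them into a proot command line. This is an illustration under assumptions, not an existing tool: the function name is made up, the -r (root filesystem) and -w (working directory) options are taken from the PRoot manual and should be checked against the installed version, and everything else in the config (namespaces, mounts, environment, cgroups) is deliberately ignored.

```python
import json
import os


def proot_argv(config: dict, bundle_dir: str) -> list:
    """Build a proot command line from a runtime-spec style config dict.

    Only root.path, process.args and process.cwd are honoured; a relative
    root.path is resolved against the bundle directory.
    """
    rootfs = os.path.join(bundle_dir, config["root"]["path"])
    process = config["process"]
    argv = ["proot", "-r", rootfs, "-w", process.get("cwd", "/")]
    argv.extend(process["args"])
    return argv


def load_bundle_config(bundle_dir: str) -> dict:
    """Load config.json from the top level of a bundle directory."""
    with open(os.path.join(bundle_dir, "config.json")) as f:
        return json.load(f)
```

A caller could pass the result to subprocess.run(proot_argv(load_bundle_config(bundle), bundle)); a config that depends on, say, a network namespace would simply not behave as intended under such a wrapper, as noted above.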

davidlt · 2016-04-20T19:50:54Z

Long, but this is the picture from my point of view.

I am successfully using PRoot for some activities on RHEL/CentOS 6. I am even using it with QEMU for emulating POWER8 with Fedora rootfs and ARMv8 with CentOS rootfs. It does work.

It is true that RHEL 6 is currently the dominant Linux distribution, and hopefully the first roadmaps for migration to RHEL 7 will be announced this year. In my case we are building <400 RPMs (relocatable), which end up at <10GB for a full release. I build everything from glibc, gcc, binutils, llvm, gdb, python, etc., and it has to run at a high number of computing centres. The only common thing is that they have RHEL6/CentOS6/Scientific Linux 6 (binary compatible) installed as the OS (required). Installation of our software is centrally controlled via a distributed file system which is mounted at each site (this solved some of the problems). So we can make software centrally available at computing centres, but none of that ever depends on root permissions (requirement).

Yes, at some point an agreement could be made that some solution is required for Linux containers and that it has to be provided by all computing centres. This is not a quick procedure.

I don't think we need (yet) a strong security guarantee. What we need is the ability to control the software stack except for the kernel. E.g., we don't want to have different physics results because half of the computing centres decided to do a yum upgrade/update and their glibc (libm) was updated. Thus it is a way to increase reproducibility. We started shipping our own glibc once we hit a number of issues with TLS that were blocking our production jobs, but the fixes were backported only in CentOS 7.2. Thus we had to patch our glibc for a long period. This also unbinds us from the operating system migration schedule of the computing centres. We would decide which rootfs we run on.

I would love to have the ability to run a job within a container, but with hard limits on resources (CPUs and memory). If the job was scheduled on an 8-core slot with 16GB of RAM, it should not go outside these boundaries. Currently this is partly done via job scheduler monitoring and a virtual memory limit (wrong). There are no strict boundaries as far as I know. These things can be done differently depending on the computing centre; there is no one way of doing it, I guess.

In addition to that, per-job statistics (networking, CPU, memory, IO, etc.) would be interesting, even if the job is running multiple processes and does not have a native statistics API or similar. This also means there would be one common way of acquiring statistics on jobs.

Of course, I would prefer to have an industry standard which works in these environments (or at least there are plans), but not to have yet-another-solution-for-Linux-container-like-environment.

discordianfish · 2016-04-21T17:51:13Z

I've heard @jfrazelle wanted to look into this? :)

cyphar · 2016-04-22T15:52:44Z

@davidlt It isn't currently possible to set cgroup limits in an unprivileged user namespace (that is, if you start as a regular user). So you can't really set the hard limits in that way, which limits you to rlimits that aren't nearly as useful. The same holds for proot-style chroots. Hopefully we will be able to set cgroup limits in an unprivileged user namespace from the kernel side soon (maybe cgroup namespaces will help in that regard, or cgroupv2). You can still get statistics though.

For me, the important question is whether we can use proot and implement enough of the OCI spec to make it compliant (even ignoring things like cgroups, which we can reasonably say we don't support). And of course, running it as root would not be supported for security reasons (since it appears to work through a bunch of seccomp and ptrace black magic).
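To show what the rlimits fallback mentioned in this comment looks like in practice, here is a small Python sketch that starts a child process with a soft RLIMIT_AS (address space) limit. The function names are made up for this example. Unlike a memory cgroup, the limit applies to each process separately and counts virtual address space instead of resident memory, which is part of why it is "not nearly as useful" for bounding a whole job.

```python
import resource
import subprocess
import sys


def clamp_soft_limit(requested: int, hard: int) -> int:
    """A soft limit may not exceed the hard limit unless the latter is infinite."""
    if hard == resource.RLIM_INFINITY:
        return requested
    return min(requested, hard)


def run_with_address_space_limit(argv: list, limit_bytes: int) -> subprocess.CompletedProcess:
    """Run argv with a soft RLIMIT_AS of at most limit_bytes.

    preexec_fn runs in the child between fork and exec, so the parent's
    own limits are left untouched. It is not safe in multi-threaded parents.
    """
    def apply_limit() -> None:
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = clamp_soft_limit(limit_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    return subprocess.run(argv, preexec_fn=apply_limit, capture_output=True, text=True)
```

For example, run_with_address_space_limit([sys.executable, "job.py"], 16 * 1024**3) would cap each process of a hypothetical job.py at 16 GiB of address space; several such processes together can still exceed the slot's memory.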

cyphar · 2016-04-23T11:09:22Z

Okay, this works on my fork of runC. There are some outstanding things to do, mostly related to giving more meaningful errors to users when their config won't work with a rootless container setup. You can see the code here: https://github.com/cyphar/runc/tree/rootless-containers

crosbymichael · 2016-06-02T00:48:04Z

Closing this one so we can use #774 as the main tracking issue for this feature. It has a checklist and everything.
