Support running runc as non/less privileged user #38

Closed
discordianfish opened this issue Jun 26, 2015 · 54 comments

Comments

@discordianfish
Contributor

Right now runc has to be run as root, although technically it should be possible to run containers as an unprivileged user (at least if user namespaces are used).

@cgwalters
Contributor

See also https://git.gnome.org/browse/linux-user-chroot/tree/README
which supports use by unprivileged users even without user namespaces.

See discussion about use of PR_SET_NO_NEW_PRIVS (which is enforced by seccomp) and additional note about local DoS attacks.

@LK4D4
Contributor

LK4D4 commented Jun 26, 2015

We just need to reconsider some default mounts for this to work, I think; maybe drop the /proc reads in init.

@chrisgorgo

This would allow containers to be used in shared computing environments such as HPC clusters. Very exciting!

@zeneofa

zeneofa commented Oct 1, 2015

I would be very interested to find out how/when this is implemented, especially as it may help me create a transferable environment to use in HPC environments, as there I have no sudo and no chance to install docker.

@discordianfish
Contributor Author

So science is interested. Now we need enterprise so somebody will actually start working on this ;)

@davidlt

davidlt commented Oct 7, 2015

+1 from the HEP (High Energy Physics) community. You can have your hundreds of thousands of cores, even with a common operating system like RHEL/CentOS/Scientific Linux, but you still end up with Android-like fragmentation because all computing centres do updates on their own schedules. When you send your job to various computing centres you also want to provide your container as the environment, preferably one that runs as an unprivileged container. The container protects you from the fragmentation, and you don't get magic differences due to, for example, an update of libm.

The HTCondor batch system already has some support for Docker: https://research.cs.wisc.edu/htcondor/HTCondorWeek2015/presentations/ThainG_Docker.pdf

@wking
Contributor

wking commented Oct 7, 2015

On Wed, Oct 07, 2015 at 12:27:24PM -0700, davidlt wrote:

> HTCondor batch system already has some support for Docker:
> https://research.cs.wisc.edu/htcondor/HTCondorWeek2015/presentations/ThainG_Docker.pdf

Page 29 of those slides shows the host's sysadmin starting a Docker
service.

More generally, I'm not sure how this is going to work for
unprivileged users. namespaces(7) [1] has:

> Creation of new namespaces using clone(2) and unshare(2) in most
> cases requires the CAP_SYS_ADMIN capability. User namespaces are
> the exception: since Linux 3.8, no privilege is required to create a
> user namespace.

So an unprivileged user should be able to create a user namespace,
and have some flexibility inside it. However, you don't have complete
flexibility (to avoid things like [2]). I'm not sure if you'd have
enough flexibility to run a useful bundle, but I guess we'll see ;).
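
As a rough illustration of that limited flexibility, the sketch below builds the single-line ID mappings an unprivileged process is allowed to write for itself: ID 0 inside the namespace mapped to the caller's own UID/GID, with length 1. The helper names are hypothetical. The `inside outside length` line format, and the need to write `deny` to `/proc/<pid>/setgroups` before `gid_map` (Linux 3.19 and later), are taken from user_namespaces(7).

```python
def single_id_map(inside_id, outside_id):
    """Return one uid_map/gid_map line: '<inside> <outside> <length>'."""
    return f"{inside_id} {outside_id} 1\n"


def rootless_id_files(uid, gid):
    """Contents an unprivileged process would write under /proc/<pid>/.

    Hypothetical helper: it only builds the strings. It does not create
    a namespace or write any file.
    """
    return {
        "uid_map": single_id_map(0, uid),
        "setgroups": "deny",  # must be written before gid_map
        "gid_map": single_id_map(0, gid),
    }
```

Calling `rootless_id_files(os.getuid(), os.getgid())` gives the three strings in the order they would have to be written. Mapping more than one ID needs a privileged helper, which is the kind of restriction referred to above.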

Doing something like making runc setuid-root would be a bad idea,
because the caller could use pre-start hooks (for example) to perform
any action they wished with the elevated permissions.

@davidlt

davidlt commented Oct 9, 2015

We disallow software to be setuid-root or installed as root. I love the way runc is now: it seems to be a single capable binary, with no need for special accounts and no need for a daemon. The only thing that's missing is the ability to use it without a root account.

@wking
Contributor

wking commented Oct 9, 2015

On Fri, Oct 09, 2015 at 12:23:32AM -0700, davidlt wrote:

> We disallow software to be setuid-root or installed as root…

Most (and hopefully all ;) setuid-root programs are that way because
they need those elevated permissions to accomplish their task. The
question is whether runC can launch all OCI-compliant bundles, or a
useful subset of those, or nothing useful at all without needing those
elevated permissions.

@davidlt

davidlt commented Oct 11, 2015

Then maybe the question is: what do we lose if we take away root permissions from runC on RHEL6, RHEL7 and mainline kernels?

IIRC LXC supports unprivileged containers on 3.12 and later kernels. Docker should have support for user namespaces in 1.9, according to a PR I managed to find.

We have ~170 computing centres connected, and that's how you achieve a high number of cores to process big data. Currently they are running RHEL 6.X/CentOS 6.X/Scientific Linux 6.X. They will be moved to 7.X soonish, I believe. There are a few cases where people migrated to 7.X and just use a full-system container with LXC and a RHEL 6.X rootfs.

Distributing the image (rootfs) and the runC binaries to all computing centres is an easy task. Up to this point I didn't need to involve administrators from all computing sites (no need for special users, no daemons, etc.). But I still cannot use it, because we don't have root permissions.

The preference would be to have everything centralised, so that we don't have to involve ~170 people to get the job done, which would then take weeks to months to set up.

@dqminh
Contributor

dqminh commented Oct 14, 2015

To be able to run this as a non-privileged user, user namespaces are just one of the problems. I think we also need to look at some improvements to the way we are handling cgroups right now, as that requires root permission.

AFAIK, unprivileged lxc used a privileged cgmanager daemon to handle its own cgroup assignment.
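
To see why cgroups are a separate problem, one can check whether the current user may even create a child cgroup. The sketch below parses `/proc/self/cgroup` (documented in cgroups(7) as `hierarchy-ID:controller-list:cgroup-path`) and tests write access. The mount root `/sys/fs/cgroup` and the v1 per-controller directory layout are assumptions about a typical host, and the function names are made up for illustration.

```python
import os


def parse_proc_cgroup(text):
    """Parse /proc/<pid>/cgroup into (hierarchy_id, controllers, path) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        hier, controllers, path = line.split(":", 2)
        names = [c for c in controllers.split(",") if c]
        entries.append((int(hier), names, path))
    return entries


def writable_cgroups(text, root="/sys/fs/cgroup"):
    """Return the cgroup directories the current user could create children in."""
    found = []
    for _hier, names, path in parse_proc_cgroup(text):
        # A cgroup v2 entry has an empty controller list and is assumed to be
        # mounted at the root; v1 hierarchies are assumed to be mounted under
        # <root>/<comma-joined controllers>.
        base = os.path.join(root, ",".join(names)) if names else root
        directory = os.path.join(base, path.lstrip("/"))
        if os.access(directory, os.W_OK):
            found.append(directory)
    return found
```

Feeding it the contents of `/proc/self/cgroup` as a regular user will usually return an empty list unless an administrator (or a privileged daemon such as the one mentioned above) has delegated a subtree.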

@davidlt

davidlt commented Oct 15, 2015

That's correct.
https://linuxcontainers.org/cgmanager/introduction/

What's CGManager? 
CGManager is a central privileged daemon that manages all your cgroups for you 
through a simple D-Bus API. It's designed to work with nested LXC containers 
as well as accepting unprivileged requests including resolving user namespaces UIDs/GIDs.

@mr-c

mr-c commented Feb 15, 2016

Hello,

For scientific computing (where one is running relatively "normal" POSIXy applications) the Common Workflow Language is trying out a solution for rootless containers: https://github.com/common-workflow-language/common-workflow-language/wiki/Userspace-Container-Review#getting-userspace-containers-working-on-ancient-rhel

A bit of a hack, but no root, weird kernel, or setuid binary is needed. Obviously one should use a more mature approach, but for the many academic clusters running older kernels this should suffice until they can upgrade.

[idea by @mr-c, proof of concept by @kdmurray91]

The CWL anxiously awaits a mature and well adopted open containers standard so please steal this idea and run with it :-)

@chrisgorgo

Couple more interesting projects trying to solve this problem:

@mr-c

mr-c commented Feb 15, 2016

Note that shifter uses (real) chroot and thus requires root.

@mr-c

mr-c commented Feb 15, 2016

Though Shifter could be adapted to use proot/fakechroot. I quite like their
Python code for taking a docker hub container and producing a tarball or
unpacked tree.

@chrisgorgo

I have not used shifter, but their documentation (see
https://www.nersc.gov/research-and-development/user-defined-images/)
suggests that any user can run shifterimg to convert docker image to safe
shifter image and subsequently run it without elevated privileges.


@mr-c

mr-c commented Feb 15, 2016

@chrisfilo Yeah, we thought the same thing, then dug further
udiRoot/src/shifter.c: fprintf(stderr, "%s\n", "Not running with root privileges, will fail.");

@chrisgorgo

If it requires root what's the point of shifter then?

On Mon, Feb 15, 2016 at 11:55 AM, Michael R. Crusoe wrote:

> @chrisfilo Yeah, we thought the same thing, then dug further
> udiRoot/src/shifter.c: fprintf(stderr, "%s\n", "Not running with root privileges, will fail.");
> https://bitbucket.org/berkeleylab/shifter/src/dae758dd5f57b55c1574fb6f295f38a6c481139e/udiRoot/src/shifter.c?at=master&fileviewer=file-view-default#shifter.c-184

@mr-c

mr-c commented Feb 15, 2016

from what I see (without running it): scheduler integration (slurm, others), ability to run same image simultaneously across a cluster, caching and management of images

@mr-c

mr-c commented Feb 15, 2016

To return this to @discordianfish's original question: proot allows root-free running of "normal" containers (but possibly not some exotic containers). However, I wouldn't rely on it for security; I would use it for ease-of-use scenarios.

@kdm9

kdm9 commented Feb 15, 2016

I'll make clear something that @mr-c has implied: We need an unprivileged user to be able to do all operations including installation, setup and image management solely within $HOME (or some other unrestricted path), without being root. In other words, this should all be possible without any admin intervention whatsoever.

@crosbymichael crosbymichael modified the milestone: 0.1.0 Feb 18, 2016
@jessfraz
Contributor

I have started a thread on the mailing list here https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/yutVaSLcqWI with my proposed actions to make this a reality

@cgwalters
Contributor

Above I linked linux-user-chroot; this code has now migrated to https://github.com/projectatomic/bubblewrap

@mr-c

mr-c commented Apr 18, 2016

FYI, bubblewrap is setuid & requires non-privileged user namespaces; which are great when you have them. RHEL6 does not.

@cgwalters
Contributor

bubblewrap does not require user namespaces - allowing container features to be safely exposed to userspace on kernels which don't have CONFIG_USERNS is a large part of the point.

It might be interesting to have runc support mapping JSON configuration to bubblewrap, but in the end over time user namespaces will hopefully be secure enough it'll be a legacy thing. In the meantime though, if anyone is targeting non-userns kernels, bubblewrap might be interesting.

@mr-c

mr-c commented Apr 18, 2016

Hello @cgwalters ,

Here is my experience trying the bubblewrap demo:

mcrusoe@mrcdev:~/src/bubblewrap$ PATH=$PWD:$PATH ./demos/bubblewrap-shell.sh 
No permissions to creating new namespace, likely because the kernel does not allow non-privileged user namespaces. On e.g. debian this can be enabled with 'sysctl kernel.unprivileged_userns_clone=1'.
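
The error above points at a distribution-specific switch. A hedged sketch of how one might probe for it before trying to create a namespace: it reads `kernel.unprivileged_userns_clone` (the Debian-style sysctl named in the message) and `user.max_user_namespaces` (present in mainline kernels since 4.9). Treating a missing file as "no restriction known" is an assumption, and the function names are hypothetical.

```python
def read_sysctl(name, proc_sys="/proc/sys"):
    """Return the sysctl's value as a string, or None if it does not exist."""
    path = f"{proc_sys}/{name.replace('.', '/')}"
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def userns_probably_allowed(read=read_sysctl):
    """Best-effort guess at whether unprivileged user namespaces are enabled."""
    clone_switch = read("kernel.unprivileged_userns_clone")
    if clone_switch == "0":
        return False  # Debian-style switch explicitly turned off
    max_namespaces = read("user.max_user_namespaces")
    if max_namespaces is not None and int(max_namespaces) == 0:
        return False  # creation disabled by setting the limit to zero
    return True
```

On kernels that predate user namespaces entirely (for example the RHEL 6 kernel discussed in this thread) neither file exists, so this probe alone would give a false positive. The only conclusive test is to attempt unshare(2) and check the result.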

@rhatdan
Contributor

rhatdan commented Apr 18, 2016

@mr-c Did you make bubblewrap setuid?

@alexlarsson
Contributor

@mr-c You need to either have user namespaces, or have bwrap installed setuid/setcaps. There is no other way with the current kernel to use namespaces.

@alexlarsson
Contributor

@mr-c What distro/kernel are you running on?

@mr-c

mr-c commented Apr 18, 2016

@alexlarsson I understand, that is why I was advocating for proot style fallback support in #38 (comment). This particular cluster is running RHEL 6.6.

@alexlarsson
Contributor

@mr-c I do want to note that I believe bubblewrap shipped as setuid is safe. It's a very minimal C app with zero dependencies (only libc) that is written with security/setuid in mind.

@alexlarsson
Contributor

It's not like shipping a setuid runc, which lets you own the system.

@mr-c

mr-c commented Apr 18, 2016

Hey @alexlarsson, I'm not at all saying it isn't safe, just that I'm looking for other approaches as setuid binaries aren't acceptable on basically all of the academic/research computing clusters I have run into.

@alexlarsson
Contributor

@mr-c Even if say bubblewrap was in rhel 6.x?

@mr-c

mr-c commented Apr 18, 2016

@alexlarsson Of course, if runc/opencontainers support ships with the OS that they installed then there is no fight :-)

@mr-c

mr-c commented Apr 20, 2016

Oh, I just learned that there is a thread on the mailing list intersecting this conversation: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/yutVaSLcqWI

@mr-c

mr-c commented Apr 20, 2016

@rhatdan
Contributor

rhatdan commented Apr 20, 2016

If you are running on RHEL6, how do you get User Namespace support?

@mr-c

mr-c commented Apr 20, 2016

Hello @rhatdan , Is that question directed at me?

I'm not personally running RHEL6 on any of my systems, but a sub-thread was about finding a way to run containerized software on academic computing clusters, where RHEL 6 is very common. A proposed solution is in #38 (comment) which does not rely on capabilities, setuid binaries, or user namespace support.

Since that post there have been other proposals to use some combination of capabilities, setuid binaries, or user namespace support to enable running runc as a non/less privileged user. These won't be usable on academic computing clusters for a year or two.

I think it would be great to see both proposals developed and incorporated.

@rhatdan
Contributor

rhatdan commented Apr 20, 2016

Ok I have not reviewed the list of proposals. But my bottom line would be to get to rhel7 version if at all possible to work with the latest container technologies.

@cyphar
Member

cyphar commented Apr 20, 2016

@mr-c IMO, it wouldn't make sense to incorporate proot into runc. If you can already have proot on clusters, I'm confused why you also want that to be a part of runc. You would get very few of the features of runc on kernels as old as the ones in RHEL 6, and it certainly wouldn't be OCI compliant. Since runc is a container runtime, I don't see why adding support for another runtime that isn't container-based makes sense.

Most notably, AFAICS proot doesn't have the same security properties as Linux containers (which are fairly secure, with some caveats). As you already need a rootfs for runc, why not just use proot directly?

Am I missing something?

@mr-c

mr-c commented Apr 20, 2016

@cyphar In the scientific software domain we are primarily using containers to solve software portability concerns, not security.

We anticipate, and support, runc becoming the standard interface for container management.

It would be great if there was a built in fallback to support running an otherwise trusted program inside of a runc container on older systems such as RHEL6 where there is effectively zero container support on academic/research computing clusters.

@wking
Contributor

wking commented Apr 20, 2016

On Wed, Apr 20, 2016 at 07:02:13AM -0700, Michael R. Crusoe wrote:

> It would be great if there was a built in fallback to support
> running an otherwise trusted program inside of a runc container on
> older systems such as RHEL6 where there is effectively zero
> container support on academic/research computing clusters.

I think this may be conflating images and running containers. With
shared tooling like [1], publishers can push images and users can
unpack them into local bundles [2]. Some users will launch those
bundles using Linux namespaces / cgroups via runC. But others would
launch those same bundles using a proot wrapper that ignored
namespacing and just set up the mounts (or whatever).

Obviously, not all runtime-spec configs would work with a
proot-wrapper approach (e.g. if the image required a network namespace
or some such), but not all runtime-spec configs will work for
unprivileged users regardless of the runtime they're using. And folks
pushing images with maximum portability in mind can try and stick to
settings like root.path and process.args that are likely supported by
all runtimes (even if they aren't fully compliant).
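
A sketch of the wrapper idea, assuming a bundle whose `config.json` relies only on `root.path`, `process.args` and `process.cwd` (field names from the OCI runtime spec). It ignores namespaces, cgroups and mounts entirely, so it is not a compliant runtime. The `-r` (guest rootfs) and `-w` (initial working directory) options follow PRoot's documented command line; the function name is hypothetical.

```python
import json
import os


def proot_argv(bundle_dir):
    """Build a proot command line from <bundle_dir>/config.json."""
    with open(os.path.join(bundle_dir, "config.json")) as handle:
        config = json.load(handle)

    # root.path is taken relative to the bundle directory unless it is absolute.
    rootfs = os.path.join(bundle_dir, config["root"]["path"])
    process = config["process"]
    cwd = process.get("cwd", "/")

    # Only the argument vector is built here; nothing is executed.
    return ["proot", "-r", rootfs, "-w", cwd, *process["args"]]
```

The result could be handed to `os.execvp(argv[0], argv)`. Any setting the wrapper does not understand is silently dropped in this sketch, whereas a real tool would presumably want to reject such bundles with a clear error.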

@davidlt

davidlt commented Apr 20, 2016

This is long, but it is the picture from my point of view.

I am successfully using PRoot for some activities on RHEL/CentOS 6. I am even using it with QEMU for emulating POWER8 with Fedora rootfs and ARMv8 with CentOS rootfs. It does work.

It is true that RHEL 6 is currently the dominant Linux distribution, and hopefully the first roadmaps for migration to RHEL 7 will be announced this year. In my case we are building <400 RPMs (relocatable), which ends up at <10GB for a full release. I built everything from glibc, gcc, binutils, llvm, gdb, python, etc., and it has to run at a large number of computing centres. The only common thing is that they have RHEL6/CentOS6/Scientific Linux 6 (binary compatible) installed as the OS (required). Installation of our software is centrally controlled via a distributed file system which is mounted at each site (this solved some of the problems). So we can make software centrally available at computing centres, but none of that ever depends on root permissions (a requirement).

Yes, at some point agreement could be made that some solution is required for Linux containers and it has to be provided by all computing centres. This is not a quick procedure.

I don't think we need (yet) a strong security guarantee. What we need is the ability to control the software stack except the kernel. E.g., we don't want to have different physics results because half of the computing centres decided to do yum upgrade/update and their glibc (libm) was updated. Thus it is a way to increase reproducibility. We started shipping our own glibc once we hit a number of issues with TLS that were blocking our production jobs, but the fixes were backported only in CentOS 7.2. Thus we had to patch our glibc for a long period. This also decouples us from the operating system migration schedule in computing centres. We would decide which rootfs we run on.

I would love to have the ability to run a job within a container but add hard limits on resources (CPUs and memory). If the job was scheduled on an 8-core slot with 16GB of RAM, it should not go outside these boundaries. Currently this is partly done via job scheduler monitoring and a virtual memory limit (wrong). There are no strict boundaries as far as I know. These things can be done differently depending on the computing centre; there is no one way of doing it, I guess.

In addition to that, statistics (networking, CPU, memory, IO, etc.) per job would be interesting, even if the job is running multiple processes and does not have a native statistics API or similar. This also means there would be a single command for acquiring statistics on jobs.

Of course, I would prefer to have an industry standard which works in these environments (or at least there are plans), but not to have yet-another-solution-for-Linux-container-like-environment.

@discordianfish
Contributor Author

I've heard @jfrazelle wanted to look into this? :)

@cyphar
Member

cyphar commented Apr 22, 2016

@davidlt It isn't currently possible to set cgroup limits in an unprivileged user namespace (that is, if you start as a regular user). So you can't really set the hard limits in that way, which limits you to rlimits that aren't nearly as useful. The same holds for proot-style chroots. Hopefully we will be able to set cgroup limits in an unprivileged user namespace from the kernel side soon (maybe cgroup namespaces will help in that regard, or cgroupv2). You can still get statistics though.
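
The rlimit fallback mentioned above might look like the following sketch, which needs no privileges. It caps address space and CPU time for one child process. Unlike a cgroup, these limits apply per process rather than to the job as a whole, which is part of why they are less useful. The helper names and the example slot size are illustrative only.

```python
import resource
import subprocess


def slot_rlimits(memory_gib, cpu_seconds):
    """Map a scheduler slot to (soft, hard) rlimit pairs."""
    memory_bytes = memory_gib * 1024 ** 3
    return {
        resource.RLIMIT_AS: (memory_bytes, memory_bytes),
        resource.RLIMIT_CPU: (cpu_seconds, cpu_seconds),
    }


def run_limited(argv, limits):
    """Run argv with the given rlimits applied in the child before exec."""
    def apply_limits():
        for which, pair in limits.items():
            resource.setrlimit(which, pair)

    return subprocess.run(argv, preexec_fn=apply_limits)
```

For a 16 GiB slot with one hour of CPU time, this would be called as `run_limited(["./job.sh"], slot_rlimits(16, 3600))`, where `./job.sh` stands in for the real payload. `RLIMIT_AS` limits virtual address space, not resident memory, so it shares the weakness of the virtual memory limit criticised earlier in the thread.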

For me, the important question is whether we can use proot and implement enough of the OCI spec to make it compliant (even ignoring things like cgroups, which we can reasonably say we don't support). And of course, the mode of running it as root would not be supported for security reasons (since it appears to work through a bunch of seccomp and ptrace black magic).

@cyphar
Member

cyphar commented Apr 23, 2016

Okay, this works on my fork of runC. There are some outstanding things to do, mostly related to giving more meaningful errors to users when their config won't work with a rootless container setup. You can see the code here: https://github.com/cyphar/runc/tree/rootless-containers

ohmygoditactuallyworked

@crosbymichael
Member

Closing this one so we can use #774 as the main tracking issue for this feature. It has a checklist and everything.

stefanberger pushed a commit to stefanberger/runc that referenced this issue Sep 8, 2017
Closes opencontainers#38

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
haircommander pushed a commit to haircommander/runc that referenced this issue Apr 15, 2021