Docker containers can conflict with users on host system in Linux 3.15 #6345

Closed
AkeemMcLennon opened this issue Jun 11, 2014 · 40 comments · Fixed by #7179

@AkeemMcLennon

Creating a new user in a Docker container with the adduser command fails if the user already exists on the host system and the command is run with the --gecos flag to supply finger information. This command is commonly run by package managers to create non-privileged users for daemons (e.g. mysql, postgresql).

Expected result:
A new user is created in the Docker container regardless of whether it already exists on the host system.

Actual Result:
Creating a new user fails with the error

chfn: PAM: System error
adduser: `/usr/bin/chfn -f PostgreSQL administrator postgres' returned error code 1. Exiting.

Steps to Reproduce:

  1. Install the Linux 3.15 kernel on the host machine
  2. Run the following command, replacing "postgres" with any user that exists on the host machine
docker run -i -t ubuntu adduser --system --quiet --home /var/lib/postgresql --no-create-home \
            --shell /bin/bash --group --gecos "PostgreSQL administrator" postgres
@tiagoantao

As I started this discussion on the mailing list, I would like to add a few points. I thought this was kernel-independent, but @BlueLaguna seems to be correct: this is probably 3.15-related. I thought I saw this on 3.13 only because we were in the middle of changing kernels. I now cannot replicate it on 3.13.

In my case the behaviour depended on whether mysql-server was installed on the HOST machine. MySQL server not installed: no PAM problems. Installed: did not work.

It seemed to be a PAM issue (with chfn complaining).

@LK4D4
Contributor

LK4D4 commented Jun 18, 2014

Actually, I can't add any user at all, not only ones that exist on the host machine. I'll retry with 3.14 soon.

@tianon
Member

tianon commented Jun 18, 2014

This is definitely an odd one. Does it do the same thing when you use useradd instead of the adduser wrapper?

@LK4D4
Contributor

LK4D4 commented Jun 18, 2014

@tianon No, useradd works fine

@AkeemMcLennon
Author

I've been looking through the kernel changelogs, and this commit might be relevant:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=543bc6a1a987672b79d6ebe8e2ab10471d8f1047

@LK4D4
Contributor

LK4D4 commented Jun 19, 2014

Strange, it works for me on my work machine with 3.15.0-gentoo-r1.
Ah, now it doesn't. I have no idea what changed.

@LK4D4
Contributor

LK4D4 commented Jun 19, 2014

Same on 3.15.1

@LK4D4
Contributor

LK4D4 commented Jun 19, 2014

Confirmed: everything works fine with 3.14.8.

@unclejack
Contributor

I've tried to revert this patch https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=543bc6a1a987672b79d6ebe8e2ab10471d8f1047 and it didn't fix the problem.

If you want to try reverting the patch, you'll have to do it by hand: the code has changed since the patch was merged, so it won't revert cleanly.

@LK4D4
Contributor

LK4D4 commented Jun 20, 2014

There are three audit-related commits here, as far as I can see.
It may also make sense to strace chfn, as @vmarmol suggested, to see which syscall produces the error. I'll look at it at the weekend if I have time.

@LK4D4
Contributor

LK4D4 commented Jun 20, 2014

Also, to be sure, which graph driver are you guys using? Mine is btrfs.

@frank-dspeed

Hello, until this is fixed, simply use a bash alias to reroute the adduser command, like this:
-RUN DEBIAN_FRONTEND=noninteractive apt-get -y install mysql-server pwgen

+RUN alias adduser='useradd' && DEBIAN_FRONTEND=noninteractive apt-get -y install mysql-server pwgen

@jpetazzo
Contributor

Two possible clues:

  • audit, as indicated above;
  • apparmor (or another security module).

@LK4D4
Contributor

LK4D4 commented Jun 26, 2014

@jpetazzo I don't use AppArmor or anything like it.
Someone should try reverting all the audit patches :)

@SvenDowideit
Contributor

None of these breakages happen on boot2docker 1.1.0, which runs Linux 3.15.3, in case this (and its config) gives you ideas.

@LK4D4
Contributor

LK4D4 commented Jul 8, 2014

I'll try 3.15.3 in gentoo soon

@LK4D4
Contributor

LK4D4 commented Jul 8, 2014

Still not working with 3.15.3 with my config :(

@undying

undying commented Jul 9, 2014

3.15.4: same problem.
With 3.14.11 everything works fine.

@fauria

fauria commented Jul 17, 2014

I use this to work around the issue while it gets fixed:

RUN ln -s -f /bin/true /usr/bin/chfn

Looks like anything related to modifying users (chpasswd, passwd, chfn, useradd) raises an error.

@Hoverbear

If you're not building a container, you can use something like this on the host machine:

sudo nsenter --target $(docker inspect --format {{.State.Pid}} $YOUR_CONTAINER_ID) --mount --uts --ipc --net --pid /bin/bash

Then you can use su and useradd and other things. But this won't work in a Dockerfile.
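For context on why the nsenter route behaves differently, here is a rough C sketch of what a command like the one above does, assuming only setns(2), fork(2) and exec are involved: open the target's /proc/<pid>/ns/* files, join each namespace, then fork so that the PID namespace takes effect for the child shell. Joining these (non-user) namespaces does not change the caller's capabilities, so the shell keeps the host root's full capability set rather than the container's reduced one; that is presumably why su and useradd work there. The names ns_path and enter_and_run are hypothetical, and this is not util-linux's implementation.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Namespaces to join, matching --ipc --uts --net --pid --mount.
 * The mount namespace is joined last. */
static const char *const ns_names[] = { "ipc", "uts", "net", "pid", "mnt" };
#define NS_COUNT (sizeof ns_names / sizeof ns_names[0])

/* Format "/proc/<pid>/ns/<name>" into buf; returns 0, or -1 if it does not fit. */
int ns_path(char *buf, size_t len, pid_t pid, const char *name) {
    int n = snprintf(buf, len, "/proc/%d/ns/%s", (int)pid, name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Join the target's namespaces, then fork and exec argv (requires root).
 * Returns the child's exit status, or -1 on error. */
int enter_and_run(pid_t target, char *const argv[]) {
    int fds[NS_COUNT];
    char path[64];

    /* Open every namespace file first, before joining the mount namespace
     * changes what /proc refers to. */
    for (size_t i = 0; i < NS_COUNT; i++) {
        if (ns_path(path, sizeof path, target, ns_names[i]) != 0 ||
            (fds[i] = open(path, O_RDONLY | O_CLOEXEC)) < 0) {
            perror(path);
            while (i-- > 0)
                close(fds[i]);
            return -1;
        }
    }

    /* Switch namespaces one by one, closing each fd once it is used. */
    for (size_t i = 0; i < NS_COUNT; i++) {
        if (setns(fds[i], 0) != 0) {
            perror("setns");
            for (size_t j = i; j < NS_COUNT; j++)
                close(fds[j]);
            return -1;
        }
        close(fds[i]);
    }

    /* Joining a PID namespace only affects children, so fork before exec. */
    pid_t child = fork();
    if (child < 0) {
        perror("fork");
        return -1;
    }
    if (child == 0) {
        execv(argv[0], argv);
        perror("execv");
        _exit(127);
    }
    int status;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, enter_and_run(pid, (char *[]){ "/bin/bash", NULL }) with the PID printed by docker inspect would roughly mirror the nsenter command above. Like nsenter, it has to run as root on the host, so it is no help inside a Dockerfile.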

@larsks
Contributor

larsks commented Jul 22, 2014

The problem I reported in #7123, which may be the same as the problem under discussion here, appears to have been caused by kernel commit 33faba7fa7f2288d2f8aaea95958b2c97bf9ebfb (https://github.com/torvalds/linux/commits/33faba7fa7f2288d2f8aaea95958b2c97bf9ebfb).

Starting from v3.15, running 'git revert 33faba7' and resolving the resulting conflicts produces a kernel that works correctly. I haven't yet taken a close look at what the code does or why it fails.

@larsks
Contributor

larsks commented Jul 22, 2014

Specifically, this check is failing in kernel/audit.c:

case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:
    if (!netlink_capable(skb, CAP_AUDIT_WRITE))
        err = -EPERM;
    break;

@AkeemMcLennon
Author

@larsks Oh wow, great work! I don't think any of us would've been able to come up with any of this on our own :-) 👍

@larsks
Contributor

larsks commented Jul 22, 2014

I've spoken with some of the audit subsystem folks, and the general consensus is that if you want to write audit messages you need to retain CAP_AUDIT_WRITE. The fact that it worked previously was just luck due to insufficient checking in the kernel.

If Docker is explicitly dropping capabilities in containers, the CAP_AUDIT_WRITE capability should be retained.
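A quick way to tell whether a given container kept this capability is to read the effective capability mask (the CapEff line of /proc/self/status) and test the CAP_AUDIT_WRITE bit, which is bit 29 in <linux/capability.h>. This is a minimal sketch with hypothetical helper names; the process's effective set is used here as a proxy for the credentials netlink_capable() checks on the audit socket.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>
#include <linux/capability.h>   /* CAP_AUDIT_WRITE (29) */

/* Parse a line such as "CapEff:\t00000000a80425fb" for the given field.
 * Returns 0 and stores the mask on success, -1 otherwise. */
int parse_cap_line(const char *line, const char *field, uint64_t *mask) {
    size_t n = strlen(field);
    if (strncmp(line, field, n) != 0 || line[n] != ':')
        return -1;
    /* %x skips the tab after the colon. */
    return sscanf(line + n + 1, "%" SCNx64, mask) == 1 ? 0 : -1;
}

/* Test one capability bit in a mask. */
int has_cap(uint64_t mask, int cap) {
    return (int)((mask >> cap) & 1u);
}

/* 1 if this process has CAP_AUDIT_WRITE in its effective set, 0 if not,
 * -1 if the CapEff line could not be read. */
int self_has_audit_write(void) {
    FILE *f = fopen("/proc/self/status", "r");
    if (!f)
        return -1;
    char line[256];
    uint64_t mask;
    int result = -1;
    while (fgets(line, sizeof line, f)) {
        if (parse_cap_line(line, "CapEff", &mask) == 0) {
            result = has_cap(mask, CAP_AUDIT_WRITE);
            break;
        }
    }
    fclose(f);
    return result;
}
```

Run inside a container, self_has_audit_write() returning 0 would suggest that PAM's audit writes are likely to be rejected on a 3.15 kernel; if Docker retains the capability, it should return 1.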

@larsks
Contributor

larsks commented Jul 22, 2014

This is probably obvious, but this bug means it's not possible to build Docker on anything running a 3.15 kernel, because the build process starts an Ubuntu container that runs things like "chfn" as part of the package install process, and those fail:

chfn: PAM: System error
adduser: `/usr/bin/chfn -f LXC dnsmasq lxc-dnsmasq' returned error code 1. Exiting.

@larsks
Contributor

larsks commented Jul 23, 2014

For the curious, I threw together a quick write-up of the diagnostic process I went through to find the root cause of this behavior: http://blog.oddbit.com/2014/07/21/tracking-down-a-kernel-bug-wit/

@thaJeztah
Member

@larsks thanks for your effort, it made an interesting read!

@DeX77
Contributor

DeX77 commented Jul 28, 2014

I'm sorry to say this, but are you sure that actually solved the issue?

docker --version
Docker version 1.1.2, build d84a070

docker run -i -t ubuntu:14.04 /bin/bash
root@70b14281a7d4:/# chfn
chfn: PAM: System error

:(

@LK4D4
Contributor

LK4D4 commented Jul 28, 2014

@DeX77 Yeah, this isn't merged into 1.1.2.

@DeX77
Contributor

DeX77 commented Jul 28, 2014

@LK4D4 thx, then I'll backport that patch for now.

homme pushed a commit to geo-data/gdal-docker that referenced this issue Jul 31, 2014
The docker index is affected by
<moby/moby#6345 (comment)>.
This fix is an attempt at a temporary workaround and should be
reverted once the docker version behind the registry is updated at the
next release.
@slmingol

The Dockerfile patch mentioned above worked around the issue for me on Fedora 20 with kernel 3.15.6-200.fc20.x86_64.

@larsks
Contributor

larsks commented Aug 1, 2014

@slmingol, docker-io-1.0.0-9.fc20.x86_64, available in Fedora 20 updates, includes the fix for this problem. If you run yum update, you won't need any workarounds.

@alex-sherwin

This happens to me when Docker Hub builds my Dockerfile. It works fine on boot2docker 1.1.2.

https://registry.hub.docker.com/u/asherwin/docker-rabbitmq/build_id/11714/code/bhnqgt5adexfcsrey8kq8fg/

@lusid

lusid commented Aug 3, 2014

We are experiencing the same issue... works fine on boot2docker:

https://registry.hub.docker.com/u/smartprocure/redis/build_id/11708/code/bkaljywukpqaiigz7n7hlzp/

@zonorti

zonorti commented Aug 4, 2014

I've tried stracing su in normal and host-network modes, and it looks like the NETLINK reply is what causes su to fail.
Both runs used: strace -s 256 -f su -c /bin/true test
Fails (host network):

socket(PF_NETLINK, SOCK_RAW, 9)         = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
readlink("/proc/self/exe", "/usr/bin/su", 4096) = 11
sendto(3, "t\0\0\0L\4\5\0\1\0\0\0\0\0\0\0op=PAM:authentication acct=\"test\" exe=\"/usr/bin/su\" hostname=? addr=? terminal=console res=success\0\0", 116, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 116
poll([{fd=3, events=POLLIN}], 1, 500)   = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "\210\0\0\0\2\0\0\0\1\0\0\0>\0\0\0\377\377\377\377t\0\0\0L\4\5\0\1\0\0\0\0\0\0\0op=PAM:authentication acct=\"test\" exe=\"/usr/bin/su\" hostname=? addr=? terminal=console res=success\0\0", 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 136
recvfrom(3, "\210\0\0\0\2\0\0\0\1\0\0\0>\0\0\0\377\377\377\377t\0\0\0L\4\5\0\1\0\0\0\0\0\0\0op=PAM:authentication acct=\"test\" exe=\"/usr/bin/su\" hostname=? addr=? terminal=console res=success\0\0", 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 136

Works (network namespaces):

socket(PF_NETLINK, SOCK_RAW, 9)         = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
readlink("/proc/self/exe", "/usr/bin/su", 4096) = 11
sendto(3, "t\0\0\0L\4\5\0\1\0\0\0\0\0\0\0op=PAM:authentication acct=\"test\" exe=\"/usr/bin/su\" hostname=? addr=? terminal=console res=success\0\0", 116, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = -1 ECONNREFUSED (Connection refused)
close(3)                                = 0

Host ubuntu 14.04 Linux dockerbff6e0144 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Docker - latest from git (d48492a)

image: centos:latest

Strace logs https://gist.github.com/iMelnik/0ddf4865dd1c490f2e69
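The netlink buffers in the two traces above can be decoded by hand. strace prints them with octal escapes (\210 is 0x88, \377 is 0xff), and each starts with a struct nlmsghdr: 32-bit length, 16-bit type, 16-bit flags, 32-bit sequence number and 32-bit port ID, little-endian on this x86_64 host. This small decoder (helper names are hypothetical) confirms that su sends a 116-byte message of type 1100 (AUDIT_USER_AUTH) with flags 5 (NLM_F_REQUEST | NLM_F_ACK), and that in the failing case the kernel replies with a 136-byte NLMSG_ERROR (type 2): the 16-byte header, a 4-byte error field of -1, i.e. -EPERM, and an echo of the 116-byte request. In the working case, sendto() itself fails with ECONNREFUSED, so there is no reply to decode.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of struct nlmsghdr, decoded from raw little-endian bytes. */
struct nl_hdr {
    uint32_t len;    /* nlmsg_len: total length, header included */
    uint16_t type;   /* nlmsg_type: 1100 = AUDIT_USER_AUTH, 2 = NLMSG_ERROR */
    uint16_t flags;  /* nlmsg_flags: 0x1 = NLM_F_REQUEST, 0x4 = NLM_F_ACK */
    uint32_t seq;    /* nlmsg_seq: echoed back in the reply */
    uint32_t pid;    /* nlmsg_pid: netlink port ID */
};

static uint16_t le16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode the 16-byte header at the start of a netlink buffer. */
struct nl_hdr decode_hdr(const unsigned char *p) {
    struct nl_hdr h = { le32(p), le16(p + 4), le16(p + 6), le32(p + 8), le32(p + 12) };
    return h;
}

/* For an NLMSG_ERROR message: the signed error code right after the header
 * (0 for a plain ack, otherwise a negative errno). */
int32_t decode_error(const unsigned char *p) {
    uint32_t raw = le32(p + 16);
    int32_t err;
    memcpy(&err, &raw, sizeof err);   /* reinterpret the bits as two's complement */
    return err;
}
```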

@akanto

akanto commented Aug 8, 2014

We have also experienced the same issue. As the strace output shows, the error is caused by the audit calls from PAM. Our workaround is to rebuild libpam in the guest OS with the --disable-audit flag.

The patched CentOS 6.5 and Ubuntu 14.04 unofficial images are available in Docker registry https://registry.hub.docker.com/u/sequenceiq/pam/ and on GitHub https://github.com/sequenceiq/docker-pam .

@moby moby locked and limited conversation to collaborators Aug 8, 2014
sqawasmi added a commit to sqawasmi/odoo-docker that referenced this issue Aug 9, 2014
pilwon added a commit to dockerfile/mariadb that referenced this issue Aug 13, 2014
md5 added a commit to synctree/docker-coturn that referenced this issue Aug 15, 2014
aaw added a commit to aaw/docker-postgresql that referenced this issue Jan 21, 2015
sashkachan pushed a commit to sashkachan/docker-wordpress that referenced this issue Jan 29, 2015
fancyremarker pushed a commit to fancyremarker/docker-memcached that referenced this issue Mar 18, 2015
ianyamey pushed a commit to policygenius/docker-sentry that referenced this issue Apr 19, 2015
ianyamey pushed a commit to policygenius/docker-sentry that referenced this issue Apr 19, 2015
fancyremarker pushed a commit to aptible/docker-sentry that referenced this issue Apr 20, 2015