rootfs: umount all procfs and sysfs with --no-pivot #1962
LGTM (IANAM)
libcontainer/rootfs_linux.go
Outdated
	return err
}
if err := unix.Unmount(p, unix.MNT_DETACH); err != nil {
	if err.(syscall.Errno) != unix.EINVAL {
Potentially we can also get EPERM here?
I guess so, but I never saw that error. When trying to umount /proc or /sys I only got EINVAL. I can amend the patch if you'd like.
I went ahead and amended the change in the updated version.
5d8596f to c18aa18
When creating a new user namespace, the kernel does not allow mounting a new procfs or sysfs file system unless an instance is already fully visible in the current mount namespace.

When using --no-pivot we were effectively defeating this kernel protection, because /proc and /sys from the host are still present in the container mount namespace.

A container without full access to /proc could then create a new user namespace and, from there, mount a fully visible /proc, bypassing the limitations in the container.

A simple reproducer for this issue is:

unshare -mrfp sh -c "mount -t proc none /proc && echo c > /proc/sysrq-trigger"

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
c18aa18 to 28a697c
Changes: opencontainers/runc@96ec217...12f6a99 Including critical security fix for `runc run --no-pivot` (`DOCKER_RAMDISK=1`): opencontainers/runc#1962 Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Including critical security fix for `runc run --no-pivot` (unlikely to affect BuildKit): opencontainers/runc#1962 Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Changes: opencontainers/runc@96ec217...12f6a99 Including critical security fix for `runc run --no-pivot` (`DOCKER_RAMDISK=1`): opencontainers/runc#1962 Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp> (cherry picked from commit 3aec9e7)
Changes: opencontainers/runc@96ec217...12f6a99 Including critical security fix for `runc run --no-pivot` (`DOCKER_RAMDISK=1`): opencontainers/runc#1962 Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp> (cherry picked from commit 1ee33f4)
Changes: opencontainers/runc@96ec217...12f6a99 Including critical security fix for `runc run --no-pivot` (`DOCKER_RAMDISK=1`): opencontainers/runc#1962 Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp> (cherry picked from commit 3aec9e7) Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Changes: opencontainers/runc@69663f0...12f6a99 Including critical security fix for `runc run --no-pivot` (`DOCKER_RAMDISK=1`): opencontainers/runc#1962 Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
	return err
}

absRootfs, err := filepath.Abs(rootfs)
AFAICS this is not needed since rootfs is already validated by (*ConfigValidator).rootfs()
}

for _, info := range mountinfos {
	p, err := filepath.Abs(info.Mountpoint)