runc with --no-pivot retains global mount context #1602
Comments
When you use … Out of interest, is there a reason you cannot use …
Also, I'd be interested to see if the mount shows up in …
@cyphar Did the steps to reproduce the issue not work for you? Is there something I can do to clarify them for you? /proc/self refers to the PID of my shell; the output of /proc/self/mountinfo and /proc/self/mounts is identical between my shell's view and runc's view. Both runc and my shell exist in the main mount namespace.
/proc/self/mountinfo with runc running and the FS mounted.
And /proc/self/mounts
And after unmount with the
However, the mount being removed from my namespace is expected; it still exists within the namespace that is used to execute the … While it may be using chroot, something more is happening here: the chroot is being executed inside a mount namespace. This is the mount namespace of pause running inside only a chroot.
And /proc/$PAUSE/mounts and mountinfo are both empty, even though the pause command shares the mount namespace with the main system.
I don't know how much more the chroot command does compared to chroot(). However, the empty mounts/mountinfo of a process running directly under the chroot command suggest that it does something to obscure the mounts in a namespace. That would explain why I cannot see the mounted filesystem in /proc/$PID/mounts even though it is certainly being kept mounted inside that mount namespace.
I do not know why this system uses --no-pivot; I suspect it was done because the container filesystems reside on a read-only filesystem. While I am working on moving these systems to use pivot_root, I don't think that is an appropriate resolution to this bug. If --no-pivot is a supported configuration for runc, and running with --no-pivot leaves the user unable to cleanly unmount a filesystem on a disk, the damage could be severe. If I hit unmount in a UI or in the CLI and the unmount command succeeds with return code 0, I'm going to yank the drive out of a hot-swap bay or a USB port, and I'm unlikely to notice that something is wrong until I get prompted to do a repair or find data corruption.
There is no easy way for a user to detect that the filesystem is still mounted while this bug is present. The only reason I noticed was a message that the kernel partition table could not be updated when I tried to repartition and reformat the drive, because the kernel refused to update the partition map. Had I not noticed that message, and had I not changed the number of partitions, it would have been extremely easy to reformat the "new" partitions and only discover the problem after a reboot. On servers, where reboots can be months apart, that could be costly; on a laptop, the reboot could come at a very inopportune time, such as while traveling.
Likewise, when booting a system and getting a message to run xfs_repair, I should be able to unmount the filesystem without having to reboot into single-user mode or shut down a bunch of unrelated services just so the filesystem can be unmounted and xfs_repair can run. The error messages are also thoroughly confusing: the user is told the filesystem is mounted when, from the user's view, it is not, which makes this very hard to start debugging.
chroot is done in a new mount namespace, because otherwise you cannot set up a container's rootfs.
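Whether two processes really share a mount namespace can be checked directly instead of being inferred from /proc/PID/mounts, which, as observed above, can look empty for a chrooted process. A minimal Python sketch, assuming Linux and permission to read the target's /proc/PID/ns/mnt link (reading another user's process usually requires root): the link target has the form mnt:[INODE], and two processes are in the same mount namespace when the inode numbers match. The helper names are hypothetical and not part of runc.

```python
import os
import sys


def parse_ns_link(target):
    """Parse a namespace link target such as "mnt:[4026531840]" into its inode number."""
    _kind, sep, rest = target.partition(":[")
    if not sep or not rest.endswith("]"):
        raise ValueError(f"unexpected namespace link: {target!r}")
    return int(rest[:-1])  # raises ValueError if the inode part is not numeric


def mount_ns_id(pid="self"):
    """Return the inode number of the mount namespace that `pid` belongs to."""
    return parse_ns_link(os.readlink(f"/proc/{pid}/ns/mnt"))


def same_mount_ns(pid_a, pid_b="self"):
    """True if both processes are in the same mount namespace."""
    return mount_ns_id(pid_a) == mount_ns_id(pid_b)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 mntns.py <PID of pause inside the container>
    target = sys.argv[1]
    print(f"{target} shares our mount namespace: {same_mount_ns(target)}")
```

Running it from the host shell with the PID of pause from the reproduction would show whether pause is in the host's mount namespace or in a new one created for the chroot.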
I agree; I'm just wondering why we still have this as a supported configuration. The reason why I asked why you couldn't use … I will try to reproduce the problem you describe; I was asking introductory questions to make sure that we don't hit an XY problem. I will cross-check a "normal" chroot against what runc does and double-check where the problem lies.
Fair enough, I was just worried my reproduction steps were not detailed enough. With that in mind, we sought out the commit that fixed the issue and found that it was fixed with commit 4301b44. I am not the person who originally configured the systems this issue was found on; I just had to debug the issue. I'm fairly certain --no-pivot was only used because of the read-only FS. And even if that was not fixed in 1.0.0-rc4, I would be mounting an overlayfs over the containers as a way to rectify this issue. Sorry for the noise; I should have checked against master first. I'll close the issue. The only thing I can think of doing here now is to write a test case so the issue does not inadvertently get reintroduced.
@nakato Ah, so it was caused by mount propagation (which was going to be my guess). Effectively, the reason why this occurs is because … Maybe we should give a warning if you're doing … That patch just changes the default; you can still trigger it by setting … I'm going to reopen this bug, if you don't mind.
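The propagation type of each mount is visible in the optional fields of /proc/PID/mountinfo, as documented in proc(5): shared:N marks a member of peer group N, master:N marks a slave of peer group N, and no tag means the mount is private. Roughly, per the kernel's shared-subtree semantics, an unmount on the host only removes the copy in another namespace if that copy receives propagation from the host (it is a shared peer or a slave); a private copy keeps the filesystem alive. A minimal Python parser, assuming the proc(5) line format; the class, the functions and the summary strings returned by `propagation()` are hypothetical and do not come from runc. Note that, per this thread, the lingering mount may not be listed at all for a chrooted process, so this mainly helps inspect the mounts that are visible.

```python
import re
from dataclasses import dataclass

# The kernel escapes space, tab, newline and backslash in paths as \ooo (octal).
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def _unescape(s):
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), s)


@dataclass
class MountInfo:
    mount_id: int
    parent_id: int
    root: str
    mount_point: str
    optional: list  # e.g. ["shared:1"] or ["master:1"]; empty for private mounts
    fstype: str
    source: str

    def propagation(self):
        """Summarise the propagation type encoded in the optional fields."""
        tags = {opt.split(":", 1)[0] for opt in self.optional}
        if "unbindable" in tags:
            return "unbindable"
        kinds = []
        if "shared" in tags:
            kinds.append("shared")
        if "master" in tags:
            kinds.append("slave")
        return "+".join(kinds) or "private"


def parse_mountinfo_line(line):
    fields = line.split()
    # Optional fields start at index 6 and end at the lone "-" separator.
    sep = fields.index("-", 6)
    return MountInfo(
        mount_id=int(fields[0]),
        parent_id=int(fields[1]),
        root=_unescape(fields[3]),
        mount_point=_unescape(fields[4]),
        optional=fields[6:sep],
        fstype=fields[sep + 1],
        source=_unescape(fields[sep + 2]),
    )


def read_mountinfo(pid="self"):
    """Parse every entry of /proc/<pid>/mountinfo."""
    with open(f"/proc/{pid}/mountinfo") as f:
        return [parse_mountinfo_line(line) for line in f if line.strip()]
```

For example, `[(m.mount_point, m.propagation()) for m in read_mountinfo(pid)]` lists how each visible mount of a process would react to changes made on the host.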
When a runc container is started with --no-pivot, the process running under runc retains some form of the host's mount context, so filesystems that were mounted on the system before runc started cannot be fully unmounted. This happens even though the unmounted filesystem does not appear in /proc/$PID/mounts.
Expected
A filesystem not referenced by the runc container is unmounted when it is unmounted from the system.
Actual
Filesystem remains mounted until runc containers are stopped.
Reproduction
Reproducible: Always
Runc version: v1.0.0-rc4
Kernel: 4.12.10-1-ARCH
This issue was originally noticed with physical devices on Ubuntu Xenial with runc 1.0.0-rc2, but the following reproduction uses the versions listed above.
Prepare loopback mount
Prepare and run container
Unmount filesystem
No output, looks unmounted.
Kernel threads are still present, so it's still mounted.
Stop the container
I hit Ctrl+C on the container window to stop the container. Runc and pause have exited.
No output, filesystem is now unmounted.
One repercussion of this issue is the inability to unmount and xfs_repair a physical disk that was mounted before runc started a namespaced process. Working around it means unmounting the filesystem, stopping all runc containers and then starting all runc processes back up, or removing the filesystem from the boot-time mounts and rebooting the system.
I was also able to reproduce this behavior with ext4. The kernel thread for ext4 is jbd2/loop
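Since umount returns success while the filesystem's kernel threads stay alive, that check from the reproduction can be scripted. A minimal Python sketch, assuming the thread-naming patterns jbd2/<dev>-<n> (ext4 journal) and xfsaild/<dev> (XFS); these names are an assumption about kernel internals, not a stable interface. It reads /proc/<pid>/comm, which the kernel truncates to 15 characters, so long device names may not match. The function names are hypothetical.

```python
import os


def fs_threads_for_device(comm_names, device):
    """Return thread names that look tied to `device`, e.g. "jbd2/loop0-8".

    Assumption: filesystem kernel threads embed the device name after a
    slash, optionally followed by "-<suffix>".
    """
    hits = []
    for name in comm_names:
        _, slash, suffix = name.partition("/")
        if slash and (suffix == device or suffix.startswith(device + "-")):
            hits.append(name)
    return hits


def live_comm_names():
    """Read the comm name of every process and kernel thread listed in /proc."""
    names = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                names.append(f.read().strip())
        except OSError:
            continue  # the process exited while we were scanning
    return names


if __name__ == "__main__" and os.path.isdir("/proc"):
    # Hypothetical check after "umount" reported success for /dev/loop0.
    print(fs_threads_for_device(live_comm_names(), "loop0"))
```

A non-empty result after a successful umount is a hint that some mount namespace still holds the filesystem, even if no /proc/$PID/mounts lists it.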
This issue cannot be reproduced if pivot occurs.