'init-container' fails when /var/lib/flatpak, /var/lib/systemd/coredump or /var/log/journal on the host are mounted with nodev, noexec or nosuid #911
Comments
I think this command is failing inside the container:
Would it be possible for you to play with your set-up and figure out a mount configuration that works? That would greatly help to move this forward.
/var/log is a mountpoint

Sometimes locations such as /var/lib/flatpak, /var/lib/systemd/coredump and /var/log/journal sit on security-hardened mount points that are marked as 'nosuid,nodev,noexec' [1]. In such cases, when Toolbx is used rootless, an attempt to bind mount these locations read-only at runtime with mount(8) fails because of permission problems:

  # mount --rbind -o ro <source> <containerPath>
  mount: <containerPath>: filesystem was mounted, but any subsequent operation failed: Unknown error 5005.

The problem is that 'init-container' is running inside the container's mount and user namespace and the source paths were mounted inside the host's namespace with 'nosuid,nodev,noexec'. The above mount(8) call tries to remove the 'nosuid,nodev,noexec' flags from the mount point and replace them with only 'ro', which is something that can't be done from a child namespace.

There's actually no benefit in bind mounting these paths as read-only. It was historically done this way 'just to be safe' because a user isn't expected to write to these locations from inside a container. However, Toolbx doesn't intend to provide any heightened security beyond what's already available on the host.

Hence, it's better to get out of the way and leave it to the permissions on the source location from the host operating system to guard the castle. This is accomplished by not passing any file system options to mount(8) [1].

Note that this isn't a problem when Toolbx is running as root, because the container uses the host's user namespace.

Based on an idea from Si.

[1] https://man7.org/linux/man-pages/man8/mount.8.html

containers#911
While playing with this, I realized that in practice one also needs
Pull request from earlier today: #1340

Testing appreciated.
Sometimes locations such as /var/lib/flatpak, /var/lib/systemd/coredump and /var/log/journal sit on security hardened mount points that are marked as 'nosuid,nodev,noexec' [1]. In such cases, when Toolbx is used rootless, an attempt to bind mount these locations read-only at runtime with mount(8) fails because of permission problems:

  # mount --rbind -o ro <source> <containerPath>
  mount: <containerPath>: filesystem was mounted, but any subsequent operation failed: Unknown error 5005.

(Note that the above error message from mount(8) was subsequently improved to show something more meaningful than 'Unknown error' [2].)

The problem is that 'init-container' is running inside the container's mount and user namespace, and the source paths were mounted inside the host's namespace with 'nosuid,nodev,noexec'. The above mount(8) call tries to remove the 'nosuid,nodev,noexec' flags from the mount point and replace them with only 'ro', which is something that can't be done from a child namespace.

Note that this doesn't fail when Toolbx is running as root. This is because the container uses the host's user namespace and is able to remove the 'nosuid,nodev,noexec' flags from the mount point and replace them with only 'ro'. Even though it doesn't fail, the flags shouldn't get replaced like that inside the container, because it removes the security hardening of those mount points.

There's actually no benefit in bind mounting these paths as read-only. It was historically done this way 'just to be safe' because a user isn't expected to write to these locations from inside a container. However, Toolbx doesn't intend to provide any heightened security beyond what's already available on the host.

Hence, it's better to get out of the way and leave it to the permissions on the source location from the host operating system to guard the castle. This is accomplished by not passing any file system options to mount(8) [1].

Based on an idea from Si.

[1] https://man7.org/linux/man-pages/man8/mount.8.html
[2] util-linux commit 9420ca34dc8b6f0f
    util-linux/util-linux@9420ca34dc8b6f0f
    util-linux/util-linux#2376

containers#911
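The change described in the commit message above comes down to dropping the '-o ro' argument. A minimal sketch of the two invocations follows; it only prints the mount(8) command lines rather than running them, so it needs no privileges. The function name bind_mount_cmd and the example paths are assumptions made for illustration and are not taken from the Toolbx sources.

```shell
#!/bin/sh
# Sketch only: prints a mount(8) command line instead of executing it.
# 'bind_mount_cmd' is a hypothetical helper, not a Toolbx function.
bind_mount_cmd() {
    src=$1
    dst=$2
    opts=${3:-}
    if [ -n "$opts" ]; then
        # Passing options asks mount(8) to set the flags to exactly this list.
        printf 'mount --rbind -o %s %s %s\n' "$opts" "$src" "$dst"
    else
        # No options: the flags of the source mount are left untouched.
        printf 'mount --rbind %s %s\n' "$src" "$dst"
    fi
}

# Old behaviour: requests 'ro' only (example paths are assumptions).
bind_mount_cmd /run/host/var/log/journal /var/log/journal ro
# New behaviour: no file system options are passed.
bind_mount_cmd /run/host/var/log/journal /var/log/journal
```

With no options passed, write protection is left to the permissions on the source location, as the commit message argues.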
Received some positive feedback downstream: https://bugzilla.redhat.com/show_bug.cgi?id=2144541
Thanks for the patience and all the help in getting this fixed. Much appreciated.
Followup to 1cc9e07

Sometimes the parent location might be mounted with nosuid,nodev,noexec and trying to remount it as ro would remove those and thus fails. See commit mentioned above for more details.

containers#911

Signed-off-by: Jordan Petridis <jordan@centricular.com>
On new builds of GNOME OS [1], the host's / is mounted with 'nodev,...' and those flags are also inherited by /etc because it's not a separate mount point. This leads to the same problem with /etc/machine-id that was seen before with /var/lib/flatpak, /var/lib/systemd/coredump and /var/log/journal [2].

Therefore, use the same approach [2] to handle /etc/machine-id.

[1] https://gitlab.gnome.org/GNOME/gnome-build-meta/-/issues/718
[2] Commit 1cc9e07
    containers@1cc9e07b7c36fe9f
    containers#1340

containers#911
containers#1354

Signed-off-by: Jordan Petridis <jordan@centricular.com>
Describe the bug
toolbox enter fails when /var/log is a mounted btrfs volume. Of note is that unmounting, then remounting, /var/log does not prevent an already started container from being entered. Having /var/log mounted as a tmpfs seems to work just fine, however.

Steps how to reproduce the behaviour
Mount a btrfs volume on /var/log and try to start a container.

Expected behaviour
Toolboxes run just fine

Actual behaviour
toolbox enter fails with a cryptic error message about an invalid entry point PID.

Output of toolbox --version (v0.0.90+)
toolbox version 0.0.99.2

Toolbox package info (rpm -q toolbox)
toolbox-0.0.99.2^3.git075b9a8d2779-4.fc35.x86_64

Output of podman version

Podman package info (rpm -q podman)
podman-3.4.1-1.fc35.x86_64

Info about your OS
Fedora Silverblue 35; recently upgraded from F34, if that matters

Additional context
I am 80% sure this worked at one point, and that the Silverblue wiki did not mention /var/log as a forbidden mount.

Attached below. The relevant part is:
tb-enter-dev.txt
podman-start.txt

The relevant part of my fstab is:
/dev/sda4 /var/log btrfs subvol=@varlog,compress=zstd:1,nosuid,nodev 0 0
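The fstab entry above carries two of the flags discussed in this thread. As a quick way to spot such entries, here is a small sketch that reads the fourth field of an fstab line and reports which of nosuid, nodev and noexec are set. The function name fstab_hardening_flags is hypothetical, and the sketch assumes a well-formed, whitespace-separated entry.

```shell
#!/bin/sh
# Hypothetical helper: print which hardening flags an fstab entry sets.
fstab_hardening_flags() {
    # Field 4 of an fstab entry is the comma-separated option list.
    opts=$(printf '%s\n' "$1" | awk '{ print $4 }')
    found=""
    for flag in nosuid nodev noexec; do
        # Wrap in commas so that only whole option names match.
        case ",$opts," in
            *",$flag,"*) found="${found:+$found,}$flag" ;;
        esac
    done
    printf '%s\n' "$found"
}

# The entry from this report; prints "nosuid,nodev".
fstab_hardening_flags '/dev/sda4 /var/log btrfs subvol=@varlog,compress=zstd:1,nosuid,nodev 0 0'
```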