'init-container' fails when /var/lib/flatpak, /var/lib/systemd/coredump or /var/log/journal on the host are mounted with nodev, noexec or nosuid #911

Closed
AbsolutelyLudicrous opened this issue Nov 2, 2021 · 5 comments
Labels
1. Bug Something isn't working

Comments

@AbsolutelyLudicrous

Describe the bug
toolbox enter fails when /var/log is a mounted btrfs volume. Unmounting and then remounting /var/log does not prevent an already-started container from being entered. Mounting /var/log as a tmpfs, however, seems to work fine.

Steps how to reproduce the behaviour
Mount a btrfs volume on /var/log and try to start a container.

Expected behaviour
Toolboxes run just fine

Actual behaviour
toolbox enter fails with a cryptic error message about an invalid entry point PID.

Output of toolbox --version (v0.0.90+)
toolbox version 0.0.99.2

Toolbox package info (rpm -q toolbox)
toolbox-0.0.99.2^3.git075b9a8d2779-4.fc35.x86_64

Output of podman version

Version:      3.4.1
API Version:  3.4.1
Go Version:   go1.16.8
Built:        Wed Oct 20 10:31:56 2021
OS/Arch:      linux/amd64

Podman package info (rpm -q podman)
podman-3.4.1-1.fc35.x86_64

Info about your OS
Fedora Silverblue 35; recently upgraded from F34, if that matters

Additional context
I am 80% sure this worked at one point, and that the Silverblue wiki did not mention /var/log as a forbidden mount.

If you see an error message saying: Error: invalid entry point PID of container <name-of-container>, add the output of the command podman start --attach <name-of-container> to the ticket.

Attached below. The relevant part is:

level=debug msg="Creating directory /var/log/journal"
level=debug msg="Binding /var/log/journal to /run/host/var/log/journal"
mount: /var/log/journal: filesystem was mounted, but any subsequent operation failed: Unknown error 5005.
Error: failed to bind /var/log/journal to /run/host/var/log/journal

tb-enter-dev.txt

podman-start.txt

The relevant part of my fstab is:

/dev/sda4 /var/log btrfs subvol=@varlog,compress=zstd:1,nosuid,nodev 0 0
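
For anyone comparing set-ups, it may help to see which of nosuid, nodev and noexec a given mount carries. The following is a minimal sketch, not part of Toolbx: `hardening_flags` is a hypothetical helper name, and it only filters a comma-separated option string. On a live system the option string could be obtained with `findmnt --noheadings --output OPTIONS --target <path>`, assuming findmnt from util-linux is installed.

```shell
# hardening_flags: hypothetical helper, not part of Toolbx.
# Prints the subset of nosuid/nodev/noexec found in a comma-separated
# mount option string, space-separated, in the order they appear.
# The body runs in a subshell so the IFS and globbing changes stay local.
hardening_flags() (
    IFS=,
    set -f   # disable globbing while the option string is split
    result=""
    for opt in $1; do
        case "$opt" in
            nosuid|nodev|noexec) result="${result:+$result }$opt" ;;
        esac
    done
    printf '%s\n' "$result"
)

# Option string copied from the fstab entry above.
hardening_flags "subvol=@varlog,compress=zstd:1,nosuid,nodev"
```

For the fstab entry above this reports nosuid and nodev, so the mount is hardened even though noexec is not set.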

AbsolutelyLudicrous added the "1. Bug" (Something isn't working) label Nov 2, 2021
@debarshiray
Member

I think this command is failing inside the container:

# mount --rbind -o ro /run/host/var/log/journal /var/log/journal

/run/host inside the container is an outcome of:

$ podman create ... --volume /:/run/host:rslave ...

Would it be possible for you to play with your set-up and figure out a mount configuration that works? That would greatly help to move this forward.

debarshiray self-assigned this Jul 12, 2023
debarshiray changed the title from "Toolboxes fail to start when /var/log is a mountpoint" to "'init-container' fails when /var/lib/flatpak, /var/lib/systemd/coredump or /var/log/journal on the host are mounted with nodev, noexec or nosuid" Jul 12, 2023
debarshiray added a commit to debarshiray/toolbox that referenced this issue Jul 13, 2023
Sometimes locations such as /var/lib/flatpak, /var/lib/systemd/coredump
and /var/log/journal sit on security-hardened mount points that are
marked as 'nosuid,nodev,noexec' [1].  In such cases, when Toolbx is used
rootless, an attempt to bind mount these locations read-only at runtime
with mount(8) fails because of permission problems:
  # mount --rbind -o ro <source> <containerPath>
  mount: <containerPath>: filesystem was mounted, but any subsequent
      operation failed: Unknown error 5005.

The problem is that 'init-container' is running inside the container's
mount and user namespace and the source paths were mounted inside the
host's namespace with 'nosuid,nodev,noexec'.  The above mount(8) call
tries to remove the 'nosuid,nodev,noexec' flags from the mount point and
replace them with only 'ro', which is something that can't be done from
a child namespace.

There's actually no benefit in bind mounting these paths as read-only.
It was historically done this way 'just to be safe' because a user isn't
expected to write to these locations from inside a container.  However,
Toolbx doesn't intend to provide any heightened security beyond what's
already available on the host.

Hence, it's better to get out of the way and leave it to the permissions
on the source location from the host operating system to guard the
castle.  This is accomplished by not passing any file system options to
mount(8) [1].

Note that this isn't a problem when Toolbx is running as root, because
the container uses the host's user namespace.

Based on an idea from Si.

[1] https://man7.org/linux/man-pages/man8/mount.8.html

containers#911
@debarshiray
Member

I think this command is failing inside the container:

# mount --rbind -o ro /run/host/var/log/journal /var/log/journal

/run/host inside the container is an outcome of:

$ podman create ... --volume /:/run/host:rslave ...

While playing with this, I realized that in practice one also needs --privileged:

$ podman create ... --privileged --volume /:/run/host:rslave ...

@debarshiray
Member

Pull request from earlier today: #1340

Testing appreciated.

debarshiray added a commit to debarshiray/toolbox that referenced this issue Aug 11, 2023
Sometimes locations such as /var/lib/flatpak, /var/lib/systemd/coredump
and /var/log/journal sit on security hardened mount points that are
marked as 'nosuid,nodev,noexec' [1].  In such cases, when Toolbx is used
rootless, an attempt to bind mount these locations read-only at runtime
with mount(8) fails because of permission problems:
  # mount --rbind -o ro <source> <containerPath>
  mount: <containerPath>: filesystem was mounted, but any subsequent
      operation failed: Unknown error 5005.

(Note that the above error message from mount(8) was subsequently
improved to show something more meaningful than 'Unknown error' [2].)

The problem is that 'init-container' is running inside the container's
mount and user namespace, and the source paths were mounted inside the
host's namespace with 'nosuid,nodev,noexec'.  The above mount(8) call
tries to remove the 'nosuid,nodev,noexec' flags from the mount point and
replace them with only 'ro', which is something that can't be done from
a child namespace.

Note that this doesn't fail when Toolbx is running as root.  This is
because the container uses the host's user namespace and is able to
remove the 'nosuid,nodev,noexec' flags from the mount point and replace
them with only 'ro'.  Even though it doesn't fail, the flags shouldn't
get replaced like that inside the container, because it removes the
security hardening of those mount points.

There's actually no benefit in bind mounting these paths as read-only.
It was historically done this way 'just to be safe' because a user isn't
expected to write to these locations from inside a container.  However,
Toolbx doesn't intend to provide any heightened security beyond what's
already available on the host.

Hence, it's better to get out of the way and leave it to the permissions
on the source location from the host operating system to guard the
castle.  This is accomplished by not passing any file system options to
mount(8) [1].

Based on an idea from Si.

[1] https://man7.org/linux/man-pages/man8/mount.8.html

[2] util-linux commit 9420ca34dc8b6f0f
    util-linux/util-linux@9420ca34dc8b6f0f
    util-linux/util-linux#2376

containers#911
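
The change described in the commit message above comes down to whether `-o ro` is passed to mount(8). A rough sketch of that difference follows. It assumes a hypothetical helper, `bind_mount_cmd`, that only prints the command line instead of running it. It is not Toolbx's actual code, which is written in Go.

```shell
# bind_mount_cmd: hypothetical helper, not part of Toolbx.
# Prints the mount(8) invocation instead of executing it.
# Arguments: source, target, and an optional option string.
bind_mount_cmd() {
    if [ -n "${3:-}" ]; then
        printf 'mount --rbind -o %s %s %s\n' "$3" "$1" "$2"
    else
        printf 'mount --rbind %s %s\n' "$1" "$2"
    fi
}

# Before: read-only bind mount. Per the commit message, this tries to
# replace nosuid,nodev,noexec with only 'ro' and fails when rootless.
bind_mount_cmd /run/host/var/log/journal /var/log/journal ro

# After: no file system options are passed, so the flags set on the
# host are left as they are.
bind_mount_cmd /run/host/var/log/journal /var/log/journal
```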
@debarshiray
Member

debarshiray commented Aug 11, 2023

Pull request from earlier today: #1340

Testing appreciated.

Received some positive feedback downstream: https://bugzilla.redhat.com/show_bug.cgi?id=2144541

@debarshiray
Member

Thanks for the patience and all the help in getting this fixed. Much appreciated.

alatiera added a commit to alatiera/toolbox that referenced this issue Aug 19, 2023
Followup to 1cc9e07

Sometimes the parent location is mounted with nosuid,nodev,noexec.
Trying to remount it as ro would remove those flags, and thus
fails.

See commit mentioned above for more details.

containers#911
alatiera added a commit to alatiera/toolbox that referenced this issue Aug 20, 2023
Followup to 1cc9e07

Sometimes the parent location is mounted with nosuid,nodev,noexec.
Trying to remount it as ro would remove those flags, and thus
fails.

See commit mentioned above for more details.

containers#911

Signed-off-by: Jordan Petridis <jordan@centricular.com>
debarshiray pushed a commit to alatiera/toolbox that referenced this issue Aug 22, 2023
On new builds of GNOME OS [1], the host's / is mounted with 'nodev,...'
and those flags are also inherited by /etc because it's not a separate
mount point.  This leads to the same problem with /etc/machine-id that
was seen before with /var/lib/flatpak, /var/lib/systemd/coredump and
/var/log/journal [2].

Therefore, use the same approach [2] to handle /etc/machine-id.

[1] https://gitlab.gnome.org/GNOME/gnome-build-meta/-/issues/718

[2] Commit 1cc9e07
    containers@1cc9e07b7c36fe9f
    containers#1340

containers#911
containers#1354

Signed-off-by: Jordan Petridis <jordan@centricular.com>