Operation not permitted when mounting /proc to /tmp/proc #10944
Comments
I believe this is the same issue being described by @terenceli in this blog post. @avagin helped with diagnosing this. See issue #8205 and opencontainers/runc#1658. That said I don't understand why this would be release-related, as I don't think something changed in the startup process that would change this behavior... |
@avagin bisected this to commit cc1f550, specifically in:
- if pidns {
-     flags := uint32(unix.MS_NOSUID | unix.MS_NODEV | unix.MS_NOEXEC | unix.MS_RDONLY)
-     if err := mountInChroot(chroot, "proc", "/proc", "proc", flags); err != nil {
-         return fmt.Errorf("error mounting proc in chroot: %v", err)
-     }
- } else {
-     if err := mountInChroot(chroot, "/proc", "/proc", "bind", unix.MS_BIND|unix.MS_RDONLY|unix.MS_REC); err != nil {
-         return fmt.Errorf("error mounting proc in chroot: %v", err)
-     }
+ flags := uint32(unix.MS_NOSUID | unix.MS_NODEV | unix.MS_NOEXEC | unix.MS_RDONLY)
+ if err := mountInChroot(chroot, "proc", "/proc", "proc", flags); err != nil {
+     return fmt.Errorf("error mounting proc in chroot: %v", err)
  }
This means that the chroot used to use a recursive read-only bind mount of |
I will add a test to |
That was only the case for the ptrace and systrap platforms (the pidns variable would be false for these). For other platforms, it was already doing what it does today. cc1f550 just got rid of the ptrace/systrap special case. Systrap and ptrace were running in the caller's pidns. We still want to execute the sandbox process in a new pidns, so we can't bind mount the procfs mount, because it presents data for the parent pidns.
What fix do you suggest? |
@ayushr2 the sandbox process accesses only generic files and /proc/self, so we actually don't need proc from the target pid namespace.
@avagin If you think bind mounting the proc mount is OK, I would defer to you. It seems a bit of a foot gun to do this to me, as it can lead to surprises if the sandbox tries to do any
It is important to run the sandbox in a different pidns so the sandbox cannot impact the host pidns with fork bombs and similar calamities. I don't have a different fix in mind yet, unfortunately...
Ok, I guess I get what's going on with overmounting and how it affects
So, the underlying issue is that when the inner container attempts to mount I went down the rabbit hole a bit and found out the following:
So, could gVisor perhaps mount |
When runsc is started, it reads a few top level files such as /proc/cpuinfo, /proc/sys/vm/mmap_min_addr, /proc/self/auxv, /proc/sys/kernel/cap_last_cap. |
Could we mount |
subset=pid doesn't help to avoid this issue. The kernel still does the same check and doesn't allow us to create a new proc instance:
|
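To see whether a given environment is likely to hit this check, one rough heuristic is to look for mounts stacked on top of paths below `/proc`. The Go sketch below parses text in the `/proc/self/mountinfo` format, where the fifth whitespace-separated field is the mount point. It only approximates what the kernel checks and is not taken from gVisor; `submountsUnder` is a hypothetical helper name.

```go
package main

import (
	"bufio"
	"strings"
)

// submountsUnder returns the mount points listed in mountinfo (text in the
// /proc/self/mountinfo format) that sit strictly below root. A non-empty
// result for root "/proc" suggests that parts of procfs are covered by
// other mounts. This is a heuristic, not the kernel's actual check.
func submountsUnder(mountinfo, root string) []string {
	prefix := strings.TrimSuffix(root, "/") + "/"
	var found []string
	sc := bufio.NewScanner(strings.NewReader(mountinfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Field 5 (index 4) of each mountinfo line is the mount point.
		if len(fields) < 5 {
			continue
		}
		if mp := fields[4]; strings.HasPrefix(mp, prefix) {
			found = append(found, mp)
		}
	}
	return found
}
```

Feeding it the contents of `/proc/self/mountinfo` read from inside the outer container would list any paths mounted over parts of `/proc`.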
This is used in contexts such as Dangerzone: https://gvisor.dev/blog/2024/09/23/safe-ride-into-the-dangerzone/ Updates issue #10944. PiperOrigin-RevId: 681229280
We can do something like 231c152
|
This is used in contexts such as Dangerzone: https://gvisor.dev/blog/2024/09/23/safe-ride-into-the-dangerzone/ Updates issue #10944. PiperOrigin-RevId: 682454284
Thanks for looking into it, @avagin. I tried your commands (#10944 (comment)) and indeed they failed with "Mount too revealing". Weird... For what it's worth, your workaround looks fine to me.
Hey folks. Just checking whether this issue will be worked on in the next releases. We'll release a new Dangerzone version in a few days and it would be nice to offer the latest gVisor version. Currently, we are pinned to version |
I will be picking up @avagin's patch and submitting it soon. |
…mount. As part of sandbox startup, `runsc` needs to set up a chroot environment with a minimal working `procfs` filesystem mounted within. However, doing so from within a container (as applications like Dangerzone do) may fail, because in the container runtime's default configuration, some paths of the procfs filesystem visible from within the container may be obstructed. This prevents mounting new unobstructed instances of `procfs`. This change detects this case and falls back to the previous behavior of using a recursive bind-mount of `/proc` in such a case. The obstructed subdirectories of procfs are preserved in this case, which is fine because we only need a very minimal subset of `procfs` to actually work. Additionally, `runsc` actually only needs a few kernel parameter files and `/proc/self` in order to work. So this change sets up a `tmpfs` mount that contains just those files, with the kernel parameter files being plainly copied and `/proc/self` being a symlink to the one present in the mounted view of `procfs` (regardless of which mounting method was used). The `runtime_in_docker` test will continuously verify that this fallback mechanism works to avoid similar breakage in the future. Credits to @avagin for figuring out this solution. Fixes #10944. PiperOrigin-RevId: 691672104
Should be fixed with 6adc072. |
Description
When running a Dangerzone container image with the latest gVisor release (release-20240916.0), we stumble onto the following error:
Building the container image with the previous release (release-20240826.0) works. Running the outer container with --privileged also works, but not with CAP_SYS_ADMIN.
(Reminder: in the Dangerzone project, gVisor runs nested within a Docker/Podman container. I can verify that the error is the same regardless of the container runtime, Linux kernel, or enforced capabilities.)
Steps to reproduce
Unfortunately, I don't have a minimum reproducible example for this. The way we have reproduced it for now is:
BUILD.md
poetry run ./dev/dangerzone-cli tests/test_docs/sample-pdf.pdf
runsc version
docker version (if using docker)
No response
uname
Linux 88387f6d4d93 6.5.11-linuxkit #1 SMP PREEMPT_DYNAMIC Wed Dec 6 17:14:50 UTC 2023 x86_64 Linux
kubectl (if using Kubernetes)
No response
repo state (if built from source)
No response
runsc debug logs (if available)