Operation not permitted when mounting /proc to /tmp/proc #10944

Closed
apyrgio opened this issue Sep 23, 2024 · 16 comments · Fixed by #11097
Labels
type: bug Something isn't working

Comments


apyrgio commented Sep 23, 2024

Description

When running a Dangerzone container image with the latest gVisor release (release-20240916.0), we stumble onto the following error:

W0923 13:05:11.358402       1 boot.go:266] Not setting product_name: open /sys/devices/virtual/dmi/id/product_name: no such file or directory
I0923 13:05:11.358522       1 boot.go:279] Setting host-shmem-huge: "never"
W0923 13:05:11.359241       1 specutils.go:129] noNewPrivileges ignored. PR_SET_NO_NEW_PRIVS is assumed to always be set.
I0923 13:05:11.359297       1 chroot.go:92] Setting up sandbox chroot in "/tmp"
W0923 13:05:11.359386       1 chroot.go:109] Failed to copy /etc/localtime: open /etc/localtime: no such file or directory. UTC timezone will be used.
I0923 13:05:11.359425       1 chroot.go:37] Mounting "proc" at "/tmp/proc"
W0923 13:05:11.359477       1 util.go:64] FATAL ERROR: error setting up chroot: error mounting proc in chroot: error mounting "proc" at "/tmp/proc": mount("proc", "/tmp/proc", 15) failed: operation not permitted
error setting up chroot: error mounting proc in chroot: error mounting "proc" at "/tmp/proc": mount("proc", "/tmp/proc", 15) failed: operation not permitted

Building the container image with the previous release (release-20240826.0) works. Running the outer container with --privileged also works, but adding CAP_SYS_ADMIN instead does not.

(Reminder: in the Dangerzone project, gVisor runs nested within a Docker/Podman container. I can verify that the error is the same regardless of the container runtime, Linux kernel, or enforced capabilities.)
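
For readers unfamiliar with the numeric argument in the failing call, here is a minimal sketch that decodes it. It assumes the third argument in mount("proc", "/tmp/proc", 15) is the decimal mount flags bitmask, which the log format suggests but does not state. The helper name decode_mount_flags is hypothetical; the bit values are the standard Linux mount(2) flag constants.

```python
# Mount flag bits as defined by the Linux kernel ABI (see mount(2)).
# Only the four lowest bits are listed, since 15 == 0b1111.
MOUNT_FLAGS = {
    "MS_RDONLY": 0x1,
    "MS_NOSUID": 0x2,
    "MS_NODEV": 0x4,
    "MS_NOEXEC": 0x8,
}


def decode_mount_flags(flags):
    """Return the names of the known flag bits that are set in `flags`.

    Hypothetical helper for illustration; not part of gVisor or Dangerzone.
    """
    return [name for name, bit in MOUNT_FLAGS.items() if flags & bit]


# Assumption: 15 is the decimal flags value printed in the error message.
decoded = decode_mount_flags(15)
print(decoded)
```

Under that assumption, the failing call is a read-only, nosuid, nodev, noexec mount of procfs, so the "operation not permitted" error does not come from an unusually permissive set of mount options.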

Steps to reproduce

Unfortunately, I don't have a minimal reproducible example for this. The way we have reproduced it so far is:

  1. Download the Dangerzone source.
  2. Build the project following the instructions for your operating system in BUILD.md.
  3. Attempt to run Dangerzone with poetry run ./dev/dangerzone-cli tests/test_docs/sample-pdf.pdf.
  4. Boom.
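
Since the debug output from the run above is long, the following sketch may help pull out only the warning and error lines when reproducing. It assumes every log line carries the prefix seen in this report (level letter, MMDD date, time, PID, file:line, then "] message"). The function name warnings_and_errors is hypothetical, and the sample text is copied from the output in the description.

```python
import re

# Prefix as it appears in this report, e.g.
#   W0923 13:05:11.359477       1 util.go:64] FATAL ERROR: ...
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<pid>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def warnings_and_errors(log_text):
    """Yield (pid, source, message) for lines logged at W, E or F level.

    Lines that do not match the prefix (JSON spec dumps, bare error text)
    are skipped.
    """
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw.rstrip())
        if match and match.group("level") in ("W", "E", "F"):
            source = f'{match.group("file")}:{match.group("line")}'
            yield int(match.group("pid")), source, match.group("msg")


# Two lines taken from the output in the description.
sample = """I0923 13:05:11.359425       1 chroot.go:37] Mounting "proc" at "/tmp/proc"
W0923 13:05:11.359477       1 util.go:64] FATAL ERROR: error setting up chroot: error mounting proc in chroot: error mounting "proc" at "/tmp/proc": mount("proc", "/tmp/proc", 15) failed: operation not permitted"""

for pid, source, message in warnings_and_errors(sample):
    print(pid, source, message)
```

Note that the fatal error is logged at W level in this output, which is why the filter includes warnings rather than only E and F.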

runsc version

runsc version release-20240916.0
spec: 1.1.0-rc.1

docker version (if using docker)

No response

uname

Linux 88387f6d4d93 6.5.11-linuxkit #1 SMP PREEMPT_DYNAMIC Wed Dec 6 17:14:50 UTC 2023 x86_64 Linux

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

Invoked with command: /usr/bin/python3 -m dangerzone.conversion.doc_to_pixels
Command inside gVisor sandbox: ['/usr/bin/python3', '-m', 'dangerzone.conversion.doc_to_pixels']
OCI config:
{
  "hostname": "dangerzone",
  "linux": {
    "namespaces": [
      {
        "type": "pid"
      },
      {
        "type": "network"
      },
      {
        "type": "ipc"
      },
      {
        "type": "uts"
      },
      {
        "type": "mount"
      }
    ]
  },
  "mounts": [
    {
      "destination": "/proc",
      "source": "proc",
      "type": "proc"
    },
    {
      "destination": "/dev",
      "options": [
        "nosuid",
        "noexec",
        "nodev"
      ],
      "source": "tmpfs",
      "type": "tmpfs"
    },
    {
      "destination": "/sys",
      "options": [
        "nosuid",
        "noexec",
        "nodev",
        "ro"
      ],
      "source": "tmpfs",
      "type": "tmpfs"
    },
    {
      "destination": "/tmp",
      "options": [
        "nosuid",
        "noexec",
        "nodev"
      ],
      "source": "tmpfs",
      "type": "tmpfs"
    },
    {
      "destination": "/home/dangerzone",
      "options": [
        "nosuid",
        "noexec",
        "nodev"
      ],
      "source": "tmpfs",
      "type": "tmpfs"
    },
    {
      "destination": "/usr/lib/libreoffice/share/extensions/",
      "options": [
        "nosuid",
        "noexec",
        "nodev"
      ],
      "source": "tmpfs",
      "type": "tmpfs"
    }
  ],
  "ociVersion": "1.0.0",
  "process": {
    "args": [
      "/usr/bin/python3",
      "-m",
      "dangerzone.conversion.doc_to_pixels"
    ],
    "capabilities": {
      "bounding": [],
      "effective": [],
      "inheritable": [],
      "permitted": []
    },
    "cwd": "/",
    "env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "PYTHONPATH=/opt/dangerzone",
      "TERM=xterm"
    ],
    "rlimits": [
      {
        "hard": 4096,
        "soft": 4096,
        "type": "RLIMIT_NOFILE"
      }
    ],
    "user": {
      "gid": 1000,
      "uid": 1000
    }
  },
  "root": {
    "path": "rootfs",
    "readonly": true
  }
}
Running gVisor with command line: /usr/bin/runsc --rootless=true --network=none --root=/home/dangerzone/.containers --debug=true --alsologtostderr=true run --bundle=/home/dangerzone/dangerzone-image dangerzone
I0923 13:05:11.308786       7 main.go:196] **************** gVisor ****************
I0923 13:05:11.308851       7 main.go:197] Version release-20240916.0, go1.22.0 X:nocoverageredesign, amd64, 6 CPUs, linux, PID 7, PPID 1, UID 1000, GID 1000
D0923 13:05:11.308862       7 main.go:198] Page size: 0x1000 (4096 bytes)
I0923 13:05:11.308872       7 main.go:199] Args: [/usr/bin/runsc --rootless=true --network=none --root=/home/dangerzone/.containers --debug=true --alsologtostderr=true run --bundle=/home/dangerzone/dangerzone-image dangerzone]
I0923 13:05:11.308895       7 config.go:416] Platform: systrap
I0923 13:05:11.308913       7 config.go:417] RootDir: /home/dangerzone/.containers
I0923 13:05:11.308918       7 config.go:418] FileAccess: exclusive / Directfs: true / Overlay: root:self
I0923 13:05:11.308926       7 config.go:419] Network: none
I0923 13:05:11.308933       7 config.go:421] Debug: true. Strace: false, max size: 1024, syscalls:
D0923 13:05:11.308939       7 config.go:439] Config.RootDir (--root): /home/dangerzone/.containers
D0923 13:05:11.308948       7 config.go:439] Config.Traceback (--traceback): system
D0923 13:05:11.308953       7 config.go:439] Config.Debug (--debug): true
D0923 13:05:11.308958       7 config.go:439] Config.LogFilename (--log): (empty)
D0923 13:05:11.308980       7 config.go:439] Config.LogFormat (--log-format): text
D0923 13:05:11.308985       7 config.go:439] Config.DebugLog (--debug-log): (empty)
D0923 13:05:11.308989       7 config.go:439] Config.DebugToUserLog (--debug-to-user-log): false
D0923 13:05:11.308994       7 config.go:439] Config.DebugCommand (--debug-command): (empty)
D0923 13:05:11.308999       7 config.go:439] Config.PanicLog (--panic-log): (empty)
D0923 13:05:11.309004       7 config.go:439] Config.CoverageReport (--coverage-report): (empty)
D0923 13:05:11.309022       7 config.go:439] Config.DebugLogFormat (--debug-log-format): text
D0923 13:05:11.309028       7 config.go:439] Config.FileAccess (--file-access): exclusive
D0923 13:05:11.309040       7 config.go:439] Config.FileAccessMounts (--file-access-mounts): shared
D0923 13:05:11.309045       7 config.go:439] Config.Overlay (--overlay): false
D0923 13:05:11.309050       7 config.go:439] Config.Overlay2 (--overlay2): root:self
D0923 13:05:11.309054       7 config.go:439] Config.FSGoferHostUDS (--fsgofer-host-uds): false
D0923 13:05:11.309096       7 config.go:439] Config.HostUDS (--host-uds): none
D0923 13:05:11.309159       7 config.go:439] Config.HostFifo (--host-fifo): none
D0923 13:05:11.309173       7 config.go:439] Config.Network (--network): none
D0923 13:05:11.309179       7 config.go:439] Config.EnableRaw (--net-raw): false
D0923 13:05:11.309183       7 config.go:439] Config.AllowPacketEndpointWrite (--TESTONLY-allow-packet-endpoint-write): false
D0923 13:05:11.309188       7 config.go:439] Config.HostGSO (--gso): true
D0923 13:05:11.309192       7 config.go:439] Config.GVisorGSO (--software-gso): true
D0923 13:05:11.309197       7 config.go:439] Config.GVisorGRO (--gvisor-gro): false
D0923 13:05:11.309202       7 config.go:439] Config.TXChecksumOffload (--tx-checksum-offload): false
D0923 13:05:11.309207       7 config.go:439] Config.RXChecksumOffload (--rx-checksum-offload): true
D0923 13:05:11.309212       7 config.go:439] Config.QDisc (--qdisc): fifo
D0923 13:05:11.309217       7 config.go:439] Config.LogPackets (--log-packets): false
D0923 13:05:11.309231       7 config.go:439] Config.PCAP (--pcap-log): (empty)
D0923 13:05:11.309236       7 config.go:439] Config.Platform (--platform): systrap
D0923 13:05:11.309240       7 config.go:439] Config.PlatformDevicePath (--platform_device_path): (empty)
D0923 13:05:11.309245       7 config.go:439] Config.MetricServer (--metric-server): (empty)
D0923 13:05:11.309250       7 config.go:439] Config.ProfilingMetrics (--profiling-metrics): (empty)
D0923 13:05:11.309255       7 config.go:439] Config.ProfilingMetricsLog (--profiling-metrics-log): (empty)
D0923 13:05:11.309279       7 config.go:439] Config.ProfilingMetricsRate (--profiling-metrics-rate-us): 1000
D0923 13:05:11.309293       7 config.go:439] Config.Strace (--strace): false
D0923 13:05:11.309338       7 config.go:439] Config.StraceSyscalls (--strace-syscalls): (empty)
D0923 13:05:11.309345       7 config.go:439] Config.StraceLogSize (--strace-log-size): 1024
D0923 13:05:11.309350       7 config.go:439] Config.StraceEvent (--strace-event): false
D0923 13:05:11.309354       7 config.go:441] Config.DisableSeccomp: false
D0923 13:05:11.309360       7 config.go:439] Config.EnableCoreTags (--enable-core-tags): false
D0923 13:05:11.309366       7 config.go:439] Config.WatchdogAction (--watchdog-action): logWarning
D0923 13:05:11.309372       7 config.go:439] Config.PanicSignal (--panic-signal): -1
D0923 13:05:11.309378       7 config.go:439] Config.ProfileEnable (--profile): false
D0923 13:05:11.309381       7 config.go:439] Config.ProfileBlock (--profile-block): (empty)
D0923 13:05:11.309386       7 config.go:439] Config.ProfileCPU (--profile-cpu): (empty)
D0923 13:05:11.309400       7 config.go:439] Config.ProfileHeap (--profile-heap): (empty)
D0923 13:05:11.309404       7 config.go:439] Config.ProfileMutex (--profile-mutex): (empty)
D0923 13:05:11.309410       7 config.go:439] Config.TraceFile (--trace): (empty)
D0923 13:05:11.309415       7 config.go:439] Config.NumNetworkChannels (--num-network-channels): 1
D0923 13:05:11.309420       7 config.go:439] Config.NetworkProcessorsPerChannel (--network-processors-per-channel): 0
D0923 13:05:11.309476       7 config.go:439] Config.Rootless (--rootless): true
D0923 13:05:11.309483       7 config.go:439] Config.AlsoLogToStderr (--alsologtostderr): true
D0923 13:05:11.309488       7 config.go:439] Config.ReferenceLeak (--ref-leak-mode): disabled
D0923 13:05:11.309495       7 config.go:439] Config.CPUNumFromQuota (--cpu-num-from-quota): false
D0923 13:05:11.309499       7 config.go:439] Config.AllowFlagOverride (--allow-flag-override): false
D0923 13:05:11.309505       7 config.go:439] Config.OCISeccomp (--oci-seccomp): false
D0923 13:05:11.309509       7 config.go:439] Config.IgnoreCgroups (--ignore-cgroups): false
D0923 13:05:11.309513       7 config.go:439] Config.SystemdCgroup (--systemd-cgroup): false
D0923 13:05:11.309517       7 config.go:439] Config.PodInitConfig (--pod-init-config): (empty)
D0923 13:05:11.309522       7 config.go:439] Config.BufferPooling (--buffer-pooling): true
D0923 13:05:11.309526       7 config.go:439] Config.XDP (--EXPERIMENTAL-xdp): {0 }
D0923 13:05:11.309534       7 config.go:439] Config.AFXDPUseNeedWakeup (--EXPERIMENTAL-xdp-need-wakeup): true
D0923 13:05:11.309547       7 config.go:439] Config.FDLimit (--fdlimit): -1
D0923 13:05:11.309560       7 config.go:439] Config.DCache (--dcache): -1
D0923 13:05:11.309566       7 config.go:439] Config.IOUring (--iouring): false
D0923 13:05:11.309570       7 config.go:439] Config.DirectFS (--directfs): true
D0923 13:05:11.309575       7 config.go:439] Config.AppHugePages (--app-huge-pages): true
D0923 13:05:11.309580       7 config.go:439] Config.NVProxy (--nvproxy): false
D0923 13:05:11.309584       7 config.go:439] Config.NVProxyDocker (--nvproxy-docker): false
D0923 13:05:11.309589       7 config.go:439] Config.NVProxyDriverVersion (--nvproxy-driver-version): (empty)
D0923 13:05:11.309595       7 config.go:439] Config.TPUProxy (--tpuproxy): false
D0923 13:05:11.309600       7 config.go:439] Config.TestOnlyAllowRunAsCurrentUserWithoutChroot (--TESTONLY-unsafe-nonroot): false
D0923 13:05:11.309605       7 config.go:439] Config.TestOnlyTestNameEnv (--TESTONLY-test-name-env): (empty)
D0923 13:05:11.309610       7 config.go:439] Config.TestOnlyAFSSyscallPanic (--TESTONLY-afs-syscall-panic): false
D0923 13:05:11.309615       7 config.go:441] Config.explicitlySet: <map[string]struct {} Value> (unexported)
D0923 13:05:11.309625       7 config.go:439] Config.ReproduceNAT (--reproduce-nat): false
D0923 13:05:11.309630       7 config.go:439] Config.ReproduceNftables (--reproduce-nftables): false
D0923 13:05:11.309640       7 config.go:439] Config.NetDisconnectOk (--net-disconnect-ok): false
D0923 13:05:11.309647       7 config.go:439] Config.TestOnlyAutosaveImagePath (--TESTONLY-autosave-image-path): (empty)
D0923 13:05:11.309652       7 config.go:439] Config.TestOnlyAutosaveResume (--TESTONLY-autosave-resume): false
D0923 13:05:11.309657       7 config.go:439] Config.TestOnlySaveRestoreNetstack (--TESTONLY-save-restore-netstack): false
I0923 13:05:11.309669       7 main.go:201] **************** gVisor ****************
I0923 13:05:11.309758       7 namespace.go:247] *** Re-running as root in new user namespace ***
I0923 13:05:11.332950      12 main.go:196] **************** gVisor ****************
I0923 13:05:11.333016      12 main.go:197] Version release-20240916.0, go1.22.0 X:nocoverageredesign, amd64, 6 CPUs, linux, PID 12, PPID 7, UID 0, GID 0
D0923 13:05:11.333038      12 main.go:198] Page size: 0x1000 (4096 bytes)
I0923 13:05:11.333047      12 main.go:199] Args: [/proc/self/exe --rootless=true --network=none --root=/home/dangerzone/.containers --debug=true --alsologtostderr=true run --bundle=/home/dangerzone/dangerzone-image dangerzone]
I0923 13:05:11.333067      12 config.go:416] Platform: systrap
I0923 13:05:11.333084      12 config.go:417] RootDir: /home/dangerzone/.containers
I0923 13:05:11.333089      12 config.go:418] FileAccess: exclusive / Directfs: true / Overlay: root:self
I0923 13:05:11.333097      12 config.go:419] Network: none
I0923 13:05:11.333103      12 config.go:421] Debug: true. Strace: false, max size: 1024, syscalls:
D0923 13:05:11.333110      12 config.go:439] Config.RootDir (--root): /home/dangerzone/.containers
D0923 13:05:11.333116      12 config.go:439] Config.Traceback (--traceback): system
D0923 13:05:11.333122      12 config.go:439] Config.Debug (--debug): true
D0923 13:05:11.333126      12 config.go:439] Config.LogFilename (--log): (empty)
D0923 13:05:11.333130      12 config.go:439] Config.LogFormat (--log-format): text
D0923 13:05:11.333134      12 config.go:439] Config.DebugLog (--debug-log): (empty)
D0923 13:05:11.333159      12 config.go:439] Config.DebugToUserLog (--debug-to-user-log): false
D0923 13:05:11.333164      12 config.go:439] Config.DebugCommand (--debug-command): (empty)
D0923 13:05:11.333168      12 config.go:439] Config.PanicLog (--panic-log): (empty)
D0923 13:05:11.333175      12 config.go:439] Config.CoverageReport (--coverage-report): (empty)
D0923 13:05:11.333189      12 config.go:439] Config.DebugLogFormat (--debug-log-format): text
D0923 13:05:11.333193      12 config.go:439] Config.FileAccess (--file-access): exclusive
D0923 13:05:11.333204      12 config.go:439] Config.FileAccessMounts (--file-access-mounts): shared
D0923 13:05:11.333209      12 config.go:439] Config.Overlay (--overlay): false
D0923 13:05:11.333214      12 config.go:439] Config.Overlay2 (--overlay2): root:self
D0923 13:05:11.333219      12 config.go:439] Config.FSGoferHostUDS (--fsgofer-host-uds): false
D0923 13:05:11.333240      12 config.go:439] Config.HostUDS (--host-uds): none
D0923 13:05:11.333272      12 config.go:439] Config.HostFifo (--host-fifo): none
D0923 13:05:11.333282      12 config.go:439] Config.Network (--network): none
D0923 13:05:11.333289      12 config.go:439] Config.EnableRaw (--net-raw): false
D0923 13:05:11.333294      12 config.go:439] Config.AllowPacketEndpointWrite (--TESTONLY-allow-packet-endpoint-write): false
D0923 13:05:11.333299      12 config.go:439] Config.HostGSO (--gso): true
D0923 13:05:11.333303      12 config.go:439] Config.GVisorGSO (--software-gso): true
D0923 13:05:11.333307      12 config.go:439] Config.GVisorGRO (--gvisor-gro): false
D0923 13:05:11.333312      12 config.go:439] Config.TXChecksumOffload (--tx-checksum-offload): false
D0923 13:05:11.333317      12 config.go:439] Config.RXChecksumOffload (--rx-checksum-offload): true
D0923 13:05:11.333322      12 config.go:439] Config.QDisc (--qdisc): fifo
D0923 13:05:11.333327      12 config.go:439] Config.LogPackets (--log-packets): false
D0923 13:05:11.333340      12 config.go:439] Config.PCAP (--pcap-log): (empty)
D0923 13:05:11.333346      12 config.go:439] Config.Platform (--platform): systrap
D0923 13:05:11.333350      12 config.go:439] Config.PlatformDevicePath (--platform_device_path): (empty)
D0923 13:05:11.333355      12 config.go:439] Config.MetricServer (--metric-server): (empty)
D0923 13:05:11.333359      12 config.go:439] Config.ProfilingMetrics (--profiling-metrics): (empty)
D0923 13:05:11.333364      12 config.go:439] Config.ProfilingMetricsLog (--profiling-metrics-log): (empty)
D0923 13:05:11.333404      12 config.go:439] Config.ProfilingMetricsRate (--profiling-metrics-rate-us): 1000
D0923 13:05:11.333438      12 config.go:439] Config.Strace (--strace): false
D0923 13:05:11.333445      12 config.go:439] Config.StraceSyscalls (--strace-syscalls): (empty)
D0923 13:05:11.333450      12 config.go:439] Config.StraceLogSize (--strace-log-size): 1024
D0923 13:05:11.333455      12 config.go:439] Config.StraceEvent (--strace-event): false
D0923 13:05:11.333460      12 config.go:441] Config.DisableSeccomp: false
D0923 13:05:11.333466      12 config.go:439] Config.EnableCoreTags (--enable-core-tags): false
D0923 13:05:11.333471      12 config.go:439] Config.WatchdogAction (--watchdog-action): logWarning
D0923 13:05:11.333478      12 config.go:439] Config.PanicSignal (--panic-signal): -1
D0923 13:05:11.333482      12 config.go:439] Config.ProfileEnable (--profile): false
D0923 13:05:11.333487      12 config.go:439] Config.ProfileBlock (--profile-block): (empty)
D0923 13:05:11.333491      12 config.go:439] Config.ProfileCPU (--profile-cpu): (empty)
D0923 13:05:11.333506      12 config.go:439] Config.ProfileHeap (--profile-heap): (empty)
D0923 13:05:11.333511      12 config.go:439] Config.ProfileMutex (--profile-mutex): (empty)
D0923 13:05:11.333515      12 config.go:439] Config.TraceFile (--trace): (empty)
D0923 13:05:11.333521      12 config.go:439] Config.NumNetworkChannels (--num-network-channels): 1
D0923 13:05:11.333525      12 config.go:439] Config.NetworkProcessorsPerChannel (--network-processors-per-channel): 0
D0923 13:05:11.333530      12 config.go:439] Config.Rootless (--rootless): true
D0923 13:05:11.333535      12 config.go:439] Config.AlsoLogToStderr (--alsologtostderr): true
D0923 13:05:11.333542      12 config.go:439] Config.ReferenceLeak (--ref-leak-mode): disabled
D0923 13:05:11.333548      12 config.go:439] Config.CPUNumFromQuota (--cpu-num-from-quota): false
D0923 13:05:11.333552      12 config.go:439] Config.AllowFlagOverride (--allow-flag-override): false
D0923 13:05:11.333557      12 config.go:439] Config.OCISeccomp (--oci-seccomp): false
D0923 13:05:11.333561      12 config.go:439] Config.IgnoreCgroups (--ignore-cgroups): false
D0923 13:05:11.333566      12 config.go:439] Config.SystemdCgroup (--systemd-cgroup): false
D0923 13:05:11.333570      12 config.go:439] Config.PodInitConfig (--pod-init-config): (empty)
D0923 13:05:11.333575      12 config.go:439] Config.BufferPooling (--buffer-pooling): true
D0923 13:05:11.333580      12 config.go:439] Config.XDP (--EXPERIMENTAL-xdp): {0 }
D0923 13:05:11.333589      12 config.go:439] Config.AFXDPUseNeedWakeup (--EXPERIMENTAL-xdp-need-wakeup): true
D0923 13:05:11.333603      12 config.go:439] Config.FDLimit (--fdlimit): -1
D0923 13:05:11.333616      12 config.go:439] Config.DCache (--dcache): -1
D0923 13:05:11.333622      12 config.go:439] Config.IOUring (--iouring): false
D0923 13:05:11.333626      12 config.go:439] Config.DirectFS (--directfs): true
D0923 13:05:11.333631      12 config.go:439] Config.AppHugePages (--app-huge-pages): true
D0923 13:05:11.333636      12 config.go:439] Config.NVProxy (--nvproxy): false
D0923 13:05:11.333641      12 config.go:439] Config.NVProxyDocker (--nvproxy-docker): false
D0923 13:05:11.333689      12 config.go:439] Config.NVProxyDriverVersion (--nvproxy-driver-version): (empty)
D0923 13:05:11.333703      12 config.go:439] Config.TPUProxy (--tpuproxy): false
D0923 13:05:11.333708      12 config.go:439] Config.TestOnlyAllowRunAsCurrentUserWithoutChroot (--TESTONLY-unsafe-nonroot): false
D0923 13:05:11.333714      12 config.go:439] Config.TestOnlyTestNameEnv (--TESTONLY-test-name-env): (empty)
D0923 13:05:11.333719      12 config.go:439] Config.TestOnlyAFSSyscallPanic (--TESTONLY-afs-syscall-panic): false
D0923 13:05:11.333725      12 config.go:441] Config.explicitlySet: <map[string]struct {} Value> (unexported)
D0923 13:05:11.333737      12 config.go:439] Config.ReproduceNAT (--reproduce-nat): false
D0923 13:05:11.333744      12 config.go:439] Config.ReproduceNftables (--reproduce-nftables): false
D0923 13:05:11.333757      12 config.go:439] Config.NetDisconnectOk (--net-disconnect-ok): false
D0923 13:05:11.333764      12 config.go:439] Config.TestOnlyAutosaveImagePath (--TESTONLY-autosave-image-path): (empty)
D0923 13:05:11.333769      12 config.go:439] Config.TestOnlyAutosaveResume (--TESTONLY-autosave-resume): false
D0923 13:05:11.333774      12 config.go:439] Config.TestOnlySaveRestoreNetstack (--TESTONLY-save-restore-netstack): false
I0923 13:05:11.333785      12 main.go:201] **************** gVisor ****************
W0923 13:05:11.334658      12 specutils.go:129] noNewPrivileges ignored. PR_SET_NO_NEW_PRIVS is assumed to always be set.
D0923 13:05:11.334835      12 specutils.go:91] Spec:
{
  "ociVersion": "1.0.0",
  "process": {
    "user": {
      "uid": 1000,
      "gid": 1000
    },
    "args": [
      "/usr/bin/python3",
      "-m",
      "dangerzone.conversion.doc_to_pixels"
    ],
    "env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "PYTHONPATH=/opt/dangerzone",
      "TERM=xterm"
    ],
    "cwd": "/",
    "rlimits": [
      {
        "type": "RLIMIT_NOFILE",
        "hard": 4096,
        "soft": 4096
      }
    ]
  },
  "root": {
    "path": "/home/dangerzone/dangerzone-image/rootfs",
    "readonly": true
  },
  "hostname": "dangerzone",
  "mounts": [
    {
      "destination": "/proc",
      "type": "proc",
      "source": "/home/dangerzone/dangerzone-image/proc"
    },
    {
      "destination": "/dev",
      "type": "tmpfs",
      "source": "/home/dangerzone/dangerzone-image/tmpfs",
      "options": [
        "nosuid",
        "noexec",
        "nodev"
      ]
    },
    {
      "destination": "/sys",
      "type": "tmpfs",
      "source": "/home/dangerzone/dangerzone-image/tmpfs",
      "options": [
        "nosuid",
        "noexec",
        "nodev",
        "ro"
      ]
    },
    {
      "destination": "/tmp",
      "type": "tmpfs",
      "source": "/home/dangerzone/dangerzone-image/tmpfs",
      "options": [
        "nosuid",
        "noexec",
        "nodev"
      ]
    },
    {
      "destination": "/home/dangerzone",
      "type": "tmpfs",
      "source": "/home/dangerzone/dangerzone-image/tmpfs",
      "options": [
        "nosuid",
        "noexec",
        "nodev"
      ]
    },
    {
      "destination": "/usr/lib/libreoffice/share/extensions/",
      "type": "tmpfs",
      "source": "/home/dangerzone/dangerzone-image/tmpfs",
      "options": [
        "nosuid",
        "noexec",
        "nodev"
      ]
    }
  ],
  "linux": {
    "namespaces": [
      {
        "type": "pid"
      },
      {
        "type": "network"
      },
      {
        "type": "ipc"
      },
      {
        "type": "uts"
      },
      {
        "type": "mount"
      }
    ]
  }
}
D0923 13:05:11.334884      12 container.go:547] Run container, cid: dangerzone, rootDir: "/home/dangerzone/.containers"
D0923 13:05:11.334974      12 container.go:200] Create container, cid: dangerzone, rootDir: "/home/dangerzone/.containers"
D0923 13:05:11.335038      12 container.go:1800] Configuring container with a new userns with identity user mappings into current userns
D0923 13:05:11.335079      12 container.go:1856] UID Mappings:
D0923 13:05:11.335087      12 container.go:1858]        Container ID: 0, Host ID: 0, Range Length: 1
D0923 13:05:11.335153      12 container.go:1856] GID Mappings:
D0923 13:05:11.335202      12 container.go:1858]        Container ID: 0, Host ID: 0, Range Length: 1
D0923 13:05:11.335301      12 container.go:265] Creating new sandbox for container, cid: dangerzone
D0923 13:05:11.335339      12 cgroup.go:428] New cgroup for pid: self, *cgroup.cgroupV2: &{Mountpoint:/sys/fs/cgroup Path:/dangerzone Controllers:[cpuset cpu io memory pids rdma] Own:[]}
D0923 13:05:11.335420      12 cgroup_v2.go:132] Installing cgroup path "/sys/fs/cgroup/dangerzone"
D0923 13:05:11.335466      12 cgroup_v2.go:177] Deleting cgroup "/sys/fs/cgroup/dangerzone"
W0923 13:05:11.335511      12 container.go:1770] Skipping cgroup configuration in rootless mode: open /sys/fs/cgroup/cgroup.subtree_control: read-only file system
I0923 13:05:11.335767      12 namespace.go:198] Mapping host uid 0 to container uid 0 (size=1)
I0923 13:05:11.335807      12 namespace.go:206] Mapping host gid 0 to container gid 0 (size=1)
D0923 13:05:11.335828      12 donation.go:32] Donating FD 3: "/home/dangerzone/dangerzone-image/config.json"
D0923 13:05:11.335857      12 donation.go:32] Donating FD 4: "|1"
D0923 13:05:11.335862      12 donation.go:32] Donating FD 5: "gofer IO FD"
D0923 13:05:11.335875      12 container.go:1367] Starting gofer: /proc/self/exe [runsc-gofer --root=/home/dangerzone/.containers --debug=true --network=none --rootless=true --alsologtostderr=true gofer --bundle /home/dangerzone/dangerzone-image --gofer-mount-confs=lisafs:none --spec-fd=3 --mounts-fd=4 --io-fds=5]
I0923 13:05:11.338297      12 container.go:1371] Gofer started, PID: 20
I0923 13:05:11.338460      12 sandbox.go:791] Failed to set RLIMIT_MEMLOCK: operation not permitted
D0923 13:05:11.338874      12 sandbox.go:93] Attempting to create socket file "/home/dangerzone/.containers/runsc-dangerzone.sock"
D0923 13:05:11.338969      12 sandbox.go:96] Using socket file "/home/dangerzone/.containers/runsc-dangerzone.sock"
I0923 13:05:11.338987      12 sandbox.go:897] Control socket path: "/home/dangerzone/.containers/runsc-dangerzone.sock"
I0923 13:05:11.339013      12 sandbox.go:944] Sandbox will be started in new mount, IPC and UTS namespaces
I0923 13:05:11.339028      12 sandbox.go:969] Sandbox will be started in new network namespace
I0923 13:05:11.339043      12 sandbox.go:986] Sandbox will be started in container's user namespace: {Type:user Path:}
I0923 13:05:11.339192      12 namespace.go:198] Mapping host uid 0 to container uid 0 (size=1)
I0923 13:05:11.339205      12 namespace.go:206] Mapping host gid 0 to container gid 0 (size=1)
I0923 13:05:11.339263      12 sandbox.go:1016] Sandbox will be started in minimal chroot
D0923 13:05:11.339375      12 donation.go:32] Donating FD 3: "sandbox IO FD"
D0923 13:05:11.339385      12 donation.go:32] Donating FD 4: "|0"
D0923 13:05:11.339390      12 donation.go:32] Donating FD 5: "|1"
D0923 13:05:11.339405      12 donation.go:32] Donating FD 6: "control_server_socket"
D0923 13:05:11.339410      12 donation.go:32] Donating FD 7: "/home/dangerzone/dangerzone-image/config.json"
D0923 13:05:11.339414      12 donation.go:32] Donating FD 8: "/dev/stdin"
D0923 13:05:11.339418      12 donation.go:32] Donating FD 9: "/dev/stdout"
D0923 13:05:11.339422      12 donation.go:32] Donating FD 10: "/dev/stderr"
D0923 13:05:11.339427      12 sandbox.go:1210] Starting sandbox: /proc/self/exe [runsc-sandbox --root=/home/dangerzone/.containers --debug=true --network=none --rootless=true --alsologtostderr=true boot --bundle=/home/dangerzone/dangerzone-image --gofer-mount-confs=lisafs:none --apply-caps=true --setup-root --total-host-memory 8231772160 --total-memory 8231772160 --attached --io-fds=3 --dev-io-fd=-1 --mounts-fd=4 --start-sync-fd=5 --controller-fd=6 --spec-fd=7 --stdio-fds=8 --stdio-fds=9 --stdio-fds=10 dangerzone]
D0923 13:05:11.339456      12 sandbox.go:1211] SysProcAttr: &{Chroot: Credential:0xc00034f170 Ptrace:false Setsid:true Setpgid:false Setctty:false Noctty:false Ctty:0 Foreground:false Pgid:0 Pdeathsig:killed Cloneflags:0 Unshareflags:0 UidMappings:[{ContainerID:0 HostID:0 Size:1}] GidMappings:[{ContainerID:0 HostID:0 Size:1}] GidMappingsEnableSetgroups:false AmbientCaps:[] UseCgroupFD:false CgroupFD:0 PidFD:<nil>}
I0923 13:05:11.341190      12 sandbox.go:1239] Sandbox started, PID: 25
I0923 13:05:11.355588       1 main.go:196] **************** gVisor ****************
I0923 13:05:11.355649       1 main.go:197] Version release-20240916.0, go1.22.0 X:nocoverageredesign, amd64, 6 CPUs, linux, PID 1, PPID 0, UID 0, GID 0
D0923 13:05:11.355659       1 main.go:198] Page size: 0x1000 (4096 bytes)
I0923 13:05:11.355667       1 main.go:199] Args: [runsc-sandbox --root=/home/dangerzone/.containers --debug=true --network=none --rootless=true --alsologtostderr=true boot --bundle=/home/dangerzone/dangerzone-image --gofer-mount-confs=lisafs:none --apply-caps=true --setup-root --total-host-memory 8231772160 --total-memory 8231772160 --attached --io-fds=3 --dev-io-fd=-1 --mounts-fd=4 --start-sync-fd=5 --controller-fd=6 --spec-fd=7 --stdio-fds=8 --stdio-fds=9 --stdio-fds=10 dangerzone]
I0923 13:05:11.355699       1 config.go:416] Platform: systrap
I0923 13:05:11.355737       1 config.go:417] RootDir: /home/dangerzone/.containers
I0923 13:05:11.355745       1 config.go:418] FileAccess: exclusive / Directfs: true / Overlay: root:self
I0923 13:05:11.355753       1 config.go:419] Network: none
I0923 13:05:11.355759       1 config.go:421] Debug: true. Strace: false, max size: 1024, syscalls:
D0923 13:05:11.355765       1 config.go:439] Config.RootDir (--root): /home/dangerzone/.containers
D0923 13:05:11.355772       1 config.go:439] Config.Traceback (--traceback): system
D0923 13:05:11.355778       1 config.go:439] Config.Debug (--debug): true
D0923 13:05:11.355783       1 config.go:439] Config.LogFilename (--log): (empty)
D0923 13:05:11.355788       1 config.go:439] Config.LogFormat (--log-format): text
D0923 13:05:11.355792       1 config.go:439] Config.DebugLog (--debug-log): (empty)
D0923 13:05:11.355797       1 config.go:439] Config.DebugToUserLog (--debug-to-user-log): false
D0923 13:05:11.355802       1 config.go:439] Config.DebugCommand (--debug-command): (empty)
D0923 13:05:11.355806       1 config.go:439] Config.PanicLog (--panic-log): (empty)
D0923 13:05:11.355810       1 config.go:439] Config.CoverageReport (--coverage-report): (empty)
D0923 13:05:11.355866       1 config.go:439] Config.DebugLogFormat (--debug-log-format): text
D0923 13:05:11.355873       1 config.go:439] Config.FileAccess (--file-access): exclusive
D0923 13:05:11.355887       1 config.go:439] Config.FileAccessMounts (--file-access-mounts): shared
D0923 13:05:11.355896       1 config.go:439] Config.Overlay (--overlay): false
D0923 13:05:11.355901       1 config.go:439] Config.Overlay2 (--overlay2): root:self
D0923 13:05:11.355934       1 config.go:439] Config.FSGoferHostUDS (--fsgofer-host-uds): false
D0923 13:05:11.355942       1 config.go:439] Config.HostUDS (--host-uds): none
D0923 13:05:11.355957       1 config.go:439] Config.HostFifo (--host-fifo): none
D0923 13:05:11.355969       1 config.go:439] Config.Network (--network): none
D0923 13:05:11.355975       1 config.go:439] Config.EnableRaw (--net-raw): false
D0923 13:05:11.355980       1 config.go:439] Config.AllowPacketEndpointWrite (--TESTONLY-allow-packet-endpoint-write): false
D0923 13:05:11.355985       1 config.go:439] Config.HostGSO (--gso): true
D0923 13:05:11.355989       1 config.go:439] Config.GVisorGSO (--software-gso): true
D0923 13:05:11.355994       1 config.go:439] Config.GVisorGRO (--gvisor-gro): false
D0923 13:05:11.355999       1 config.go:439] Config.TXChecksumOffload (--tx-checksum-offload): false
D0923 13:05:11.356003       1 config.go:439] Config.RXChecksumOffload (--rx-checksum-offload): true
D0923 13:05:11.356008       1 config.go:439] Config.QDisc (--qdisc): fifo
D0923 13:05:11.356013       1 config.go:439] Config.LogPackets (--log-packets): false
D0923 13:05:11.356026       1 config.go:439] Config.PCAP (--pcap-log): (empty)
D0923 13:05:11.356031       1 config.go:439] Config.Platform (--platform): systrap
D0923 13:05:11.356036       1 config.go:439] Config.PlatformDevicePath (--platform_device_path): (empty)
D0923 13:05:11.356041       1 config.go:439] Config.MetricServer (--metric-server): (empty)
D0923 13:05:11.356046       1 config.go:439] Config.ProfilingMetrics (--profiling-metrics): (empty)
D0923 13:05:11.356050       1 config.go:439] Config.ProfilingMetricsLog (--profiling-metrics-log): (empty)
D0923 13:05:11.356055       1 config.go:439] Config.ProfilingMetricsRate (--profiling-metrics-rate-us): 1000
D0923 13:05:11.356059       1 config.go:439] Config.Strace (--strace): false
D0923 13:05:11.356063       1 config.go:439] Config.StraceSyscalls (--strace-syscalls): (empty)
D0923 13:05:11.356068       1 config.go:439] Config.StraceLogSize (--strace-log-size): 1024
D0923 13:05:11.356100       1 config.go:439] Config.StraceEvent (--strace-event): false
D0923 13:05:11.356106       1 config.go:441] Config.DisableSeccomp: false
D0923 13:05:11.356113       1 config.go:439] Config.EnableCoreTags (--enable-core-tags): false
D0923 13:05:11.356120       1 config.go:439] Config.WatchdogAction (--watchdog-action): logWarning
D0923 13:05:11.356127       1 config.go:439] Config.PanicSignal (--panic-signal): -1
D0923 13:05:11.356132       1 config.go:439] Config.ProfileEnable (--profile): false
D0923 13:05:11.356194       1 config.go:439] Config.ProfileBlock (--profile-block): (empty)
D0923 13:05:11.356202       1 config.go:439] Config.ProfileCPU (--profile-cpu): (empty)
D0923 13:05:11.356217       1 config.go:439] Config.ProfileHeap (--profile-heap): (empty)
D0923 13:05:11.356222       1 config.go:439] Config.ProfileMutex (--profile-mutex): (empty)
D0923 13:05:11.356227       1 config.go:439] Config.TraceFile (--trace): (empty)
D0923 13:05:11.356232       1 config.go:439] Config.NumNetworkChannels (--num-network-channels): 1
D0923 13:05:11.356236       1 config.go:439] Config.NetworkProcessorsPerChannel (--network-processors-per-channel): 0
D0923 13:05:11.356241       1 config.go:439] Config.Rootless (--rootless): true
D0923 13:05:11.356246       1 config.go:439] Config.AlsoLogToStderr (--alsologtostderr): true
D0923 13:05:11.356250       1 config.go:439] Config.ReferenceLeak (--ref-leak-mode): disabled
D0923 13:05:11.356257       1 config.go:439] Config.CPUNumFromQuota (--cpu-num-from-quota): false
D0923 13:05:11.356261       1 config.go:439] Config.AllowFlagOverride (--allow-flag-override): false
D0923 13:05:11.356265       1 config.go:439] Config.OCISeccomp (--oci-seccomp): false
D0923 13:05:11.356269       1 config.go:439] Config.IgnoreCgroups (--ignore-cgroups): false
D0923 13:05:11.356274       1 config.go:439] Config.SystemdCgroup (--systemd-cgroup): false
D0923 13:05:11.356278       1 config.go:439] Config.PodInitConfig (--pod-init-config): (empty)
D0923 13:05:11.356283       1 config.go:439] Config.BufferPooling (--buffer-pooling): true
D0923 13:05:11.356288       1 config.go:439] Config.XDP (--EXPERIMENTAL-xdp): {0 }
D0923 13:05:11.356296       1 config.go:439] Config.AFXDPUseNeedWakeup (--EXPERIMENTAL-xdp-need-wakeup): true
D0923 13:05:11.356308       1 config.go:439] Config.FDLimit (--fdlimit): -1
D0923 13:05:11.356345       1 config.go:439] Config.DCache (--dcache): -1
D0923 13:05:11.356350       1 config.go:439] Config.IOUring (--iouring): false
D0923 13:05:11.356355       1 config.go:439] Config.DirectFS (--directfs): true
D0923 13:05:11.356360       1 config.go:439] Config.AppHugePages (--app-huge-pages): true
D0923 13:05:11.356365       1 config.go:439] Config.NVProxy (--nvproxy): false
D0923 13:05:11.356369       1 config.go:439] Config.NVProxyDocker (--nvproxy-docker): false
D0923 13:05:11.356374       1 config.go:439] Config.NVProxyDriverVersion (--nvproxy-driver-version): (empty)
D0923 13:05:11.356378       1 config.go:439] Config.TPUProxy (--tpuproxy): false
D0923 13:05:11.356391       1 config.go:439] Config.TestOnlyAllowRunAsCurrentUserWithoutChroot (--TESTONLY-unsafe-nonroot): false
D0923 13:05:11.356426       1 config.go:439] Config.TestOnlyTestNameEnv (--TESTONLY-test-name-env): (empty)
D0923 13:05:11.356434       1 config.go:439] Config.TestOnlyAFSSyscallPanic (--TESTONLY-afs-syscall-panic): false
D0923 13:05:11.356439       1 config.go:441] Config.explicitlySet: <map[string]struct {} Value> (unexported)
D0923 13:05:11.356452       1 config.go:439] Config.ReproduceNAT (--reproduce-nat): false
D0923 13:05:11.356458       1 config.go:439] Config.ReproduceNftables (--reproduce-nftables): false
D0923 13:05:11.356463       1 config.go:439] Config.NetDisconnectOk (--net-disconnect-ok): false
D0923 13:05:11.356468       1 config.go:439] Config.TestOnlyAutosaveImagePath (--TESTONLY-autosave-image-path): (empty)
D0923 13:05:11.356472       1 config.go:439] Config.TestOnlyAutosaveResume (--TESTONLY-autosave-resume): false
D0923 13:05:11.356478       1 config.go:439] Config.TestOnlySaveRestoreNetstack (--TESTONLY-save-restore-netstack): false
I0923 13:05:11.356492       1 main.go:201] **************** gVisor ****************
W0923 13:05:11.358402       1 boot.go:266] Not setting product_name: open /sys/devices/virtual/dmi/id/product_name: no such file or directory
I0923 13:05:11.358522       1 boot.go:279] Setting host-shmem-huge: "never"
W0923 13:05:11.359241       1 specutils.go:129] noNewPrivileges ignored. PR_SET_NO_NEW_PRIVS is assumed to always be set.
I0923 13:05:11.359297       1 chroot.go:92] Setting up sandbox chroot in "/tmp"
W0923 13:05:11.359386       1 chroot.go:109] Failed to copy /etc/localtime: open /etc/localtime: no such file or directory. UTC timezone will be used.
I0923 13:05:11.359425       1 chroot.go:37] Mounting "proc" at "/tmp/proc"
W0923 13:05:11.359477       1 util.go:64] FATAL ERROR: error setting up chroot: error mounting proc in chroot: error mounting "proc" at "/tmp/proc": mount("proc", "/tmp/proc", 15) failed: operation not permitted
error setting up chroot: error mounting proc in chroot: error mounting "proc" at "/tmp/proc": mount("proc", "/tmp/proc", 15) failed: operation not permitted
D0923 13:05:11.360661      12 sandbox.go:1330] Destroying sandbox "dangerzone"
D0923 13:05:11.360799      12 sandbox.go:1340] Killing sandbox "dangerzone"
D0923 13:05:11.360846      12 container.go:793] Destroy container, cid: dangerzone
D0923 13:05:11.360875      12 container.go:1104] Killing gofer for container, cid: dangerzone, PID: 20
W0923 13:05:11.368217      12 util.go:64] FATAL ERROR: running container: creating container: cannot create sandbox: cannot read client sync file: waiting for sandbox to start: EOF
running container: creating container: cannot create sandbox: cannot read client sync file: waiting for sandbox to start: EOF
W0923 13:05:11.368396      12 main.go:231] Failure to execute command, err: 1
gVisor quit with exit code: 128
@EtiennePerot
Contributor

I believe this is the same issue being described by @terenceli in this blog post. @avagin helped with diagnosing this. See issue #8205 and opencontainers/runc#1658.

That said I don't understand why this would be release-related, as I don't think something changed in the startup process that would change this behavior...

@EtiennePerot
Contributor

@avagin bisected this to commit cc1f550, specifically in runsc/cmd/chroot.go:

-	if pidns {
-		flags := uint32(unix.MS_NOSUID | unix.MS_NODEV | unix.MS_NOEXEC | unix.MS_RDONLY)
-		if err := mountInChroot(chroot, "proc", "/proc", "proc", flags); err != nil {
-			return fmt.Errorf("error mounting proc in chroot: %v", err)
-		}
-	} else {
-		if err := mountInChroot(chroot, "/proc", "/proc", "bind", unix.MS_BIND|unix.MS_RDONLY|unix.MS_REC); err != nil {
-			return fmt.Errorf("error mounting proc in chroot: %v", err)
-		}
+	flags := uint32(unix.MS_NOSUID | unix.MS_NODEV | unix.MS_NOEXEC | unix.MS_RDONLY)
+	if err := mountInChroot(chroot, "proc", "/proc", "proc", flags); err != nil {
+		return fmt.Errorf("error mounting proc in chroot: %v", err)
	}

This means that the chroot used to use a recursive read-only bind mount of /proc (MS_BIND | MS_RDONLY | MS_REC), but now it mounts a new read-only procfs instead, and this fails for the reasons explained in #8205.
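
As a quick cross-check of the diff against the log, here is a minimal sketch that decodes a numeric mount flag value into names. The constants are the Linux `mount(2)` values copied locally (an assumption made so the sketch builds without `golang.org/x/sys/unix`), and `describeFlags` is a hypothetical helper, not part of runsc.

```go
package main

import (
	"fmt"
	"strings"
)

// Linux mount(2) flag values, copied here so the sketch has no external
// dependencies. They mirror unix.MS_* on Linux.
const (
	msRdonly = 0x1
	msNosuid = 0x2
	msNodev  = 0x4
	msNoexec = 0x8
	msBind   = 0x1000
	msRec    = 0x4000
)

var flagNames = []struct {
	bit  uint32
	name string
}{
	{msRdonly, "MS_RDONLY"},
	{msNosuid, "MS_NOSUID"},
	{msNodev, "MS_NODEV"},
	{msNoexec, "MS_NOEXEC"},
	{msBind, "MS_BIND"},
	{msRec, "MS_REC"},
}

// describeFlags (hypothetical helper) renders a flag word as a
// "|"-separated list of names; unknown bits are appended in hex.
func describeFlags(flags uint32) string {
	var parts []string
	rest := flags
	for _, f := range flagNames {
		if flags&f.bit != 0 {
			parts = append(parts, f.name)
			rest &^= f.bit
		}
	}
	if rest != 0 {
		parts = append(parts, fmt.Sprintf("0x%x", rest))
	}
	if len(parts) == 0 {
		return "0"
	}
	return strings.Join(parts, "|")
}

func main() {
	// The value 15 comes from the failing call in the log:
	// mount("proc", "/tmp/proc", 15).
	fmt.Println(describeFlags(15))
	// The flags of the removed bind-mount branch.
	fmt.Println(describeFlags(msBind | msRdonly | msRec))
}
```

The first line printed corresponds to the new code path (a fresh procfs mount), the second to the bind-mount branch that the commit removed.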

@EtiennePerot
Contributor

EtiennePerot commented Sep 23, 2024

I will add a test to //runsc:container_test that runs runsc do in a non-runsc Docker container, in order to ensure this doesn't break again in the future.

@ayushr2
Collaborator

ayushr2 commented Sep 23, 2024

This means that the chroot used to use a recursive read-only bind mount of /proc

That was only the case for the ptrace & systrap platforms (the pidns variable would be false for these). For other platforms, it was already doing what it does today. cc1f550 just got rid of the ptrace/systrap special case. Systrap and ptrace were running in the caller's pidns.

We still want to execute the sandbox process in a new pidns, so can't bind mount the procfs mount because it is presenting data for the parent pidns.

@EtiennePerot
Contributor

What fix do you suggest?

@avagin
Collaborator

avagin commented Sep 24, 2024

@ayushr2 the sandbox process accesses only generic files and /proc/self, so we actually don't need proc from the target pid namespace.

@ayushr2
Collaborator

ayushr2 commented Sep 24, 2024

@avagin If you think bind mounting the proc mount is OK, I would defer to you. It seems a bit of a foot gun to do this to me, as it can lead to surprises if the sandbox tries to do any /proc/PID-kinda work in the future. There is no good way to enforce that no future changes will use proc files other than "only generic files and /proc/self". Maybe seccomp?

What fix do you suggest?

It is important to run the sandbox in a different pidns so the sandbox cannot impact the host pidns with fork bombs and such calamities. I don't have a different fix in mind yet, unfortunately...

@apyrgio
Contributor Author

apyrgio commented Sep 24, 2024

Ok, I guess I get what's going on with overmounting and how it affects procfs. These particular comments helped a bit:

So, the underlying issue is that when the inner container attempts to mount procfs, it will reveal some security-sensitive /proc/ paths (e.g., /proc/kcore) that were masked by the outer container (not sure how Docker does it, perhaps mounting an empty dir on top of them). The Linux kernel is smart enough to prevent this, cool.
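
To see whether a container's /proc carries such overmounts, one option is to look for mount points below /proc in the `mountinfo` format described in proc(5), where the fifth field is the mount point. The sketch below is only a heuristic and not the kernel's actual "too revealing" check; `submountsUnder` is a hypothetical helper and the sample lines are made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// submountsUnder (hypothetical helper) returns the mount points that lie
// strictly below dir in mountinfo-formatted text. Field 5 of each line is
// the mount point, per proc(5).
func submountsUnder(mountinfo, dir string) []string {
	prefix := strings.TrimSuffix(dir, "/") + "/"
	var out []string
	sc := bufio.NewScanner(strings.NewReader(mountinfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 5 {
			continue // not a well-formed mountinfo line
		}
		if mp := fields[4]; strings.HasPrefix(mp, prefix) {
			out = append(out, mp)
		}
	}
	return out
}

func main() {
	// Illustrative sample only; on a real system, pass the contents of
	// /proc/self/mountinfo instead.
	sample := `100 90 0:50 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
101 100 0:51 /null /proc/kcore rw,nosuid - devtmpfs devtmpfs rw
102 100 0:52 / /proc/acpi ro,relatime - tmpfs tmpfs ro
103 90 0:53 / /sys ro - sysfs sysfs ro`
	fmt.Println(submountsUnder(sample, "/proc"))
}
```

A non-empty result suggests that parts of /proc are covered by other mounts, which is the situation in which mounting a fresh procfs may be refused.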

I went down the rabbit hole a bit and found out the following:

  • The Docker team added the --security-opt systempaths=unconfined flag, in docker run (docs).
    • I guess this is an escape hatch in the general "run gVisor within a container" case, but for Dangerzone specifically, I'd prefer to not go down that route.
  • The Docker team then sent a PR to Podman (awww) to add this flag, for feature-parity. Something interesting in the cover letter is that newer Linux kernels will have a mount option that will circumvent this issue.
  • Turns out that as of Linux 5.8, there's a subset=pid mount option, that "[...] hides all top level files and directories in the procfs that are not related to tasks.".

So, could gVisor perhaps mount procfs using that option?

@avagin
Collaborator

avagin commented Sep 26, 2024

Turns out that as of Linux 5.8, there's a subset=pid mount option, that "[...] hides all top level files and directories in the procfs that are not related to tasks.".

When runsc is started, it reads a few top level files such as /proc/cpuinfo, /proc/sys/vm/mmap_min_addr, /proc/self/auxv, /proc/sys/kernel/cap_last_cap.

@EtiennePerot
Contributor

Could we mount procfs with subset=pid, and then separately also bind-mount /proc/{cpuinfo,sys/vm/mmap_min_addr,sys/kernel/cap_last_cap} somewhere?

@avagin
Collaborator

avagin commented Oct 1, 2024

Turns out that as of Linux 5.8, there's a subset=pid mount option, that "[...] hides all top level files and directories in the procfs that are not related to tasks.".

subset=pid doesn't help to avoid this issue. The kernel still does the same check and doesn't allow us to create a new proc instance:

$ unshare -Urmfp mount -t proc -o subset=pid proc /proc
$ echo $?
0
$ sudo mount --bind /dev/null /proc/sysrq-trigger 
$ unshare -Urmfp mount -t proc -o subset=pid proc /proc
mount: /proc: permission denied.
       dmesg(1) may have more information after failed mount system call.

copybara-service bot pushed a commit that referenced this issue Oct 2, 2024
This is used in contexts such as Dangerzone:
https://gvisor.dev/blog/2024/09/23/safe-ride-into-the-dangerzone/

Updates issue #10944.

PiperOrigin-RevId: 681229280
@avagin
Collaborator

avagin commented Oct 3, 2024

We can do something like 231c152

  • /proc is a tmpfs mount
  • required static proc files are copied to the /proc tmpfs mount
  • it mounts a new proc instance in /proc/sandbox-proc. If that doesn't work, it bind-mounts the current proc to /proc/host-proc. This guarantees that we will not use a wrong /proc/pid by mistake.
  • /proc/self is a symlink to /proc/{sandbox or host}-proc/self

copybara-service bot pushed a commit that referenced this issue Oct 4, 2024
This is used in contexts such as Dangerzone:
https://gvisor.dev/blog/2024/09/23/safe-ride-into-the-dangerzone/

Updates issue #10944.

PiperOrigin-RevId: 681229280
copybara-service bot pushed a commit that referenced this issue Oct 4, 2024
This is used in contexts such as Dangerzone:
https://gvisor.dev/blog/2024/09/23/safe-ride-into-the-dangerzone/

Updates issue #10944.

PiperOrigin-RevId: 682454284
@apyrgio
Contributor Author

apyrgio commented Oct 7, 2024

Thanks for looking into it @avagin. I tried your commands (#10944 (comment)) and indeed they failed with "Mount too revealing". Weird...

For what it's worth, your workaround looks fine to me.

@apyrgio
Contributor Author

apyrgio commented Oct 30, 2024

Hey folks. Just checking whether this issue will be worked on for the next releases. We'll release a new Dangerzone version in a few days, and it would be nice to offer the latest gVisor version. Currently, we are pinned to version 20240826. If there's no fix any time soon, that's fine, we'll try to think of a workaround for this issue. Thanks!

@EtiennePerot
Contributor

I will be picking up @avagin's patch and submitting it soon.

copybara-service bot pushed a commit that referenced this issue Oct 31, 2024
…mount.

As part of sandbox startup, `runsc` needs to set up a chroot environment
with a minimal working `procfs` filesystem mounted within. However, doing
so from within a container (as applications like Dangerzone do) may fail,
because in the container runtime's default configuration, some paths of the
procfs filesystem visible from within the container may be obstructed. This
prevents mounting new unobstructed instances of `procfs`.

This change detects this case and falls back to the previous behavior of
using a recursive bind-mount of `/proc` in such a case. The obstructed
subdirectories of procfs are preserved in this case, which is fine because
we only need a very minimal subset of `procfs` to actually work.

Additionally, `runsc` actually only needs a few kernel parameter files
and `/proc/self` in order to work. So this change sets up a `tmpfs` mount
that contains just those files, with the kernel parameter files being
plainly copied and `/proc/self` being a symlink to the one present in the
mounted view of `procfs` (regardless of which mounting method was used).

The `runtime_in_docker` test will continuously verify that this fallback
mechanism works to avoid similar breakage in the future.

Credits to @avagin for figuring out this solution.

Fixes #10944.

PiperOrigin-RevId: 691672104
copybara-service bot pushed a commit that referenced this issue Nov 1, 2024
…mount.

As part of sandbox startup, `runsc` needs to set up a chroot environment
with a minimal working `procfs` filesystem mounted within. However, doing
so from within a container (as applications like Dangerzone do) may fail,
because in the container runtime's default configuration, some paths of the
procfs filesystem visible from within the container may be obstructed. This
prevents mounting new unobstructed instances of `procfs`.

This change detects this case and falls back to the previous behavior of
using a recursive bind-mount of `/proc` in such a case. The obstructed
subdirectories of procfs are preserved in this case, which is fine because
we only need a very minimal subset of `procfs` to actually work.

Additionally, `runsc` actually only needs a few kernel parameter files
and `/proc/self` in order to work. So this change sets up a `tmpfs` mount
that contains just those files, with the kernel parameter files being
plainly copied and `/proc/self` being a symlink to the one present in the
mounted view of `procfs` (regardless of which mounting method was used).

The `runtime_in_docker` test will continuously verify that this fallback
mechanism works to avoid similar breakage in the future.

Credits to @avagin for figuring out this solution.

Fixes #10944.

PiperOrigin-RevId: 691672104
@EtiennePerot
Contributor

Should be fixed with 6adc072.
