From 9073ddf4ef54dee73c7ae23175212742e7455259 Mon Sep 17 00:00:00 2001
From: "W. Trevor King"
Date: Fri, 12 Aug 2016 22:56:40 -0700
Subject: [PATCH] config: Document 'rbind' and 'bind' mount options extensions

They are not filesystem types, so they don't belong in 'type'. The
specs claim mount(2) as inspiration for this modeling (which makes
sense, since that's the syscall Linux runtimes will make to implement
it), and there (recursive) bind is represented by mountflags
(MS_REC | MS_BIND). Currently the 'options' property handles both
mount(2)'s mountflags and data, so 'options' is a good spot for these
two settings.

Before this commit, we were punting this sort of table to mount(8)'s
filesystem-independent mount options. With this commit we drop the
mount(8) reference and replace it with explicit requirements based on
mount(2), as approved by Michael [1]. Personally, I prefer the old
mount(8) punt, but have been unable to get (recursive) bind documented
without removing it.

The option strings still come from mount(8)'s filesystem-independent
mount options with the following exceptions:

* move, rbind, rprivate, rshared, rslave, and runbindable are exposed
  in mount(8) through long options (e.g. --move).
* (no)acl is listed under filesystem-specific mount options (e.g. for
  ext2).

This commit covers the MS_* entries from [2] with the following
exceptions:

* MS_VERBOSE, which has been deprecated in favor of MS_SILENT.
* MS_KERNMOUNT, since the mount(2) calls won't be kern_mount calls and
  they are not covered in mount(8).
* MS_SUBMOUNT and other flags documented as "internal to the kernel".
* MS_RMT_MASK, since it's a mask and not a flag.
* MS_MGC_*, since the magic mount flag has been ignored since Linux
  2.4, according to mount(2).

The example I'm touching used:

  "type": "bind",
  ...
  "options": ["rbind", ...]

but I don't see the point of putting 'bind' in 'type' when it's not a
filesystem type and you already have 'rbind' in 'options'.
We could have stuck closer to mount(2) by using:

  "options": ["recursive", "bind", ...]

but while that approach extends more conveniently to the other
recursive mounts (recursive shared, slave, private, and unbindable
mounts), there has been resistance to a direct MS_REC analog [3,4].
I think that resistance is unfounded (the kernel and mount(2) clearly
treat MS_REC as a composable flag), but I'm not a maintainer.

Since there are existing consumers using the old example format, and
tools like runtime-tools that generate similar configurations:

  $ git log --oneline -1 | cat
  03e8b89 Merge pull request #176 from hmeng-19/set_oom_score_adj
  $ ./ocitools generate --template <(echo '{}') --bind ab:cd:ro | jq '.mounts[0]'
  {
    "destination": "cd",
    "type": "bind",
    "source": "ab",
    "options": [
      "bind",
      "ro"
    ]
  }

this may be a breaking change for some spec consumers (although that
ocitools example will still work, because 'options' contains 'bind',
which means the 'type' will be ignored). But even if there are broken
consumers, we're still pre-1.0, the spec never explained what
bind/rbind meant before this commit, and consolidating the Linux mount
spec around mount(2) now will make life less confusing in the future.

[1]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2017/opencontainers.2017-05-09-20.07.log.html#l-24
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/fs.h?id=refs/tags/v4.11#n105
[3]: https://github.com/opencontainers/runtime-spec/pull/530#issuecomment-257028720
[4]: https://github.com/opencontainers/runtime-spec/pull/771#issuecomment-297509327

Signed-off-by: W. Trevor King
---
 config.md          | 77 ++++++++++++++++++++++++++++++++++++++--------
 specs-go/config.go |  2 +-
 2 files changed, 65 insertions(+), 14 deletions(-)

diff --git a/config.md b/config.md
index efaf0a0e5..f2cb33c96 100644
--- a/config.md
+++ b/config.md
@@ -55,18 +55,73 @@ For Windows, see [mountvol][mountvol] and [SetVolumeMountPoint][set-volume-mount
 
 * **`destination`** (string, REQUIRED) Destination of mount point: path inside container.
     This value MUST be an absolute path.
+    * Linux: runtimes MUST pass this value to [`mount(2)`][mount.2] as the `target` argument.
     * Windows: one mount destination MUST NOT be nested within another mount (e.g., c:\\foo and c:\\foo\\bar).
     * Solaris: corresponds to "dir" of the fs resource in [zonecfg(1M)][zonecfg.1m].
-* **`type`** (string, OPTIONAL) The filesystem type of the filesystem to be mounted.
-    * Linux: valid *filesystemtype* supported by the kernel as listed in */proc/filesystems* (e.g., "minix", "ext2", "ext3", "jfs", "xfs", "reiserfs", "msdos", "proc", "nfs", "iso9660").
-    * Windows: the type of file system on the volume, e.g. "ntfs".
-    * Solaris: corresponds to "type" of the fs resource in [zonecfg(1M)][zonecfg.1m].
+* **`type`** (string, OPTIONAL) The type of filesystem to mount.
+    If `type` is unset, the runtime MAY ask the kernel to guess the desired type.
+    Depending on the mount `options`, the `type` field MAY be ignored.
+    * Linux: when `type` is set, runtimes MUST pass the value to [`mount(2)`][mount.2] as the `filesystemtype` argument.
+      `type` is ignored when `options` contains `bind` or `rbind`; see the `MS_BIND` description in [mount(2)][mount.2].
+    * Windows: the type of file system on the volume, e.g. "ntfs".
+    * Solaris: corresponds to "type" of the fs resource in [zonecfg(1M)][zonecfg.1m].
 * **`source`** (string, OPTIONAL) A device name, but can also be a directory name or a dummy.
-    * Windows: the volume name that is the target of the mount point, \\?\Volume\{GUID}\ (on Windows source is called target).
-    * Solaris: corresponds to "special" of the fs resource in [zonecfg(1M)][zonecfg.1m].
+    * Linux: when `type` is set, runtimes MUST pass the value to [`mount(2)`][mount.2] as the `source` argument.
+      When `type` is not set, the value used for the `source` argument is unspecified.
+    * Windows: the volume name that is the target of the mount point, \\?\Volume\{GUID}\ (on Windows source is called target).
+    * Solaris: corresponds to "special" of the fs resource in [zonecfg(1M)][zonecfg.1m].
 * **`options`** (list of strings, OPTIONAL) Mount options of the filesystem to be used.
-    * Linux: supported options are listed in the [mount(8)][mount.8] man page. Note both [filesystem-independent][mount.8-filesystem-independent] and [filesystem-specific][mount.8-filesystem-specific] options are listed.
-    * Solaris: corresponds to "options" of the fs resource in [zonecfg(1M)][zonecfg.1m].
+    * Linux: runtimes MUST call [`mount(2)`][mount.2] with an initially-zero `mountflags` argument, altered by applying each entry of `options` that appears in the following table, in the order the entries appear in `options`:
+
+      | Option          | Set bits                  | Clear bits                                                                                  |
+      | --------------- | ------------------------- | ------------------------------------------------------------------------------------------- |
+      | `acl`           | `MS_POSIXACL`             |                                                                                             |
+      | `async`         |                           | `MS_SYNCHRONOUS`                                                                            |
+      | `atime`         |                           | `MS_NOATIME`                                                                                |
+      | `bind`          | `MS_BIND`                 |                                                                                             |
+      | `defaults`      |                           | `MS_RDONLY \| MS_NOSUID \| MS_NODEV \| MS_NOEXEC \| MS_NOAUTO \| MS_USER \| MS_SYNCHRONOUS` |
+      | `dev`           |                           | `MS_NODEV`                                                                                  |
+      | `diratime`      |                           | `MS_NODIRATIME`                                                                             |
+      | `dirsync`       | `MS_DIRSYNC`              |                                                                                             |
+      | `exec`          |                           | `MS_NOEXEC`                                                                                 |
+      | `iversion`      | `MS_I_VERSION`            |                                                                                             |
+      | `lazytime`      | `MS_LAZYTIME`             |                                                                                             |
+      | `loud`          |                           | `MS_SILENT`                                                                                 |
+      | `mand`          | `MS_MANDLOCK`             |                                                                                             |
+      | `move`          | `MS_MOVE`                 |                                                                                             |
+      | `noacl`         |                           | `MS_POSIXACL`                                                                               |
+      | `noatime`       | `MS_NOATIME`              |                                                                                             |
+      | `nodev`         | `MS_NODEV`                |                                                                                             |
+      | `nodiratime`    | `MS_NODIRATIME`           |                                                                                             |
+      | `noexec`        | `MS_NOEXEC`               |                                                                                             |
+      | `noiversion`    |                           | `MS_I_VERSION`                                                                              |
+      | `nolazytime`    |                           | `MS_LAZYTIME`                                                                               |
+      | `nomand`        |                           | `MS_MANDLOCK`                                                                               |
+      | `norelatime`    |                           | `MS_RELATIME`                                                                               |
+      | `nostrictatime` |                           | `MS_STRICTATIME`                                                                            |
+      | `nosuid`        | `MS_NOSUID`               |                                                                                             |
+      | `private`       | `MS_PRIVATE`              |                                                                                             |
+      | `rbind`         | `MS_REC \| MS_BIND`       |                                                                                             |
+      | `relatime`      | `MS_RELATIME`             |                                                                                             |
+      | `remount`       | `MS_REMOUNT`              |                                                                                             |
+      | `ro`            | `MS_RDONLY`               |                                                                                             |
+      | `rprivate`      | `MS_REC \| MS_PRIVATE`    |                                                                                             |
+      | `rshared`       | `MS_REC \| MS_SHARED`     |                                                                                             |
+      | `rslave`        | `MS_REC \| MS_SLAVE`      |                                                                                             |
+      | `runbindable`   | `MS_REC \| MS_UNBINDABLE` |                                                                                             |
+      | `rw`            |                           | `MS_RDONLY`                                                                                 |
+      | `shared`        | `MS_SHARED`               |                                                                                             |
+      | `silent`        | `MS_SILENT`               |                                                                                             |
+      | `slave`         | `MS_SLAVE`                |                                                                                             |
+      | `strictatime`   | `MS_STRICTATIME`          |                                                                                             |
+      | `suid`          |                           | `MS_NOSUID`                                                                                 |
+      | `sync`          | `MS_SYNCHRONOUS`          |                                                                                             |
+      | `unbindable`    | `MS_UNBINDABLE`           |                                                                                             |
+
+      Runtimes MUST call [`mount(2)`][mount.2] with `data` pointing at a string containing all `options` entries not listed in the above table, joined with commas in the order they appear in `options`.
+      If every entry in `options` is listed in the above table, `data` MAY be either a pointer to an empty string or `NULL`.
+
+    * Solaris: corresponds to "options" of the fs resource in [zonecfg(1M)][zonecfg.1m].
 
 ### Example (Linux)
 
@@ -80,9 +135,8 @@ For Windows, see [mountvol][mountvol] and [SetVolumeMountPoint][set-volume-mount
     },
     {
         "destination": "/data",
-        "type": "bind",
         "source": "/volumes/testing",
-        "options": ["rbind","rw"]
+        "options": ["rbind", "rw"]
     }
 ]
 ```
@@ -829,9 +883,6 @@ Here is a full example `config.json` for reference.
 
 [capabilities.7]: http://man7.org/linux/man-pages/man7/capabilities.7.html
 [mount.2]: http://man7.org/linux/man-pages/man2/mount.2.html
-[mount.8]: http://man7.org/linux/man-pages/man8/mount.8.html
-[mount.8-filesystem-independent]: http://man7.org/linux/man-pages/man8/mount.8.html#FILESYSTEM-INDEPENDENT_MOUNT%20OPTIONS
-[mount.8-filesystem-specific]: http://man7.org/linux/man-pages/man8/mount.8.html#FILESYSTEM-SPECIFIC_MOUNT%20OPTIONS
 [setrlimit.2]: http://man7.org/linux/man-pages/man2/setrlimit.2.html
 [stdin.3]: http://man7.org/linux/man-pages/man3/stdin.3.html
 [uts-namespace.7]: http://man7.org/linux/man-pages/man7/namespaces.7.html
diff --git a/specs-go/config.go b/specs-go/config.go
index c9f7d67d1..c3f902d34 100644
--- a/specs-go/config.go
+++ b/specs-go/config.go
@@ -112,7 +112,7 @@ type Platform struct {
 type Mount struct {
 	// Destination is the path where the mount will be placed relative to the container's root. The path and child directories MUST exist, a runtime MUST NOT create directories automatically to a mount point.
 	Destination string `json:"destination"`
-	// Type specifies the mount kind.
+	// Type specifies the type of filesystem to mount.
 	Type string `json:"type,omitempty"`
 	// Source specifies the source path of the mount. In the case of bind mounts on
 	// Linux based systems this would be the file on the host.
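The new Linux `options` rules can be read as a single left-to-right pass over `options`. Each entry listed in the table sets or clears bits in an initially-zero `mountflags`. Every other entry is forwarded, comma-joined, through `data`.

The Go sketch below shows one hypothetical way a runtime might implement that reading; it is not runc or runtime-tools code. `translate`, `flagOp`, and `flagOps` are illustrative names. The flag values are copied by hand from the kernel's include/uapi/linux/fs.h so that the sketch builds on any platform. Only part of the table is covered. In addition, `defaults` leaves out `MS_NOAUTO` and `MS_USER`, because fs.h defines no kernel flags with those names.

```go
package main

import (
	"fmt"
	"strings"
)

// Flag values copied from include/uapi/linux/fs.h and defined locally so the
// sketch builds without golang.org/x/sys/unix or a Linux host.
const (
	msRdonly      uintptr = 0x1
	msNosuid      uintptr = 0x2
	msNodev       uintptr = 0x4
	msNoexec      uintptr = 0x8
	msSynchronous uintptr = 0x10
	msRemount     uintptr = 0x20
	msNoatime     uintptr = 0x400
	msBind        uintptr = 0x1000
	msRec         uintptr = 0x4000
	msPrivate     uintptr = 0x40000
	msSlave       uintptr = 0x80000
	msShared      uintptr = 0x100000
	msRelatime    uintptr = 0x200000
	msStrictatime uintptr = 0x1000000
)

// flagOp records the bits one option sets and the bits it clears.
type flagOp struct{ setBits, clearBits uintptr }

// flagOps covers only part of the table. "defaults" omits MS_NOAUTO and
// MS_USER because fs.h defines no kernel flags with those names.
var flagOps = map[string]flagOp{
	"bind":        {setBits: msBind},
	"rbind":       {setBits: msRec | msBind},
	"ro":          {setBits: msRdonly},
	"rw":          {clearBits: msRdonly},
	"nosuid":      {setBits: msNosuid},
	"suid":        {clearBits: msNosuid},
	"nodev":       {setBits: msNodev},
	"dev":         {clearBits: msNodev},
	"noexec":      {setBits: msNoexec},
	"exec":        {clearBits: msNoexec},
	"sync":        {setBits: msSynchronous},
	"async":       {clearBits: msSynchronous},
	"noatime":     {setBits: msNoatime},
	"relatime":    {setBits: msRelatime},
	"strictatime": {setBits: msStrictatime},
	"remount":     {setBits: msRemount},
	"private":     {setBits: msPrivate},
	"rprivate":    {setBits: msRec | msPrivate},
	"shared":      {setBits: msShared},
	"rshared":     {setBits: msRec | msShared},
	"slave":       {setBits: msSlave},
	"rslave":      {setBits: msRec | msSlave},
	"defaults":    {clearBits: msRdonly | msNosuid | msNodev | msNoexec | msSynchronous},
}

// translate walks options in order. Table entries update an initially-zero
// mountflags value; every other entry is appended to data, comma-separated.
func translate(options []string) (flags uintptr, data string) {
	var rest []string
	for _, opt := range options {
		op, ok := flagOps[opt]
		if !ok {
			rest = append(rest, opt)
			continue
		}
		flags |= op.setBits
		flags &^= op.clearBits
	}
	return flags, strings.Join(rest, ",")
}

func main() {
	examples := [][]string{
		{"rbind", "rw"},              // the /data example from this patch
		{"ro", "nosuid", "mode=755"}, // illustrative; mode=755 goes to data
		{"ro", "rw"},                 // rw clears the bit that ro set
	}
	for _, opts := range examples {
		flags, data := translate(opts)
		fmt.Printf("%v -> mountflags=%#x data=%q\n", opts, flags, data)
	}
}
```

For the patch's `/data` entry, `rbind, rw` yields `MS_REC | MS_BIND` (0x5000) and an empty `data` string. A real Linux runtime would then pass these values to mount(2), for example through `unix.Mount(source, target, fstype, flags, data)` from golang.org/x/sys/unix. That call is omitted here because it requires privileges.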