-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update "Windows Storage Support" KEP status to implementable and address prior feedback #1184
Conversation
/sig storage |
/assign @PatrickLang @msau42 @liggitt |
|
||
![Kubernetes CSI Node Components and Interactions](https://ddebroywin1.s3-us-west-2.amazonaws.com/csi-proxy2.png "Kubernetes CSI Node Components and Interactions") | ||
![Kubernetes CSI Node Components and Interactions](https://ddebroywin1.s3-us-west-2.amazonaws.com/csi-proxy3.png "Kubernetes CSI Node Components and Interactions") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The image can be checked into this directory directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/assign @calebamiles |
|
||
A GRPC based interface, CSIProxyService, will be used by CSI Node Plugins to invoke privileged operations on the host through csi-proxy.exe. CSIProxyService will be versioned and any release of csi-proxy.exe binary will strive to maintain backward compatibility across as many prior stable versions of the API as possible. This avoids having to run multiple versions of the csi-proxy.exe binary on the same Windows host if multiple CSI Node Plugins (that do not depend on APIs in Alpha or Beta versions) have been configured and the version of the csi-proxy API required by the plugins are different. For every version of the API supported, csi-proxy.exe will first probe for and then expose a `\\.\pipe\csi-proxy-v[N]` pipe where v[N] can be v1, v2alpha1, v3beta1, etc. If during the initial probe phase, csi-proxy.exe determines that another process is already listening on a `\\.\pipe\csi-proxy-v[N]` named pipe, it will not try to create and listen on that named pipe. This allows multiple versions of csi-proxy.exe to run side-by-side if absolutely necessary to support multiple CSI Node Plugins that require widely different versions of CSIProxyService that no single version of csi-proxy.exe can support. | ||
|
||
If an old stable version of CSIProxyService, say vN, can no longer be supported, support for it may be dropped from the latest version of csi-proxy.exe. To continue running CSI Node Plugins that depend on the old version vN, Kubernetes administrators will be required to run the latest version of the csi-proxy.exe (that will be used by CSI Node Plugins that use versions of CSIProxyService more recent than vN) along with an old version of csi-proxy.exe that does support vN. | ||
|
||
The following are the main RPC calls that will comprise a v1 version of the CSIProxyService API: | ||
|
||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are the RPC messages described below (MkdirRequest, RmdirRequest, PartitionDisk, FormatVolume, LinkVolume, etc) pre-existing or introduced as part of this feature?
since this is exposing an API that allows things running inside containers to escalate permissions to do host-level operations, we need clear information about the following aspects:
- what values are allowed in paths
- are special characters restricted?
- are backstep path segments allowed?
- are forward and backslashes (and international path separators like ¥) allowed and normalized?
- which of the filepath variants in https://docs.microsoft.com/en-us/dotnet/standard/io/file-path-formats are allowed?
- is the csi-proxy process or the caller responsible for normalization and validation of paths?
- do symlink traversal and time-of-check-time-of-use issues exist in windows? for example, could a container create a link in a filesystem it controlled that pointed to a host filepath it should not have access to, then request a CSI volume be mounted at a path beneath that link, and get the privileged csi-proxy to follow the link and overwrite/expose host files to the container? if so, how does the caller indicate to the csi-proxy what the acceptable boundary is for resolved paths that include links?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible some or many of the validation bits are constrained by the CSI spec already and the kubelet is responsible for filtering out some of those things. If so, just noting that and ensuring test coverage of the kubelet is fine.
The symlink/link/traversal issues are more interesting, since the csi-proxy would be the only actor capable of performing those checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the RPCs mentioned are new. Will clarify the path semantics and validations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than repeating everything in the doc - I think we can reference https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file
@liggitt - that doc you linked to is a subset implemented in .Net. https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfullpathnamew is the native API that should be used to canonicalize file names that follows the rules listed in the prior doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for linking to existing references, clarifying which API is used, and being explicit about where normalization happens (which component(s) are responsible for doing it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
path normalization on the node side can be done with https://github.com/golang/go/blob/2b8cbc384d092dc63e9dc18ec318d0682185611c/src/syscall/syscall_windows.go#L224
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the Request and Response struct descriptions for the RPCs along with restrictions. /cc @liggitt and @PatrickLang PTAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Do these map closely to windows filesystem APIs? If so, that would be good to reference.
I'll defer to the windows/storage experts to determine if these provide a reasonable API that covers the information/operations a CSI driver would need.
@@ -371,7 +379,7 @@ New YAMLs for DaemonSets associated with CSI Node Plugins needs to be authored t | |||
|
|||
### Risks and Mitigations | |||
|
|||
Any pod on a Windows node can be configured to mount `\\.\pipe\csi-proxy-v[N]` and perform privileged operations. Thus csi-proxy presents a potential security risk. To mitigate the risk, here are some options: | |||
Any pod on a Windows node can be configured to mount `\\.\pipe\csi-proxy-v[N]` and perform privileged operations. Thus csi-proxy presents a potential security risk. To mitigate the risk, some options are described below: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there already things that could be mounted into pods that allowed arbitrary hostpath mounts that would allow similar escalations/capabilities? if so, the mere existence of the csi proxy isn't really an increased risk for those pods.
the increased risk comes from the csi-proxy handling of data plumbed to it by the csi drivers from restricted user-specified PVCs/Pods that were not allowed hostPath access, hence the data validation questions in https://github.com/kubernetes/enhancements/pull/1184/files#r308785333
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows doesn't support container mount projection today, so there's no way to create new mounts that can be interpreted by the host.
cc @tallclair |
@@ -290,7 +296,7 @@ CSI Node Plugins need to execute certain privileged operations at the host level | |||
|
|||
Registration of CSI Node Plugins on a Kubernetes node is handled by the Kubelet plugin watcher using the fsnotify package. This component needs to convert paths detected by fsnotify to Windows paths in handleCreateEvent() and handleDeleteEvent() before the paths are passed to AddOrUpdatePlugin() RemovePlugin() routines in desiredStateOfTheWorld. A new utility function, NormalizeWindowsPath(), will be added in utils to handle this. | |||
|
|||
Given `os.ModeSocket` is not set on a socket file's `os.FileMode` in Windows, a specific check for `os.ModeSocket` in handleCreateEvent() will need to be relaxed. | |||
Given `os.ModeSocket` is not set on a socket file's `os.FileMode` in Windows (due to golang [issue](https://github.com/golang/go/issues/33357)), a specific check for `os.ModeSocket` in handleCreateEvent() will need to be relaxed for Windows until the golang issue is addressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there an alternative check that can be done on windows to avoid spawning lots of events for normal files/dirs (like ensuring it is not a directory, and is not a regular file)? I know there were bugs with the plugin manager getting overwhelmed trying to process change events for non-socket files under that dir.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code fragment I am referring to that needs to be relaxed is: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/pluginmanager/pluginwatcher/plugin_watcher.go#L193-L196
I think the rest of the code around the above does get invoked regardless of file type - so I don't think the invocation of handleCreateEvent
itself is being filtered based on file type but the downstream calls like handlePluginRegistration
are filtered.
We can explore sending down a FSCTL to the file to check for the Windows reparse points (through go wrappers) to align with fi.Mode()&os.ModeSocket == 0
. I can call that plan out here.
CONTAINER = 1; | ||
} | ||
|
||
message MkdirRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what owner/permissions are associated with the new directory? are all parent directories created if missing as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and added a Windows OS error code for all responses as that is critical. Also clarified paths will not be pre-created for parents and a PathExists RPC. Also mentioned the typical user SID under which csi-proxy.exe will be started and the permissions (read, write) that will be set when creating directories.
// Restrictions: | ||
// Needs to be an absolute path under kubernetes-csi-plugins-path | ||
// or container-runtime-storage-path based on context | ||
// Cannot exist already in host |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it does exist already, is an error returned? is the error distinguishable as an "already exists" error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I went ahead and added a Windows OS error code for all responses to make error detection and reaction to them easier.
|
||
A GRPC based interface, CSIProxyService, will be used by CSI Node Plugins to invoke privileged operations on the host through csi-proxy.exe. CSIProxyService will be versioned and any release of csi-proxy.exe binary will strive to maintain backward compatibility across as many prior stable versions of the API as possible. This avoids having to run multiple versions of the csi-proxy.exe binary on the same Windows host if multiple CSI Node Plugins (that do not depend on APIs in Alpha or Beta versions) have been configured and the version of the csi-proxy API required by the plugins are different. For every version of the API supported, csi-proxy.exe will first probe for and then expose a `\\.\pipe\csi-proxy-v[N]` pipe where v[N] can be v1, v2alpha1, v3beta1, etc. If during the initial probe phase, csi-proxy.exe determines that another process is already listening on a `\\.\pipe\csi-proxy-v[N]` named pipe, it will not try to create and listen on that named pipe. This allows multiple versions of csi-proxy.exe to run side-by-side if absolutely necessary to support multiple CSI Node Plugins that require widely different versions of CSIProxyService that no single version of csi-proxy.exe can support. | ||
|
||
If an old stable version of CSIProxyService, say vN, can no longer be supported, support for it may be dropped from the latest version of csi-proxy.exe. To continue running CSI Node Plugins that depend on the old version vN, Kubernetes administrators will be required to run the latest version of the csi-proxy.exe (that will be used by CSI Node Plugins that use versions of CSIProxyService more recent than vN) along with an old version of csi-proxy.exe that does support vN. | ||
|
||
The following are the main RPC calls that will comprise a v1 version of the CSIProxyService API: | ||
|
||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Do these map closely to windows filesystem APIs? If so, that would be good to reference.
I'll defer to the windows/storage experts to determine if these provide a reasonable API that covers the information/operations a CSI driver would need.
If an old stable version of CSIProxyService, say vN, can no longer be supported, support for it may be dropped from the latest version of csi-proxy.exe. To continue running CSI Node Plugins that depend on the old version vN, Kubernetes administrators will be required to run the latest version of the csi-proxy.exe (that will be used by CSI Node Plugins that use versions of CSIProxyService more recent than vN) along with an old version of csi-proxy.exe that does support vN. | ||
##### CSI Proxy Configuration | ||
There will be two command line parameters that may be passed to csi-proxy.exe: | ||
1. kubernetes-csi-plugins-path: String parameter pointing to the path used by Kubernetes CSI plugins. Will default to: `C:\var\lib\kubelet\plugins\kubernetes.io\csi`. All requests for creation and deletion of paths through the CSIProxyService RPCs (detailed below) will need to be under this path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will need to be under this path
Including resolution of any links in the parent dir?
For example, in the following setup:
kubernetes-csi-plugins-path
=C:\var\lib\kubelet\plugins\kubernetes.io\csi
C:\var\lib\kubelet\plugins\kubernetes.io\csi\foo\bar
is a link that points toZ:\baz
Would a request to delete C:\var\lib\kubelet\plugins\kubernetes.io\csi\foo\bar\quux
be allowed (since it really resolves to Z:\baz\quux
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@msau42 is more familiar with what the CSI drivers are required to do, and how much visibility (if any) they have to the mountPath/subPath inputs the user provides. The less direct input from untrusted pvc/pod creators the drivers make use of, the less concerned I am about these aspects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/var/lib/kubelet/plugins/kubernetes.io/csi is where the staging mounts go, but I think we also need /var/lib/kubelet/pods/ for the per pod link right?
Regarding @liggitt's concern, I think we should disallow symlinks for the paths that get passed to the proxy.
Kubelet is supposed to give the driver only system-controlled paths, but that doesn't stop the driver from giving another path to the proxy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CSI drivers don't handle subpaths. It's handled directly by the kubelet, and IIUC, this proxy is only for CSI driver file/mount operations, not kubelet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with windows symlink mechanics. Does it have the same issue as linux where O_NOFOLLOW only applies to the past element of the path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@msau42 @liggitt I specifically mentioned for RmdirRequest
: Path cannot be a file of type symlink
as a restriction https://github.com/kubernetes/enhancements/pull/1184/files#diff-0533ba89d026317907f55974130ce267R419 . The file type of the path will be checked and request to delete a path of type symlink will be rejected. So that should take care of rmdir being invoked on a symlink path to remove a target somewhere else.
For MkdirRequest
, the proxy creates the path and will make sure they are regular directories and not symlinks.
2. container-runtime-storage-path: String parameter pointing to the path used by the container runtime to store images, snapshots and other data related to containers. Possible values could be: `F:\ProgramData\docker`, `C:\ProgramData\containerd` or corresponding paths used by a Windows runtime. Parameters to `LinkVolume` (detailed below) will need to be under this path. | ||
|
||
##### CSI Proxy GRPC API | ||
The following are the main RPC calls that will comprise a v1 version of the CSIProxyService API. A preliminary structure for Request and Response associated with each RPC is described below. Note that the specific structures as well as restrictions on them are expected to evolve during Alpha phase and are expected to be in a final form at the end of Beta. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following are the main RPC calls that will comprise a v1 version of the CSIProxyService API
isn't what is described below the v1alpha1 version of the CSIProxyService API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. I will update this.
/assign |
Signed-off-by: Deep Debroy <ddebroy@docker.com>
- [CSI Proxy Configuration](#csi-proxy-configuration) | ||
- [CSI Proxy GRPC API](#csi-proxy-grpc-api) | ||
- [CSI Proxy GRPC API Graduation and Deprecation Policy](#csi-proxy-grpc-api-graduation-and-deprecation-policy) | ||
- [CSI Proxy Event Logs](#csi-proxy-event-logs) | ||
- [Enhancements in Kubernetes/Utils/mounter](#enhancements-in-kubernetesutilsmounter) | ||
- [Enhancements in CSI Node Plugins](#enhancements-in-csi-node-plugins) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could you define CSI Node Plugins? Maybe in a glossary section? I wasn't sure if this is referring to the CSI driver to be installed on nodes or the DaemonSet or something else without reading deeper
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ddebroy, PatrickLang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Well that was strange - tide feels the PR has not been approved but the k8s-ci-robot added approved tag and merged the PR? |
|
||
The following are the main RPC calls that will comprise a v1 version of the CSIProxyService API: | ||
2. container-runtime-storage-path: String parameter pointing to the path used by the container runtime to store images, snapshots and other data related to containers. Possible values could be: `F:\ProgramData\docker`, `C:\ProgramData\containerd` or corresponding paths used by a Windows runtime. Parameters to `LinkVolume` (detailed below) will need to be under this path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do CSI drivers need the container runtime paths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for the LinkVolume step where the proxy needs to invoke mklink
from the host staging path to a path inside the container's fs as part of node publish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain a little bit more? I assume CSI plugin only needs to know the /var/lib/kubelet
paths, and the container runtime will be the one to handle paths for the container rootfs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran through the flow again and what I stated above is incorrect. The node publish path on the container side is indeed under C:\var\lib\kubelet\pods. I will correct this in a fresh PR.
Updates: