Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update "Windows Storage Support" KEP status to implementable and address prior feedback #1184

Merged
merged 1 commit into from
Jul 30, 2019

Conversation

ddebroy
Copy link
Member

@ddebroy ddebroy commented Jul 30, 2019

Updates:

  1. Clarify deprecation policy for API members.
  2. Add pointer to golang issue to track supporting detection of domain socket files in Windows.
  3. Compare Linux Node Plugin attack surface to CSI Proxy.
  4. Update diagram and API members from Get => List.
  5. Specify generation of eventlog entries that can be used as an audit trail.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/windows Categorizes an issue or PR as relevant to SIG Windows. labels Jul 30, 2019
@ddebroy
Copy link
Member Author

ddebroy commented Jul 30, 2019

/sig storage

@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Jul 30, 2019
@ddebroy
Copy link
Member Author

ddebroy commented Jul 30, 2019

/assign @PatrickLang @msau42 @liggitt


![Kubernetes CSI Node Components and Interactions](https://ddebroywin1.s3-us-west-2.amazonaws.com/csi-proxy2.png "Kubernetes CSI Node Components and Interactions")
![Kubernetes CSI Node Components and Interactions](https://ddebroywin1.s3-us-west-2.amazonaws.com/csi-proxy3.png "Kubernetes CSI Node Components and Interactions")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The image can be checked into this directory directly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@ddebroy
Copy link
Member Author

ddebroy commented Jul 30, 2019

/assign @calebamiles


A GRPC based interface, CSIProxyService, will be used by CSI Node Plugins to invoke privileged operations on the host through csi-proxy.exe. CSIProxyService will be versioned and any release of csi-proxy.exe binary will strive to maintain backward compatibility across as many prior stable versions of the API as possible. This avoids having to run multiple versions of the csi-proxy.exe binary on the same Windows host if multiple CSI Node Plugins (that do not depend on APIs in Alpha or Beta versions) have been configured and the version of the csi-proxy API required by the plugins are different. For every version of the API supported, csi-proxy.exe will first probe for and then expose a `\\.\pipe\csi-proxy-v[N]` pipe where v[N] can be v1, v2alpha1, v3beta1, etc. If during the initial probe phase, csi-proxy.exe determines that another process is already listening on a `\\.\pipe\csi-proxy-v[N]` named pipe, it will not try to create and listen on that named pipe. This allows multiple versions of csi-proxy.exe to run side-by-side if absolutely necessary to support multiple CSI Node Plugins that require widely different versions of CSIProxyService that no single version of csi-proxy.exe can support.

If an old stable version of CSIProxyService, say vN, can no longer be supported, support for it may be dropped from the latest version of csi-proxy.exe. To continue running CSI Node Plugins that depend on the old version vN, Kubernetes administrators will be required to run the latest version of the csi-proxy.exe (that will be used by CSI Node Plugins that use versions of CSIProxyService more recent than vN) along with an old version of csi-proxy.exe that does support vN.

The following are the main RPC calls that will comprise a v1 version of the CSIProxyService API:

```
Copy link
Member

@liggitt liggitt Jul 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are the RPC messages described below (MkdirRequest, RmdirRequest, PartitionDisk, FormatVolume, LinkVolume, etc) pre-existing or introduced as part of this feature?

since this is exposing an API that allows things running inside containers to escalate permissions to do host-level operations, we need clear information about the following aspects:

  • what values are allowed in paths
  • is the csi-proxy process or the caller responsible for normalization and validation of paths?
  • do symlink traversal and time-of-check-time-of-use issues exist in windows? for example, could a container create a link in a filesystem it controlled that pointed to a host filepath it should not have access to, then request a CSI volume be mounted at a path beneath that link, and get the privileged csi-proxy to follow the link and overwrite/expose host files to the container? if so, how does the caller indicate to the csi-proxy what the acceptable boundary is for resolved paths that include links?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible some or many of the validation bits are constrained by the CSI spec already and the kubelet is responsible for filtering out some of those things. If so, just noting that and ensuring test coverage of the kubelet is fine.

The symlink/link/traversal issues are more interesting, since the csi-proxy would be the only actor capable of performing those checks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the RPCs mentioned are new. Will clarify the path semantics and validations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than repeating everything in the doc - I think we can reference https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file

@liggitt - that doc you linked to is a subset implemented in .Net. https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfullpathnamew is the native API that should be used to canonicalize file names that follows the rules listed in the prior doc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for linking to existing references, clarifying which API is used, and being explicit about where normalization happens (which component(s) are responsible for doing it)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@ddebroy ddebroy Jul 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the Request and Response struct descriptions for the RPCs along with restrictions. /cc @liggitt and @PatrickLang PTAL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Do these map closely to windows filesystem APIs? If so, that would be good to reference.

I'll defer to the windows/storage experts to determine if these provide a reasonable API that covers the information/operations a CSI driver would need.

@@ -371,7 +379,7 @@ New YAMLs for DaemonSets associated with CSI Node Plugins needs to be authored t

### Risks and Mitigations

Any pod on a Windows node can be configured to mount `\\.\pipe\csi-proxy-v[N]` and perform privileged operations. Thus csi-proxy presents a potential security risk. To mitigate the risk, here are some options:
Any pod on a Windows node can be configured to mount `\\.\pipe\csi-proxy-v[N]` and perform privileged operations. Thus csi-proxy presents a potential security risk. To mitigate the risk, some options are described below:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there already things that could be mounted into pods that allowed arbitrary hostpath mounts that would allow similar escalations/capabilities? if so, the mere existence of the csi proxy isn't really an increased risk for those pods.

the increased risk comes from the csi-proxy handling of data plumbed to it by the csi drivers from restricted user-specified PVCs/Pods that were not allowed hostPath access, hence the data validation questions in https://github.com/kubernetes/enhancements/pull/1184/files#r308785333

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows doesn't support container mount projection today, so there's no way to create new mounts that can be interpreted by the host.

@msau42
Copy link
Member

msau42 commented Jul 30, 2019

cc @tallclair

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 30, 2019
@@ -290,7 +296,7 @@ CSI Node Plugins need to execute certain privileged operations at the host level

Registration of CSI Node Plugins on a Kubernetes node is handled by the Kubelet plugin watcher using the fsnotify package. This component needs to convert paths detected by fsnotify to Windows paths in handleCreateEvent() and handleDeleteEvent() before the paths are passed to AddOrUpdatePlugin() RemovePlugin() routines in desiredStateOfTheWorld. A new utility function, NormalizeWindowsPath(), will be added in utils to handle this.

Given `os.ModeSocket` is not set on a socket file's `os.FileMode` in Windows, a specific check for `os.ModeSocket` in handleCreateEvent() will need to be relaxed.
Given `os.ModeSocket` is not set on a socket file's `os.FileMode` in Windows (due to golang [issue](https://github.com/golang/go/issues/33357)), a specific check for `os.ModeSocket` in handleCreateEvent() will need to be relaxed for Windows until the golang issue is addressed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there an alternative check that can be done on windows to avoid spawning lots of events for normal files/dirs (like ensuring it is not a directory, and is not a regular file)? I know there were bugs with the plugin manager getting overwhelmed trying to process change events for non-socket files under that dir.

Copy link
Member Author

@ddebroy ddebroy Jul 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code fragment I am referring to that needs to be relaxed is: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/pluginmanager/pluginwatcher/plugin_watcher.go#L193-L196

I think the rest of the code around the above does get invoked regardless of file type - so I don't think the invocation of handleCreateEvent itself is being filtered based on file type but the downstream calls like handlePluginRegistration are filtered.

We can explore sending down a FSCTL to the file to check for the Windows reparse points (through go wrappers) to align with fi.Mode()&os.ModeSocket == 0. I can call that plan out here.

CONTAINER = 1;
}

message MkdirRequest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what owner/permissions are associated with the new directory? are all parent directories created if missing as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and added a Windows OS error code for all responses as that is critical. Also clarified paths will not be pre-created for parents and a PathExists RPC. Also mentioned the typical user SID under which csi-proxy.exe will be started and the permissions (read, write) that will be set when creating directories.

// Restrictions:
// Needs to be an absolute path under kubernetes-csi-plugins-path
// or container-runtime-storage-path based on context
// Cannot exist already in host
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it does exist already, is an error returned? is the error distinguishable as an "already exists" error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I went ahead and added a Windows OS error code for all responses to make error detection and reaction to them easier.


A GRPC based interface, CSIProxyService, will be used by CSI Node Plugins to invoke privileged operations on the host through csi-proxy.exe. CSIProxyService will be versioned and any release of csi-proxy.exe binary will strive to maintain backward compatibility across as many prior stable versions of the API as possible. This avoids having to run multiple versions of the csi-proxy.exe binary on the same Windows host if multiple CSI Node Plugins (that do not depend on APIs in Alpha or Beta versions) have been configured and the version of the csi-proxy API required by the plugins are different. For every version of the API supported, csi-proxy.exe will first probe for and then expose a `\\.\pipe\csi-proxy-v[N]` pipe where v[N] can be v1, v2alpha1, v3beta1, etc. If during the initial probe phase, csi-proxy.exe determines that another process is already listening on a `\\.\pipe\csi-proxy-v[N]` named pipe, it will not try to create and listen on that named pipe. This allows multiple versions of csi-proxy.exe to run side-by-side if absolutely necessary to support multiple CSI Node Plugins that require widely different versions of CSIProxyService that no single version of csi-proxy.exe can support.

If an old stable version of CSIProxyService, say vN, can no longer be supported, support for it may be dropped from the latest version of csi-proxy.exe. To continue running CSI Node Plugins that depend on the old version vN, Kubernetes administrators will be required to run the latest version of the csi-proxy.exe (that will be used by CSI Node Plugins that use versions of CSIProxyService more recent than vN) along with an old version of csi-proxy.exe that does support vN.

The following are the main RPC calls that will comprise a v1 version of the CSIProxyService API:

```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Do these map closely to windows filesystem APIs? If so, that would be good to reference.

I'll defer to the windows/storage experts to determine if these provide a reasonable API that covers the information/operations a CSI driver would need.

If an old stable version of CSIProxyService, say vN, can no longer be supported, support for it may be dropped from the latest version of csi-proxy.exe. To continue running CSI Node Plugins that depend on the old version vN, Kubernetes administrators will be required to run the latest version of the csi-proxy.exe (that will be used by CSI Node Plugins that use versions of CSIProxyService more recent than vN) along with an old version of csi-proxy.exe that does support vN.
##### CSI Proxy Configuration
There will be two command line parameters that may be passed to csi-proxy.exe:
1. kubernetes-csi-plugins-path: String parameter pointing to the path used by Kubernetes CSI plugins. Will default to: `C:\var\lib\kubelet\plugins\kubernetes.io\csi`. All requests for creation and deletion of paths through the CSIProxyService RPCs (detailed below) will need to be under this path.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will need to be under this path

Including resolution of any links in the parent dir?

For example, in the following setup:

  1. kubernetes-csi-plugins-path = C:\var\lib\kubelet\plugins\kubernetes.io\csi
  2. C:\var\lib\kubelet\plugins\kubernetes.io\csi\foo\bar is a link that points to Z:\baz

Would a request to delete C:\var\lib\kubelet\plugins\kubernetes.io\csi\foo\bar\quux be allowed (since it really resolves to Z:\baz\quux?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msau42 is more familiar with what the CSI drivers are required to do, and how much visibility (if any) they have to the mountPath/subPath inputs the user provides. The less direct input from untrusted pvc/pod creators the drivers make use of, the less concerned I am about these aspects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/var/lib/kubelet/plugins/kubernetes.io/csi is where the staging mounts go, but I think we also need /var/lib/kubelet/pods/ for the per pod link right?

Regarding @liggitt's concern, I think we should disallow symlinks for the paths that get passed to the proxy.

Kubelet is supposed to give the driver only system-controlled paths, but that doesn't stop the driver from giving another path to the proxy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CSI drivers don't handle subpaths. It's handled directly by the kubelet, and IIUC, this proxy is only for CSI driver file/mount operations, not kubelet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with windows symlink mechanics. Does it have the same issue as linux where O_NOFOLLOW only applies to the past element of the path?

Copy link
Member Author

@ddebroy ddebroy Jul 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msau42 @liggitt I specifically mentioned for RmdirRequest: Path cannot be a file of type symlink as a restriction https://github.com/kubernetes/enhancements/pull/1184/files#diff-0533ba89d026317907f55974130ce267R419 . The file type of the path will be checked and request to delete a path of type symlink will be rejected. So that should take care of rmdir being invoked on a symlink path to remove a target somewhere else.

For MkdirRequest, the proxy creates the path and will make sure they are regular directories and not symlinks.

2. container-runtime-storage-path: String parameter pointing to the path used by the container runtime to store images, snapshots and other data related to containers. Possible values could be: `F:\ProgramData\docker`, `C:\ProgramData\containerd` or corresponding paths used by a Windows runtime. Parameters to `LinkVolume` (detailed below) will need to be under this path.

##### CSI Proxy GRPC API
The following are the main RPC calls that will comprise a v1 version of the CSIProxyService API. A preliminary structure for Request and Response associated with each RPC is described below. Note that the specific structures as well as restrictions on them are expected to evolve during Alpha phase and are expected to be in a final form at the end of Beta.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following are the main RPC calls that will comprise a v1 version of the CSIProxyService API

isn't what is described below the v1alpha1 version of the CSIProxyService API?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I will update this.

@saad-ali saad-ali changed the title Update KEP status to implementable and address prior feedback Update "Windows Storage Support" KEP status to implementable and address prior feedback Jul 30, 2019
@saad-ali
Copy link
Member

/assign

Signed-off-by: Deep Debroy <ddebroy@docker.com>
- [CSI Proxy Configuration](#csi-proxy-configuration)
- [CSI Proxy GRPC API](#csi-proxy-grpc-api)
- [CSI Proxy GRPC API Graduation and Deprecation Policy](#csi-proxy-grpc-api-graduation-and-deprecation-policy)
- [CSI Proxy Event Logs](#csi-proxy-event-logs)
- [Enhancements in Kubernetes/Utils/mounter](#enhancements-in-kubernetesutilsmounter)
- [Enhancements in CSI Node Plugins](#enhancements-in-csi-node-plugins)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could you define CSI Node Plugins? Maybe in a glossary section? I wasn't sure if this is referring to the CSI driver to be installed on nodes or the DaemonSet or something else without reading deeper

@PatrickLang
Copy link
Contributor

/lgtm
looking for approval from a sig-storage rep

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 30, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ddebroy, PatrickLang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 30, 2019
@k8s-ci-robot k8s-ci-robot merged commit 4eee22d into kubernetes:master Jul 30, 2019
@ddebroy
Copy link
Member Author

ddebroy commented Jul 30, 2019

Well that was strange - tide feels the PR has not been approved but the k8s-ci-robot added approved tag and merged the PR?


The following are the main RPC calls that will comprise a v1 version of the CSIProxyService API:
2. container-runtime-storage-path: String parameter pointing to the path used by the container runtime to store images, snapshots and other data related to containers. Possible values could be: `F:\ProgramData\docker`, `C:\ProgramData\containerd` or corresponding paths used by a Windows runtime. Parameters to `LinkVolume` (detailed below) will need to be under this path.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do CSI drivers need the container runtime paths?

Copy link
Member Author

@ddebroy ddebroy Jul 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for the LinkVolume step where the proxy needs to invoke mklink from the host staging path to a path inside the container's fs as part of node publish.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain a little bit more? I assume CSI plugin only needs to know the /var/lib/kubelet paths, and the container runtime will be the one to handle paths for the container rootfs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran through the flow again and what I stated above is incorrect. The node publish path on the container side is indeed under C:\var\lib\kubelet\pods. I will correct this in a fresh PR.

@justaugustus justaugustus added area/enhancements Issues or PRs related to the Enhancements subproject and removed sig/pm area/enhancements Issues or PRs related to the Enhancements subproject labels Apr 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants