From f188ed25eaa52933e91a8efbc72ad39c4b4dbd92 Mon Sep 17 00:00:00 2001
From: Patrick Ohly
Date: Wed, 10 Jun 2020 15:47:31 +0200
Subject: [PATCH] storage: CSIStorageCapacity

This is the initial documentation for one new feature:
- https://github.com/kubernetes/enhancements/issues/1472
---
 .../docs/concepts/storage/storage-capacity.md | 108 ++++++++++++++++++
 .../feature-gates.md                          |   2 +
 2 files changed, 110 insertions(+)
 create mode 100644 content/en/docs/concepts/storage/storage-capacity.md

diff --git a/content/en/docs/concepts/storage/storage-capacity.md b/content/en/docs/concepts/storage/storage-capacity.md
new file mode 100644
index 0000000000000..2fde0205ee5f3
--- /dev/null
+++ b/content/en/docs/concepts/storage/storage-capacity.md
@@ -0,0 +1,108 @@
+---
+reviewers:
+- jsafrane
+- saad-ali
+- msau42
+- xing-yang
+- pohly
+title: Storage Capacity
+content_type: concept
+---
+
+
+
+Storage capacity is limited and may vary depending on the node on
+which a pod runs: network-attached storage might not be accessible by
+all nodes, or storage is local to a node to begin with.
+
+This page describes how Kubernetes keeps track of storage capacity and
+how the scheduler uses that information to schedule pods.
+
+
+
+
+## Enabling the feature
+
+Storage capacity tracking is an *alpha feature* and only enabled when
+the `CSIStorageCapacity` feature gate is enabled. A quick check
+whether a Kubernetes cluster supports the feature is to list
+`CSIStorageCapacity` objects with:
+```shell
+kubectl get csistoragecapacities --all-namespaces
+```
+
+If supported, the response will be a list of objects or:
+```
+No resources found
+```
+
+If not supported, this error is printed instead:
+```
+error: the server doesn't have a resource type "csistoragecapacities"
+```
+
+In addition to enabling the feature in the cluster, a CSI driver
+deployment also has to support it. Please refer to the driver's
+documentation for details. Without this support, there will be no
+information about storage capacity available through the driver and
+the scheduler will schedule pods with volumes provided by the driver
+without looking for capacity information.
+
+## API
+
+There are two API extensions for this feature:
+- [`CSIStorageCapacity` objects](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#csistoragecapacity-v1alpha1-storage-k8s-io): these get produced by a CSI driver in the namespace
+  where the driver is installed. Each object contains capacity
+  information for one storage class and defines which nodes have
+  access to that storage.
+- [The `CSIDriverSpec.StorageCapacity` field](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#csidriverspec-v1-storage-k8s-io): when
+  set to `true`, the Kubernetes scheduler will consider storage
+  capacity for volumes that use the CSI driver.
+
+## Scheduling
+
+Storage capacity information is used by the Kubernetes scheduler if:
+- the `CSIStorageCapacity` feature gate is true,
+- a pod uses a volume that has not been created yet,
+- that volume uses a storage class which references a CSI driver and
+  uses [`WaitForFirstConsumer` volume binding
+  mode](/docs/concepts/storage/storage-classes/#volume-binding-mode),
+  and
+- the `CSIDriver` object for the driver has `StorageCapacity` set to
+  true.
+
+In that case, the scheduler only considers nodes for the pod which
+have enough storage available to them. This check is currently very
+simplistic and only compares the size of the volume against the
+capacity listed in `CSIStorageCapacity` objects with a topology that
+includes the node. Without storage capacity tracking, nodes are picked
+without this check.
+
+For volumes with `Immediate` volume binding mode, the storage driver
+decides where to create the volume, independently of pods that will
+use the volume. The scheduler then schedules pods onto nodes where the
+volume is available after the volume has been created.
+
+For [CSI ephemeral volumes](/docs/concepts/storage/volumes/#csi),
+scheduling always happens without considering storage capacity. This
+is based on the assumption that this volume type is only used by
+special CSI drivers which are local to a node and do not need
+significant resources there.
+
+## Rescheduling
+
+When a node has been selected for a pod with `WaitForFirstConsumer`
+volumes, that decision is still tentative. The next step is that the
+CSI storage driver gets asked to create the volume with a hint that the
+volume is supposed to be available on the selected node.
+
+Because Kubernetes might have chosen a node based on outdated
+capacity information, it is possible that the volume cannot really be
+created. The node selection is then reset and the Kubernetes scheduler
+tries again to find a node for the pod.
+
+## {{% heading "whatsnext" %}}
+
+- For more information on the design, see the
+[Storage Capacity Constraints for Pod Scheduling KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1472-storage-capacity-tracking/README.md).
+- For more information on further development of this feature, see the [enhancement tracking issue #1472](https://github.com/kubernetes/enhancements/issues/1472).
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
index 5422fb92025e8..77a25f980965c 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
@@ -76,6 +76,7 @@ different Kubernetes components.
 | `CSIMigrationGCEComplete` | `false` | Alpha | 1.17 | |
 | `CSIMigrationOpenStack` | `false` | Alpha | 1.14 | |
 | `CSIMigrationOpenStackComplete` | `false` | Alpha | 1.17 | |
+| `CSIStorageCapacity` | `false` | Alpha | 1.19 | |
 | `ConfigurableFSGroupPolicy` | `false` | Alpha | 1.18 | |
 | `CustomCPUCFSQuotaPeriod` | `false` | Alpha | 1.12 | |
 | `CustomResourceDefaulting` | `false` | Alpha| 1.15 | 1.15 |
@@ -388,6 +389,7 @@ Each feature gate is designed for enabling/disabling a specific feature:
 - `CSIPersistentVolume`: Enable discovering and mounting volumes provisioned through a
   [CSI (Container Storage Interface)](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/container-storage-interface.md)
   compatible volume plugin.
+- `CSIStorageCapacity`: Enables CSI drivers to publish storage capacity information and the Kubernetes scheduler to use that information when scheduling pods. See [Storage Capacity](/docs/concepts/storage/storage-capacity/).
   Check the [`csi` volume type](/docs/concepts/storage/volumes/#csi) documentation for more details.
 - `CustomCPUCFSQuotaPeriod`: Enable nodes to change CPUCFSQuotaPeriod.
 - `CustomPodDNS`: Enable customizing the DNS settings for a Pod using its `dnsConfig` property.