diff --git a/content/en/docs/concepts/scheduling-eviction/kube-scheduler.md b/content/en/docs/concepts/scheduling-eviction/kube-scheduler.md index c95f021101794..0a94a779a3ea0 100644 --- a/content/en/docs/concepts/scheduling-eviction/kube-scheduler.md +++ b/content/en/docs/concepts/scheduling-eviction/kube-scheduler.md @@ -95,3 +95,7 @@ of the scheduler: * Learn about [configuring multiple schedulers](/docs/tasks/administer-cluster/configure-multiple-schedulers/) * Learn about [topology management policies](/docs/tasks/administer-cluster/topology-manager/) * Learn about [Pod Overhead](/docs/concepts/configuration/pod-overhead/) +* Learn about scheduling of Pods that use volumes in: + * [Volume Topology Support](/docs/concepts/storage/storage-classes/#volume-binding-mode) + * [Storage Capacity Tracking](/docs/concepts/storage/storage-capacity/) + * [Node-specific Volume Limits](/docs/concepts/storage/storage-limits/) diff --git a/content/en/docs/concepts/storage/storage-capacity.md b/content/en/docs/concepts/storage/storage-capacity.md new file mode 100644 index 0000000000000..5e6e4db74bf0c --- /dev/null +++ b/content/en/docs/concepts/storage/storage-capacity.md @@ -0,0 +1,134 @@ +--- +reviewers: +- jsafrane +- saad-ali +- msau42 +- xing-yang +- pohly +title: Storage Capacity +content_type: concept +weight: 45 +--- + + + +Storage capacity is limited and may vary depending on the node on +which a pod runs: network-attached storage might not be accessible by +all nodes, or storage is local to a node to begin with. + +{{< feature-state for_k8s_version="v1.19" state="alpha" >}} + +This page describes how Kubernetes keeps track of storage capacity and +how the scheduler uses that information to schedule Pods onto nodes +that have access to enough storage capacity for the remaining missing +volumes. Without storage capacity tracking, it is random whether the +selected node has enough storage for the Pod and multiple scheduling retries may be +needed. + +Tracking storage capacity is supported for [{{< glossary_tooltip +text="Container Storage Interface" term_id="csi" >}} (CSI) drivers](https://kubernetes-csi.github.io/docs/drivers.html) and +[needs to be enabled](#enabling-storage-capacity-tracking) when installing a CSI driver. + + + +## API + +There are two API extensions for this feature: +- [CSIStorageCapacity](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#csistoragecapacity-v1alpha1-storage-k8s-io) objects: + these get produced by a CSI driver in the namespace + where the driver is installed. Each object contains capacity + information for one storage class and defines which nodes have + access to that storage. +- [The `CSIDriverSpec.StorageCapacity` field](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#csidriverspec-v1-storage-k8s-io): + when set to `true`, the Kubernetes scheduler will consider storage + capacity for volumes that use the CSI driver. + +## Scheduling + +Storage capacity information is used by the Kubernetes scheduler if: +- the `CSIStorageCapacity` feature gate is true, +- a Pod uses a volume that has not been created yet, +- that volume uses a {{< glossary_tooltip text="StorageClass" term_id="storage-class" >}} which references a CSI driver and + uses `WaitForFirstConsumer` [volume binding + mode](/docs/concepts/storage/storage-classes/#volume-binding-mode), + and +- the `CSIDriver` object for the driver has `StorageCapacity` set to + true. + +In that case, the scheduler only considers nodes for the Pod which +have enough storage available to them. This check is very +simplistic and only compares the size of the volume against the +capacity listed in `CSIStorageCapacity` objects with a topology that +includes the node. + +For volumes with `Immediate` volume binding mode, the storage driver +decides where to create the volume, independently of Pods that will +use the volume. The scheduler then schedules Pods onto nodes where the +volume is available after the volume has been created. + +For [CSI ephemeral volumes](/docs/concepts/storage/volumes/#csi), +scheduling always happens without considering storage capacity. This +is based on the assumption that this volume type is only used by +special CSI drivers which are local to a node and do not need +significant resources there. + +## Rescheduling + +When a node has been selected for a Pod with `WaitForFirstConsumer` +volumes, that decision is still tentative. The next step is that the +CSI storage driver gets asked to create the volume with a hint that the +volume is supposed to be available on the selected node. + +Because Kubernetes might have chosen a node based on out-dated +capacity information, it is possible that the volume cannot really be +created. The node selection is then reset and the Kubernetes scheduler +tries again to find a node for the Pod. + +## Limitations + +Storage capacity tracking increases the chance that scheduling works +on the first try, but cannot guarantee this because the scheduler has +to decide based on potentially out-dated information. Usually, the +same retry mechanism as for scheduling without any storage capacity +information handles scheduling failures. + +One situation where scheduling can fail permanently is when a Pod uses +multiple volumes: one volume might have been created already in a +topology segment which then does not have enough capacity left for +another volume. Manual intervention is necessary to recover from this, +for example by increasing capacity or deleting the volume that was +already created. [Further +work](https://github.com/kubernetes/enhancements/pull/1703) is needed +to handle this automatically. + +## Enabling storage capacity tracking + +Storage capacity tracking is an *alpha feature* and only enabled when +the `CSIStorageCapacity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled. A quick check +whether a Kubernetes cluster supports the feature is to list +CSIStorageCapacity objects with: +```shell +kubectl get csistoragecapacities --all-namespaces +``` + +If your cluster supports CSIStorageCapacity, the response is either a list of CSIStorageCapacity objects or: +``` +No resources found +``` + +If not supported, this error is printed instead: +``` +error: the server doesn't have a resource type "csistoragecapacities" +``` + +In addition to enabling the feature in the cluster, a CSI +driver also has to +support it. Please refer to the driver's documentation for +details. + +## {{% heading "whatsnext" %}} + + - For more information on the design, see the +[Storage Capacity Constraints for Pod Scheduling KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1472-storage-capacity-tracking/README.md). +- For more information on further development of this feature, see the [enhancement tracking issue #1472](https://github.com/kubernetes/enhancements/issues/1472). +- Learn about [Kubernetes Scheduler](/docs/concepts/scheduling-eviction/kube-scheduler/) diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index af248018de61c..123dc53e57714 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -76,6 +76,7 @@ different Kubernetes components. | `CSIMigrationGCEComplete` | `false` | Alpha | 1.17 | | | `CSIMigrationOpenStack` | `false` | Alpha | 1.14 | | | `CSIMigrationOpenStackComplete` | `false` | Alpha | 1.17 | | +| `CSIStorageCapacity` | `false` | Alpha | 1.19 | | | `ConfigurableFSGroupPolicy` | `false` | Alpha | 1.18 | | | `CustomCPUCFSQuotaPeriod` | `false` | Alpha | 1.12 | | | `CustomResourceDefaulting` | `false` | Alpha| 1.15 | 1.15 | @@ -390,6 +391,7 @@ Each feature gate is designed for enabling/disabling a specific feature: - `CSIPersistentVolume`: Enable discovering and mounting volumes provisioned through a [CSI (Container Storage Interface)](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/container-storage-interface.md) compatible volume plugin. +- `CSIStorageCapacity`: Enables CSI drivers to publish storage capacity information and the Kubernetes scheduler to use that information when scheduling pods. See [Storage Capacity](/docs/concepts/storage/storage-capacity/). Check the [`csi` volume type](/docs/concepts/storage/volumes/#csi) documentation for more details. - `CustomCPUCFSQuotaPeriod`: Enable nodes to change CPUCFSQuotaPeriod. - `CustomPodDNS`: Enable customizing the DNS settings for a Pod using its `dnsConfig` property.