This is the initial documentation for one new feature: - kubernetes/enhancements#1472
Showing 2 changed files with 110 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,108 @@ | ||
--- | ||
reviewers: | ||
- jsafrane | ||
- saad-ali | ||
- msau42 | ||
- xing-yang | ||
- pohly | ||
title: Storage Capacity | ||
content_type: concept | ||
--- | ||
|
||
<!-- overview --> | ||
|
||
Storage capacity is limited and may vary depending on the node on | ||
which a pod runs: network-attached storage might not be accessible by | ||
all nodes, or storage is local to a node to begin with. | ||
|
||
This page describes how Kubernetes keeps track of storage capacity and | ||
how the scheduler uses that information to schedule pods. | ||
|
||
<!-- body --> | ||
|
||
|
||
## Enabling the feature | ||
|
||
Storage capacity tracking is an *alpha feature* and is only available when | ||
the `CSIStorageCapacity` feature gate is enabled. A quick way to check | ||
whether a Kubernetes cluster supports the feature is to list | ||
`CSIStorageCapacity` objects with: | ||
```shell | ||
kubectl get csistoragecapacities --all-namespaces | ||
``` | ||
|
||
If supported, the response is either a list of objects or: | ||
``` | ||
No resources found | ||
``` | ||
|
||
If not supported, this error is printed instead: | ||
``` | ||
error: the server doesn't have a resource type "csistoragecapacities" | ||
``` | ||
|
||
In addition to enabling the feature in the cluster, a CSI driver | ||
deployment also has to support it. Please refer to the driver's | ||
documentation for details. Without this support, there will be no | ||
information about storage capacity available through the driver and | ||
the scheduler will schedule pods with volumes provided by the driver | ||
without looking for capacity information. | ||
|
||
## API | ||
|
||
There are two API extensions for this feature: | ||
- [`CSIStorageCapacity` objects](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#csistoragecapacity-v1alpha1-storage-k8s-io): these get produced by a CSI driver in the namespace | ||
where the driver is installed. Each object contains capacity | ||
information for one storage class and defines which nodes have | ||
access to that storage. | ||
- [The `CSIDriverSpec.StorageCapacity` field](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#csidriverspec-v1-storage-k8s-io): when | ||
set to `true`, the Kubernetes scheduler will consider storage | ||
capacity for volumes that use the CSI driver. | ||
|
||
## Scheduling | ||
|
||
Storage capacity information is used by the Kubernetes scheduler if: | ||
- the `CSIStorageCapacity` feature gate is enabled, | ||
- a pod uses a volume that has not been created yet, | ||
- that volume uses a storage class which references a CSI driver and | ||
uses [`WaitForFirstConsumer` volume binding | ||
mode](/docs/concepts/storage/storage-classes/#volume-binding-mode), | ||
and | ||
- the `CSIDriver` object for the driver has `StorageCapacity` set to | ||
true. | ||
|
||
In that case, the scheduler only considers nodes for the pod which | ||
have enough storage available to them. This check is currently very | ||
simplistic and only compares the size of the volume against the | ||
capacity listed in `CSIStorageCapacity` objects with a topology that | ||
includes the node. Without storage capacity tracking, nodes are picked | ||
without this check. | ||
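
The conditions and the size comparison described above can be sketched in Python. This is a simplified illustration, not the scheduler's actual code: `Capacity`, `VolumeRequest`, `capacity_check_applies`, and `feasible_nodes` are hypothetical names, node topology is modelled as a plain set of node names rather than a label selector, and a node without any matching capacity entry is assumed to be rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capacity:
    """Hypothetical stand-in for one CSIStorageCapacity object."""
    storage_class: str
    nodes: frozenset       # nodes covered by the object's topology (simplified)
    capacity_bytes: int


@dataclass(frozen=True)
class VolumeRequest:
    """Hypothetical description of a volume that has not been created yet."""
    storage_class: str
    size_bytes: int
    binding_mode: str             # "WaitForFirstConsumer" or "Immediate"
    driver_tracks_capacity: bool  # CSIDriver has StorageCapacity set to true


def capacity_check_applies(feature_gate_enabled, volume):
    """Return True only if all conditions for using capacity data hold."""
    return (feature_gate_enabled
            and volume.binding_mode == "WaitForFirstConsumer"
            and volume.driver_tracks_capacity)


def feasible_nodes(nodes, volume, capacities, feature_gate_enabled=True):
    """Keep the nodes whose reachable capacity is at least the volume size."""
    if not capacity_check_applies(feature_gate_enabled, volume):
        # Without capacity tracking, nodes are picked without this check.
        return list(nodes)
    return [
        node for node in nodes
        if any(c.storage_class == volume.storage_class
               and node in c.nodes
               and c.capacity_bytes >= volume.size_bytes
               for c in capacities)
    ]
```

The comparison is deliberately as simple as the text describes: one volume size against one listed capacity, with no accounting for other volumes that may be created concurrently.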
|
||
For volumes with `Immediate` volume binding mode, the storage driver | ||
decides where to create the volume, independently of pods that will | ||
use the volume. The scheduler then schedules pods onto nodes where the | ||
volume is available after the volume has been created. | ||
|
||
For [CSI ephemeral volumes](/docs/concepts/storage/volumes/#csi), | ||
scheduling always happens without considering storage capacity. This | ||
is based on the assumption that this volume type is only used by | ||
special CSI drivers which are local to a node and do not need | ||
significant resources there. | ||
|
||
## Rescheduling | ||
|
||
When a node has been selected for a pod with `WaitForFirstConsumer` | ||
volumes, that decision is still tentative. The next step is that the | ||
CSI storage driver gets asked to create the volume with a hint that the | ||
volume is supposed to be available on the selected node. | ||
|
||
Because Kubernetes might have chosen a node based on outdated | ||
capacity information, it is possible that the volume cannot actually be | ||
created. The node selection is then reset and the Kubernetes scheduler | ||
tries again to find a node for the pod. | ||
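
The tentative selection and retry can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the real control flow: `pick_node` and `create_volume` are hypothetical callbacks standing in for the scheduler and the CSI driver, `OutOfCapacity` is a made-up exception, and the `max_attempts` bound exists only to keep the sketch finite.

```python
class OutOfCapacity(Exception):
    """Raised by the hypothetical driver when the volume cannot be created."""


def schedule_with_retry(pick_node, create_volume, max_attempts=5):
    """Pick a node, try to create the volume there, and retry on failure."""
    for _ in range(max_attempts):
        node = pick_node()          # tentative decision, may rest on stale data
        if node is None:
            return None             # no candidate node is left
        try:
            create_volume(node)     # hint: the volume should be usable on node
        except OutOfCapacity:
            continue                # reset the selection and try again
        return node                 # volume created, the decision is final
    return None
```

For example, if the driver rejects the first candidate because the capacity data was stale, the loop returns the next node on which volume creation succeeds.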
|
||
## {{% heading "whatsnext" %}} | ||
|
||
- For more information on the design, see the | ||
[Storage Capacity Constraints for Pod Scheduling KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1472-storage-capacity-tracking/README.md). | ||
- For more information on further development of this feature, see the [enhancement tracking issue #1472](https://github.com/kubernetes/enhancements/issues/1472). |