This is the initial documentation for one new feature:
- kubernetes/enhancements#1698

A new page gets created for different ephemeral volumes because the relationship between them needs to be explained.
Showing 3 changed files with 263 additions and 32 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,254 @@ | ||
--- | ||
reviewers: | ||
- jsafrane | ||
- saad-ali | ||
- msau42 | ||
- xing-yang | ||
- pohly | ||
title: Ephemeral Volumes | ||
content_type: concept | ||
weight: 20 | ||
--- | ||
|
||
<!-- overview --> | ||
|
||
This document describes the current state of _ephemeral volumes_ in Kubernetes. Familiarity with [volumes](/docs/concepts/storage/volumes/) is suggested. | ||
|
||
<!-- body --> | ||
|
||
## Introduction | ||
|
||
Some applications need additional storage but don't care whether that | ||
data is stored persistently across restarts. For example, caching | ||
services are often limited by memory size and can move infrequently | ||
used data into storage that is slower than memory with little impact | ||
on overall performance. | ||
|
||
Other applications expect some read-only input data to be present in | ||
files, like configuration data or secret keys. | ||
|
||
_Ephemeral volumes_ are designed for these use cases. Because volumes | ||
are created anew for each pod, pods can be stopped and restarted | ||
without being limited to where some persistent volume is available. | ||
|
||
Ephemeral volumes are specified _inline_ in the pod spec, which | ||
simplifies application deployment and management. | ||
|
||
Kubernetes supports several different kinds of ephemeral volumes for | ||
different purposes: | ||
- [emptyDir](/docs/concepts/storage/volumes/#emptydir): a directory on the root disk or | ||
a tmpfs | ||
- [configMap](/docs/concepts/storage/volumes/#configmap), | ||
[downwardAPI](/docs/concepts/storage/volumes/#downwardapi), | ||
[secret](/docs/concepts/storage/volumes/#secret): inject different | ||
kinds of Kubernetes data into a pod | ||
- [CSI ephemeral | ||
volumes](/docs/concepts/storage/volumes/#csi-ephemeral-volumes): | ||
similar to the previous volume kinds, but provided by special [CSI | ||
drivers](https://github.com/container-storage-interface/spec/blob/master/spec.md) | ||
which specifically [support this feature](https://kubernetes-csi.github.io/docs/drivers.html) | ||
- _generic ephemeral volumes_ (described [below](#generic-ephemeral-volumes)): | ||
can be provided by all storage drivers that also support persistent volumes | ||
|
||
`emptyDir`, `configMap`, `downwardAPI`, and `secret` are provided as | ||
[local ephemeral | ||
storage](/docs/concepts/configuration/manage-resources-containers/#local-ephemeral-storage). | ||
They are managed by kubelet on each node. | ||
|
||
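As a minimal sketch of the simplest of these kinds, the following pod mounts an `emptyDir` volume backed by tmpfs. The pod, container, and volume names are illustrative and do not come from this page; `medium` and `sizeLimit` are optional fields of `emptyDir`.

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: my-cache-app          # hypothetical name
spec:
  containers:
    - name: my-cache
      image: busybox
      volumeMounts:
      - mountPath: "/cache"
        name: cache-volume
      command: [ "sleep", "1000000" ]
  volumes:
    - name: cache-volume
      emptyDir:
        medium: Memory        # tmpfs; omit this field to use the node's disk
        sizeLimit: 64Mi       # optional upper bound for the volume
```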
CSI ephemeral volumes *must* be provided by third-party CSI storage | ||
drivers. Generic ephemeral volumes *can* be provided by third-party | ||
CSI storage drivers, but also by any other storage driver that | ||
supports dynamic provisioning. These drivers can offer functionality | ||
that Kubernetes itself does not support, for example storage with | ||
different performance characteristics than the root disk that is | ||
managed by kubelet, or injecting different data. | ||
|
||
### CSI ephemeral volumes | ||
|
||
{{< feature-state for_k8s_version="v1.16" state="beta" >}} | ||
|
||
This feature requires the CSIInlineVolume feature gate to be enabled. It | ||
is enabled by default starting with Kubernetes 1.16. | ||
|
||
CSI ephemeral volumes are only supported by a subset of CSI | ||
drivers. Please see [this | ||
list](https://kubernetes-csi.github.io/docs/drivers.html). | ||
|
||
Conceptually, CSI ephemeral volumes are similar to `configMap`, | ||
`downwardAPI` and `secret`: they are managed locally on each node and | ||
get created together with other local resources after a pod has been | ||
scheduled onto a node. At this stage, Kubernetes can no longer | ||
reschedule the pod, so volume creation has to be unlikely to fail; | ||
otherwise pod startup gets stuck. In particular, [storage capacity | ||
aware pod scheduling](/docs/concepts/storage/storage-capacity/) is *not* | ||
supported for these volumes. They are currently also not covered by | ||
the storage resource usage limits of a pod, because that is something | ||
that kubelet can only enforce for storage that it manages itself. | ||
|
||
|
||
Example: | ||
|
||
```yaml | ||
kind: Pod | ||
apiVersion: v1 | ||
metadata: | ||
  name: my-csi-app | ||
spec: | ||
  containers: | ||
    - name: my-frontend | ||
      image: busybox | ||
      volumeMounts: | ||
      - mountPath: "/data" | ||
        name: my-csi-inline-vol | ||
      command: [ "sleep", "1000000" ] | ||
  volumes: | ||
    - name: my-csi-inline-vol | ||
      csi: | ||
        driver: inline.storage.kubernetes.io | ||
        volumeAttributes: | ||
          foo: bar | ||
``` | ||
The `volumeAttributes` determine what volume is prepared by the | ||
driver. These attributes are specific to each driver and not | ||
standardized. See the documentation of each CSI driver for further | ||
instructions. | ||
|
||
Cluster administrators can control which CSI drivers can be used in a | ||
pod via the [Pod Security | ||
Policy](/docs/concepts/policy/pod-security-policy/) with the | ||
[`PodSecurityPolicySpec.allowedCSIDrivers` field](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podsecuritypolicyspec-v1beta1-policy). | ||
|
||
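A policy along these lines might look like the sketch below. The policy name is hypothetical, and the required `seLinux`, `runAsUser`, `supplementalGroups`, and `fsGroup` rules are left permissive only to keep the example short. The driver name is the one used in the pod example above.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restrict-csi-drivers    # hypothetical name
spec:
  # Required fields, kept permissive here for brevity.
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # The csi volume type must be allowed for inline CSI volumes to be usable.
  volumes:
    - csi
    - configMap
    - secret
  # Only the listed drivers may be used for inline CSI volumes.
  allowedCSIDrivers:
    - name: inline.storage.kubernetes.io
```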
### Generic ephemeral volumes | ||
|
||
{{< feature-state for_k8s_version="v1.19" state="alpha" >}} | ||
|
||
This feature requires the GenericEphemeralVolume feature gate to be | ||
enabled. Because this is an alpha feature, it is disabled by default. | ||
|
||
Generic ephemeral volumes are similar to `emptyDir` volumes, just more | ||
flexible: | ||
- Storage can be local or network-attached. | ||
- Volumes can have a fixed size that pods are not able to exceed. | ||
- Volumes may have some initial data, depending on the driver and | ||
parameters. | ||
- All of the normal volume operations | ||
([snapshotting](/docs/concepts/storage/volume-snapshots/), | ||
[cloning](/docs/concepts/storage/volume-pvc-datasource/), | ||
[resizing](/docs/concepts/storage/persistent-volumes/#expanding-persistent-volumes-claims), | ||
[storage capacity tracking](/docs/concepts/storage/storage-capacity/), etc.) | ||
are supported. | ||
|
||
Example: | ||
|
||
```yaml | ||
kind: Pod | ||
apiVersion: v1 | ||
metadata: | ||
  name: my-app | ||
spec: | ||
  containers: | ||
    - name: my-frontend | ||
      image: busybox | ||
      volumeMounts: | ||
      - mountPath: "/scratch" | ||
        name: scratch-volume | ||
      command: [ "sleep", "1000000" ] | ||
  volumes: | ||
    - name: scratch-volume | ||
      ephemeral: | ||
        volumeClaimTemplate: | ||
          metadata: | ||
            labels: | ||
              type: my-frontend-volume | ||
          spec: | ||
            accessModes: [ "ReadWriteOnce" ] | ||
            storageClassName: "scratch-storage-class" | ||
            resources: | ||
              requests: | ||
                storage: 1Gi | ||
``` | ||
|
||
### Lifecycle and PersistentVolumeClaim | ||
|
||
The key design idea is that the [parameters for a | ||
PersistentVolumeClaim](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#ephemeralvolumesource-v1alpha1-core) | ||
are allowed inside a volume source of the pod. Labels, annotations and | ||
the full PersistentVolumeClaimSpec are supported. When such a pod gets | ||
created, a new controller then creates an actual PersistentVolumeClaim | ||
object in the same namespace as the pod. | ||
|
||
That triggers volume binding and/or provisioning, either immediately if | ||
the storage class uses immediate volume binding or when the pod is | ||
tentatively scheduled onto a node (`WaitForFirstConsumer` volume | ||
binding mode). The latter is recommended for generic ephemeral volumes | ||
because then the pod scheduler is free to choose a suitable node for | ||
the pod. With immediate binding, it is forced to use a node that has | ||
access to the volume once it is available. | ||
|
||
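A storage class using the recommended binding mode could be sketched as follows. The class name matches the `storageClassName` in the pod example above; the provisioner is a placeholder, since this page does not name a particular driver.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: scratch-storage-class
provisioner: example.com/scratch          # hypothetical provisioner name
# Delay binding and provisioning until a pod using the claim is scheduled.
volumeBindingMode: WaitForFirstConsumer
```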
These additional PVCs are owned by the pod. When the pod gets deleted, | ||
the Kubernetes garbage collector deletes the PVC, which then usually | ||
triggers deletion of the volume because the default reclaim policy of | ||
storage classes is to delete volumes. If for some reason an ephemeral | ||
volume is not meant to be deleted, a storage class with a reclaim | ||
policy of `Retain` can be used. | ||
|
||
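To illustrate the ownership relationship, the PVC created for the pod example above might carry metadata roughly like the following. This is a hand-written sketch rather than captured output: the `uid` is a placeholder, and the exact set of fields written by the controller is an assumption.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-scratch-volume       # pod name, a hyphen, and the volume name
  labels:
    type: my-frontend-volume        # copied from the volumeClaimTemplate
  ownerReferences:
    - apiVersion: v1
      kind: Pod
      name: my-app
      uid: 00000000-0000-0000-0000-000000000000   # placeholder, set by the API server
      controller: true              # assumed
      blockOwnerDeletion: true      # assumed
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: "scratch-storage-class"
  resources:
    requests:
      storage: 1Gi
```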
Once these PVCs exist, they can be used like any other PVC. In | ||
particular, they can be referenced as data source in volume cloning or | ||
snapshotting. The PVC object also holds the current status of the | ||
volume. | ||
|
||
### PVC Naming | ||
|
||
Naming of the additional PVCs is currently deterministic: the name is | ||
a combination of pod name and volume name, with a hyphen (`-`) in the | ||
middle. In the example above, the PVC name will be | ||
`my-app-scratch-volume`. This deterministic naming makes it easier to | ||
interact with the PVC because one does not have to search for it once | ||
the pod name and volume name are known. | ||
|
||
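Because the name is predictable, other objects can reference the PVC directly. As a sketch, a snapshot of the volume from the example above could be requested like this, assuming the volume snapshot CRDs are installed and the driver supports snapshots. The snapshot name and snapshot class are hypothetical, and newer clusters may serve this API under a different version.

```yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: scratch-snapshot                  # hypothetical name
spec:
  volumeSnapshotClassName: example-snapclass   # hypothetical snapshot class
  source:
    # PVC created automatically for pod "my-app", volume "scratch-volume".
    persistentVolumeClaimName: my-app-scratch-volume
```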
However, it also introduces a potential conflict between different | ||
pods (a pod "pod-a" with volume "scratch" and another pod with name | ||
"pod" and volume "a-scratch" both end up with the same PVC name | ||
"pod-a-scratch") and between pods and manually created PVCs. | ||
|
||
Such conflicts are detected: a PVC is only used for an ephemeral | ||
volume if it was created for the pod. This check is based on the | ||
ownership relationship. An existing PVC is not overwritten or | ||
modified. However, this does not resolve the conflict: without the | ||
right PVC, the pod cannot start. | ||
|
||
Therefore care must be taken when naming pods and volumes inside the | ||
same namespace such that these conflicts cannot occur. | ||
|
||
### Security | ||
|
||
Enabling the GenericEphemeralVolume feature allows users to create | ||
PVCs indirectly if they can create pods, even if they do not have | ||
permission to create them directly. Cluster administrators must be | ||
aware of this. If this does not fit their security model, they have | ||
two choices: | ||
- Explicitly disable the feature through the feature gate, to avoid | ||
being surprised when some future Kubernetes version enables it | ||
by default. | ||
- Use a [Pod Security | ||
Policy](/docs/concepts/policy/pod-security-policy/) where the | ||
`volumes` list does not contain the `ephemeral` volume type. | ||
|
||
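The second option could look roughly like the following policy, which allows a typical set of volume types but leaves out `ephemeral`. The policy name and the particular set of allowed types are illustrative; the required rule fields are left permissive for brevity.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: no-generic-ephemeral    # hypothetical name
spec:
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # "ephemeral" is intentionally absent, so pods admitted under this
  # policy cannot use generic ephemeral volumes.
  volumes:
    - configMap
    - secret
    - emptyDir
    - downwardAPI
    - projected
    - persistentVolumeClaim
```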
The normal namespace quota for PVCs still applies, so | ||
even if users are allowed to use this new mechanism, they cannot use | ||
it to circumvent other policies. | ||
|
||
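As a sketch of such a quota, the following `ResourceQuota` caps the number of PVCs and the total requested storage in a namespace, which also covers PVCs created for generic ephemeral volumes. The object name, namespace, and limits are illustrative; the storage class name is the one from the example above.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: pvc-quota               # hypothetical name
  namespace: my-namespace       # hypothetical namespace
spec:
  hard:
    # Maximum number of PVCs in the namespace.
    persistentvolumeclaims: "10"
    # Sum of storage requests across all PVCs.
    requests.storage: 50Gi
    # Sum of storage requests for PVCs using one particular storage class.
    scratch-storage-class.storageclass.storage.k8s.io/requests.storage: 20Gi
```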
## {{% heading "whatsnext" %}} | ||
|
||
### CSI ephemeral volumes | ||
|
||
- For more information on the design, see the [Ephemeral Inline CSI | ||
volumes KEP](https://github.com/kubernetes/enhancements/blob/ad6021b3d61a49040a3f835e12c8bb5424db2bbb/keps/sig-storage/20190122-csi-inline-volumes.md). | ||
- For more information on further development of this feature, see the [enhancement tracking issue #596](https://github.com/kubernetes/enhancements/issues/596). | ||
|
||
### Generic ephemeral volumes | ||
|
||
- For more information on the design, see the | ||
[Generic ephemeral inline volumes KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1698-generic-ephemeral-volumes/README.md). | ||
- For more information on further development of this feature, see the [enhancement tracking issue #1698](https://github.com/kubernetes/enhancements/issues/1698). |