storage: GenericEphemeralVolume
This is the initial documentation for one new feature:
- kubernetes/enhancements#1698

A new page gets created for different ephemeral volumes because the
relationship between them needs to be explained.

Co-authored-by: Tim Bannister <tim@scalefactory.com>
pohly and sftim committed Jul 13, 2020
1 parent 38a5d01 commit 934c42f
Showing 3 changed files with 276 additions and 32 deletions.
263 changes: 263 additions & 0 deletions content/en/docs/concepts/storage/ephemeral-volumes.md
@@ -0,0 +1,263 @@
---
reviewers:
- jsafrane
- saad-ali
- msau42
- xing-yang
- pohly
title: Ephemeral Volumes
content_type: concept
weight: 50
---

<!-- overview -->

This document describes _ephemeral volumes_ in Kubernetes. Familiarity
with [volumes](/docs/concepts/storage/volumes/) is suggested, in
particular `PersistentVolumeClaim` and `PersistentVolume`.

<!-- body -->

Some applications need additional storage but don't care whether that
data is stored persistently across restarts. For example, caching
services are often limited by memory size and can move infrequently
used data into storage that is slower than memory with little impact
on overall performance.

Other applications expect some read-only input data to be present in
files, like configuration data or secret keys.

_Ephemeral volumes_ are designed for these use cases. Because volumes
are created anew for each pod, pods can be stopped and restarted
without being limited to where some persistent volume is available.

Ephemeral volumes are specified _inline_ in the pod spec, which
simplifies application deployment and management.

### Types of ephemeral volumes

Kubernetes supports several different kinds of ephemeral volumes for
different purposes:
- [emptyDir](/docs/concepts/storage/volumes/#emptydir): empty at Pod startup,
  with storage provided locally from the root disk or RAM
- [configMap](/docs/concepts/storage/volumes/#configmap),
  [downwardAPI](/docs/concepts/storage/volumes/#downwardapi),
  [secret](/docs/concepts/storage/volumes/#secret): inject different
  kinds of Kubernetes data into a pod
- [CSI ephemeral
  volumes](/docs/concepts/storage/volumes/#csi-ephemeral-volumes):
  similar to the previous volume kinds, but provided by special [CSI
  drivers](https://github.com/container-storage-interface/spec/blob/master/spec.md)
  which specifically [support this feature](https://kubernetes-csi.github.io/docs/drivers.html)
- [generic ephemeral volumes](#generic-ephemeral-volumes), which
  can be provided by all storage drivers that also support persistent volumes

`emptyDir`, `configMap`, `downwardAPI`, and `secret` are provided as
[local ephemeral
storage](/docs/concepts/configuration/manage-resources-containers/#local-ephemeral-storage).
They are managed by kubelet on each node.

CSI ephemeral volumes *must* be provided by third-party CSI storage
drivers. Generic ephemeral volumes *can* be provided by third-party
CSI storage drivers, but also by any other storage driver that
supports dynamic provisioning. These drivers can offer functionality
that Kubernetes itself does not support, for example storage with
different performance characteristics than the root disk that is
managed by kubelet, or injecting different data.

### CSI ephemeral volumes

{{< feature-state for_k8s_version="v1.16" state="beta" >}}

This feature requires the `CSIInlineVolume` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to be enabled. It
is enabled by default starting with Kubernetes 1.16.

{{< note >}}
CSI ephemeral volumes are only supported by a subset of CSI drivers.
The Kubernetes CSI [Drivers list](https://kubernetes-csi.github.io/docs/drivers.html)
shows which drivers support ephemeral volumes.
{{< /note >}}

Conceptually, CSI ephemeral volumes are similar to `configMap`,
`downwardAPI` and `secret` volume types: the storage is managed locally on each
node and is created together with other local resources after a Pod has been
scheduled onto a node. At this stage, Kubernetes can no longer
reschedule the Pod, so volume creation has to be unlikely to fail;
otherwise pod startup gets stuck. In particular, [storage capacity
aware pod scheduling](/docs/concepts/storage-capacity/) is *not*
supported for these volumes. They are currently also not covered by
the storage resource usage limits of a pod, because that is something
that kubelet can only enforce for storage that it manages itself.


Here's an example manifest for a Pod that uses CSI ephemeral storage:

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: my-csi-app
spec:
  containers:
    - name: my-frontend
      image: busybox
      volumeMounts:
      - mountPath: "/data"
        name: my-csi-inline-vol
      command: [ "sleep", "1000000" ]
  volumes:
    - name: my-csi-inline-vol
      csi:
        driver: inline.storage.kubernetes.io
        volumeAttributes:
          foo: bar
```

The `volumeAttributes` determine what volume is prepared by the
driver. These attributes are specific to each driver and not
standardized. See the documentation of each CSI driver for further
instructions.

As a cluster administrator, you can use a [PodSecurityPolicy](/docs/concepts/policy/pod-security-policy/) to control which CSI drivers can be used in a Pod, specified with the
[`allowedCSIDrivers` field](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podsecuritypolicyspec-v1beta1-policy).
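
As a rough illustration of that restriction, the following sketch shows a PodSecurityPolicy that permits inline `csi` volumes only from the driver used in the example above. The policy name is hypothetical, and the permissive `RunAsAny` rules are placeholders that are present only because those fields are required; adapt them to your own security model.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restrict-inline-csi        # hypothetical name
spec:
  privileged: false
  # Required fields, set permissively here purely for illustration.
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Allow the inline csi volume type (plus a few common kubelet-managed types).
  volumes:
    - csi
    - configMap
    - secret
    - emptyDir
  # Only this driver may be referenced by inline csi volumes.
  allowedCSIDrivers:
    - name: inline.storage.kubernetes.io
```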

### Generic ephemeral volumes

{{< feature-state for_k8s_version="v1.19" state="alpha" >}}

This feature requires the `GenericEphemeralVolume` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to be
enabled. Because this is an alpha feature, it is disabled by default.
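
As a sketch of what enabling the gate can look like on the kubelet, assuming the kubelet is started with a configuration file, the `featureGates` map can carry the setting. The other components involved in this feature would need the same gate, typically through their `--feature-gates` command-line flag; which components that covers in your cluster is something to verify against the feature documentation.

```yaml
# Minimal kubelet configuration fragment (illustrative only).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  GenericEphemeralVolume: true
```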

Generic ephemeral volumes are similar to `emptyDir` volumes, just more
flexible:
- Storage can be local or network-attached.
- Volumes can have a fixed size that pods are not able to exceed.
- Volumes may have some initial data, depending on the driver and
parameters.
- Typical operations on volumes are supported, including
[snapshotting](/docs/concepts/storage/volume-snapshots/),
[cloning](/docs/concepts/storage/volume-pvc-datasource/),
[resizing](/docs/concepts/storage/persistent-volumes/#expanding-persistent-volumes-claims),
and [storage capacity tracking](/docs/concepts/storage-capacity/).

Example:

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: my-app
spec:
  containers:
    - name: my-frontend
      image: busybox
      volumeMounts:
      - mountPath: "/scratch"
        name: scratch-volume
      command: [ "sleep", "1000000" ]
  volumes:
    - name: scratch-volume
      ephemeral:
        volumeClaimTemplate:
          metadata:
            labels:
              type: my-frontend-volume
          spec:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: "scratch-storage-class"
            resources:
              requests:
                storage: 1Gi
```

### Lifecycle and PersistentVolumeClaim

The key design idea is that the [parameters for a
volume claim](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#ephemeralvolumesource-v1alpha1-core)
are allowed inside a volume source of the pod. Labels, annotations and
the whole set of fields for a PersistentVolumeClaim are supported. When such a Pod gets
created, the ephemeral volume controller then creates an actual PersistentVolumeClaim
object in the same namespace as the pod.

That triggers volume binding and/or provisioning, either immediately if
the {{< glossary_tooltip text="StorageClass" term_id="storage-class" >}} uses immediate volume binding or when the pod is
tentatively scheduled onto a node (`WaitForFirstConsumer` volume
binding mode). The latter is recommended for generic ephemeral volumes
because then the scheduler is free to choose a suitable node for
the Pod. With immediate binding, the scheduler is forced to select a node that has
access to the volume once it is available.
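
For illustration, a StorageClass matching the `scratch-storage-class` name from the Pod example could request delayed binding as sketched below. The provisioner name is a hypothetical placeholder; substitute the name of a driver that supports dynamic provisioning in your cluster.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: scratch-storage-class
# Hypothetical provisioner; replace with a real driver name.
provisioner: example.com/hypothetical-driver
# Delay binding and provisioning until a Pod using the claim is scheduled.
volumeBindingMode: WaitForFirstConsumer
# Delete the volume when its PVC is deleted.
reclaimPolicy: Delete
```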

In terms of [resource ownership](/docs/concepts/workloads/controllers/garbage-collection/#owners-and-dependents),
a Pod that has generic ephemeral storage is the owner of the PersistentVolumeClaim(s)
that provide that ephemeral storage. When the Pod is deleted,
the Kubernetes garbage collector deletes the PVC, which then usually
triggers deletion of the volume because the default reclaim policy of
storage classes is to delete volumes. You can create quasi-ephemeral local storage
using a StorageClass with a reclaim policy of `Retain`: the storage outlives the Pod,
and in this case you need to ensure that volume cleanup happens separately.

Once these PVCs exist, they can be used like any other PVC. In
particular, they can be referenced as data source in volume cloning or
snapshotting. The PVC object also holds the current status of the
volume.

### PersistentVolumeClaim naming

Naming of the additional PVCs is deterministic: the name is
a combination of pod name and volume name, with a hyphen (`-`) in the
middle. In the example above, the PVC name will be
`my-app-scratch-volume`. This deterministic naming makes it easier to
interact with the PVC because one does not have to search for it once
the pod name and volume name are known.
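
To make the naming and ownership concrete, the PVC created for the `my-app` example might look roughly like the sketch below. This is an approximation, not captured output: the `uid` is a placeholder, the exact owner reference flags are an assumption, and fields that the API server fills in (status, defaults) are omitted.

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  # <pod name>-<volume name>
  name: my-app-scratch-volume
  labels:
    type: my-frontend-volume
  ownerReferences:
    - apiVersion: v1
      kind: Pod
      name: my-app
      uid: 00000000-0000-0000-0000-000000000000   # placeholder
      controller: true            # assumed
      blockOwnerDeletion: true    # assumed
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: "scratch-storage-class"
  resources:
    requests:
      storage: 1Gi
```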

The deterministic naming also introduces a potential conflict between different
pods (a pod "pod-a" with volume "scratch" and another pod with name
"pod" and volume "a-scratch" both end up with the same PVC name
"pod-a-scratch") and between pods and manually created PVCs.

Such conflicts are detected: a PVC is only used for an ephemeral
volume if it was created for the pod. This check is based on the
ownership relationship. An existing PVC is not overwritten or
modified. But this does not resolve the conflict because without the
right PVC, the pod cannot start.
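
The conflict described above can be reproduced with a pair of manifests such as the following sketch. Container names and the storage class are illustrative; the point is only that both volumes map to the PVC name `pod-a-scratch`, so only one of the two Pods can own it.

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: pod-a
spec:
  containers:
    - name: app
      image: busybox
      command: [ "sleep", "1000000" ]
      volumeMounts:
        - mountPath: "/scratch"
          name: scratch
  volumes:
    - name: scratch              # PVC name: "pod-a" + "-" + "scratch"
      ephemeral:
        volumeClaimTemplate:
          spec:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: "scratch-storage-class"
            resources:
              requests:
                storage: 1Gi
---
kind: Pod
apiVersion: v1
metadata:
  name: pod
spec:
  containers:
    - name: app
      image: busybox
      command: [ "sleep", "1000000" ]
      volumeMounts:
        - mountPath: "/scratch"
          name: a-scratch
  volumes:
    - name: a-scratch            # PVC name: "pod" + "-" + "a-scratch"
      ephemeral:
        volumeClaimTemplate:
          spec:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: "scratch-storage-class"
            resources:
              requests:
                storage: 1Gi
```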

{{< caution >}}
Take care when naming Pods and volumes inside the
same namespace, so that these conflicts can't occur.
{{< /caution >}}

### Security

Enabling the GenericEphemeralVolume feature allows users to create
PVCs indirectly if they can create pods, even if they do not have
permission to create them directly. Cluster administrators must be
aware of this. If this does not fit their security model, they have
two choices:
- Explicitly disable the feature through the feature gate, to avoid
being surprised when some future Kubernetes version enables it
by default.
- Use a [Pod Security
Policy](/docs/concepts/policy/pod-security-policy/) where the
`volumes` list does not contain the `ephemeral` volume type.
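
A sketch of the second option follows. The policy name and the set of permitted volume types are illustrative, and the `RunAsAny` rules are placeholders for required fields; what matters is that `ephemeral` is absent from `volumes`.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: no-generic-ephemeral     # hypothetical name
spec:
  privileged: false
  # Required fields, set permissively here purely for illustration.
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # The ephemeral volume type is intentionally not listed.
  volumes:
    - configMap
    - secret
    - emptyDir
    - downwardAPI
    - projected
    - persistentVolumeClaim
```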

The normal namespace quota for PVCs in a namespace still applies, so
even if users are allowed to use this new mechanism, they cannot use
it to circumvent other policies.
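
For example, a namespace quota along the following lines would also cap PVCs created through generic ephemeral volumes. The quota name, namespace, and limits are hypothetical values chosen for illustration.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: pvc-quota                # hypothetical name
  namespace: my-namespace        # hypothetical namespace
spec:
  hard:
    # Maximum number of PVCs in the namespace, however they are created.
    persistentvolumeclaims: "10"
    # Maximum total storage requested by all PVCs in the namespace.
    requests.storage: 50Gi
```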

## {{% heading "whatsnext" %}}

### Ephemeral volumes managed by kubelet

See [local ephemeral storage](/docs/concepts/configuration/manage-resources-containers/#local-ephemeral-storage).

### CSI ephemeral volumes

- For more information on the design, see the [Ephemeral Inline CSI
volumes KEP](https://github.com/kubernetes/enhancements/blob/ad6021b3d61a49040a3f835e12c8bb5424db2bbb/keps/sig-storage/20190122-csi-inline-volumes.md).
- For more information on further development of this feature, see the [enhancement tracking issue #596](https://github.com/kubernetes/enhancements/issues/596).

### Generic ephemeral volumes

- For more information on the design, see the
[Generic ephemeral inline volumes KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1698-generic-ephemeral-volumes/README.md).
- For more information on further development of this feature, see the [enhancement tracking issue #1698](https://github.com/kubernetes/enhancements/issues/1698).
43 changes: 11 additions & 32 deletions content/en/docs/concepts/storage/volumes.md
@@ -1291,8 +1291,11 @@ Once a CSI compatible volume driver is deployed on a Kubernetes cluster, users
may use the `csi` volume type to attach, mount, etc. the volumes exposed by the
CSI driver.

The `csi` volume type does not support direct reference from Pod and may only be
referenced in a Pod via a `PersistentVolumeClaim` object.
A `csi` volume can be used in a pod in three different ways:
- through a reference to a [`persistentVolumeClaim`](#persistentvolumeclaim)
- with a [generic ephemeral volume](/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volumes)
- with a [CSI ephemeral volume](/docs/concepts/storage/ephemeral-volumes/#csi-ephemeral-volumes) if the driver
supports that

The following fields are available to storage administrators to configure a CSI
persistent volume:
@@ -1355,37 +1358,13 @@ as usual, without any CSI specific changes.

{{< feature-state for_k8s_version="v1.16" state="beta" >}}

This feature allows CSI volumes to be directly embedded in the Pod specification instead of a PersistentVolume. Volumes specified in this way are ephemeral and do not persist across Pod restarts.
You can directly configure CSI volumes within the Pod
specification. Volumes specified in this way are ephemeral and do not
persist across Pod restarts. See [Ephemeral
Volumes](/docs/concepts/storage/ephemeral-volumes/#csi-ephemeral-volumes)
for more information.

Example:

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: my-csi-app
spec:
  containers:
    - name: my-frontend
      image: busybox
      volumeMounts:
      - mountPath: "/data"
        name: my-csi-inline-vol
      command: [ "sleep", "1000000" ]
  volumes:
    - name: my-csi-inline-vol
      csi:
        driver: inline.storage.kubernetes.io
        volumeAttributes:
          foo: bar
```

This feature requires CSIInlineVolume feature gate to be enabled. It
is enabled by default starting with Kubernetes 1.16.

CSI ephemeral volumes are only supported by a subset of CSI drivers. Please see the list of CSI drivers [here](https://kubernetes-csi.github.io/docs/drivers.html).

# Developer resources
#### Developer resources
For more information on how to develop a CSI driver, refer to the [kubernetes-csi
documentation](https://kubernetes-csi.github.io/docs/)

@@ -101,6 +101,7 @@ different Kubernetes components.
| `ExperimentalHostUserNamespaceDefaulting` | `false` | Beta | 1.5 | |
| `EvenPodsSpread` | `false` | Alpha | 1.16 | 1.17 |
| `EvenPodsSpread` | `true` | Beta | 1.18 | |
| `GenericEphemeralVolume` | `false` | Alpha | 1.19 | |
| `HPAScaleToZero` | `false` | Alpha | 1.16 | |
| `HugePageStorageMediumSize` | `false` | Alpha | 1.18 | 1.18 |
| `HugePageStorageMediumSize` | `true` | Beta | 1.19 | |
@@ -431,6 +432,7 @@ Each feature gate is designed for enabling/disabling a specific feature:
use EndpointSlices as the primary data source instead of Endpoints, enabling
scalability and performance improvements. See [Enabling Endpoint Slices](/docs/tasks/administer-cluster/enabling-endpointslices/).
- `GCERegionalPersistentDisk`: Enable the regional PD feature on GCE.
- `GenericEphemeralVolume`: Enables ephemeral, inline volumes that support all features of normal volumes (can be provided by third-party storage vendors, storage capacity tracking, restore from snapshot, etc.). See [Ephemeral Volumes](/docs/concepts/storage/ephemeral-volumes/).
- `HugePages`: Enable the allocation and consumption of pre-allocated [huge pages](/docs/tasks/manage-hugepages/scheduling-hugepages/).
- `HugePageStorageMediumSize`: Enable support for multiple sizes pre-allocated [huge pages](/docs/tasks/manage-hugepages/scheduling-hugepages/).
- `HyperVContainer`: Enable [Hyper-V isolation](https://docs.microsoft.com/en-us/virtualization/windowscontainers/manage-containers/hyperv-container) for Windows containers.