storage: GenericEphemeralVolume
This is the initial documentation for one new feature:
- kubernetes/enhancements#1698

A new page gets created for different ephemeral volumes because the
relationship between them needs to be explained.
pohly committed Jul 10, 2020
1 parent 38a5d01 commit 1f1fa07
Showing 3 changed files with 263 additions and 32 deletions.
254 changes: 254 additions & 0 deletions content/en/docs/concepts/storage/ephemeral-volumes.md
@@ -0,0 +1,254 @@
---
reviewers:
- jsafrane
- saad-ali
- msau42
- xing-yang
- pohly
title: Ephemeral Volumes
content_type: concept
weight: 20
---

<!-- overview -->

This document describes the current state of _ephemeral volumes_ in Kubernetes. Familiarity with [volumes](/docs/concepts/storage/volumes/) is suggested.

<!-- body -->

## Introduction

Some applications need additional storage but don't care whether that
data is stored persistently across restarts. For example, caching
services are often limited by memory size and can move infrequently
used data into storage that is slower than memory with little impact
on overall performance.

Other applications expect some read-only input data to be present in
files, like configuration data or secret keys.

_Ephemeral volumes_ are designed for these use cases. Because volumes
are created anew for each pod, pods can be stopped and restarted
without being limited to where some persistent volume is available.

Ephemeral volumes are specified _inline_ in the pod spec, which
simplifies application deployment and management.

Kubernetes supports several different kinds of ephemeral volumes for
different purposes:
- [emptyDir](/docs/concepts/storage/volumes/#emptydir): a directory on the root disk or
  a tmpfs
- [configMap](/docs/concepts/storage/volumes/#configmap),
  [downwardAPI](/docs/concepts/storage/volumes/#downwardapi),
  [secret](/docs/concepts/storage/volumes/#secret): inject different
  kinds of Kubernetes data into a pod
- [CSI ephemeral
  volumes](/docs/concepts/storage/volumes/#csi-ephemeral-volumes):
  similar to the previous volume kinds, but provided by special [CSI
  drivers](https://github.com/container-storage-interface/spec/blob/master/spec.md)
  which specifically [support this feature](https://kubernetes-csi.github.io/docs/drivers.html)
- _generic ephemeral volumes_ (described [below](#generic-ephemeral-volumes)):
  can be provided by all storage drivers that also support persistent volumes

`emptyDir`, `configMap`, `downwardAPI`, and `secret` are provided as
[local ephemeral
storage](/docs/concepts/configuration/manage-resources-containers/#local-ephemeral-storage).
They are managed by kubelet on each node.

CSI ephemeral volumes *must* be provided by third-party CSI storage
drivers. Generic ephemeral volumes *can* be provided by third-party
CSI storage drivers, but also by any other storage driver that
supports dynamic provisioning. These drivers can offer functionality
that Kubernetes itself does not support, for example storage with
different performance characteristics than the root disk that is
managed by kubelet, or injecting different data.

### CSI ephemeral volumes

{{< feature-state for_k8s_version="v1.16" state="beta" >}}

This feature requires the CSIInlineVolume feature gate to be enabled. It
is enabled by default starting with Kubernetes 1.16.

CSI ephemeral volumes are only supported by a subset of CSI
drivers. Please see [this
list](https://kubernetes-csi.github.io/docs/drivers.html).

Conceptually, CSI ephemeral volumes are similar to `configMap`,
`downwardAPI` and `secret`: they are managed locally on each node and
get created together with other local resources after a pod has been
scheduled onto a node. At this stage, Kubernetes no longer reschedules
the pod, so volume creation has to be unlikely to fail;
otherwise pod startup gets stuck. In particular, [storage capacity
aware pod scheduling](/docs/concepts/storage-capacity/) is *not*
supported for these volumes. They are currently also not covered by
the storage resource usage limits of a pod, because that is something
that kubelet can only enforce for storage that it manages itself.


Example:

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: my-csi-app
spec:
  containers:
    - name: my-frontend
      image: busybox
      volumeMounts:
      - mountPath: "/data"
        name: my-csi-inline-vol
      command: [ "sleep", "1000000" ]
  volumes:
    - name: my-csi-inline-vol
      csi:
        driver: inline.storage.kubernetes.io
        volumeAttributes:
          foo: bar
```

The `volumeAttributes` determine what volume is prepared by the
driver. These attributes are specific to each driver and not
standardized. See the documentation of each CSI driver for further
instructions.

Cluster administrators can control which CSI drivers can be used in a
pod via the [Pod Security
Policy](/docs/concepts/policy/pod-security-policy/) with the
[`PodSecurityPolicySpec.allowedCSIDrivers` field](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podsecuritypolicyspec-v1beta1-policy).
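
As an illustration, the following is a minimal sketch of such a policy.
The policy name is hypothetical, and the permissive `RunAsAny` rules are
only there to keep the example short; the driver name is taken from the
pod example above. Only the `volumes` and `allowedCSIDrivers` fields
matter here.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restrict-inline-csi-drivers   # hypothetical name
spec:
  # Rule fields that every PodSecurityPolicy needs; permissive for brevity.
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Allow the csi volume type in pods ...
  volumes:
  - csi
  # ... but only for the inline CSI drivers listed here.
  allowedCSIDrivers:
  - name: inline.storage.kubernetes.io
```

With a policy like this, a pod that embeds a `csi` volume for any other
driver should be rejected at admission time.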

### Generic ephemeral volumes

{{< feature-state for_k8s_version="v1.19" state="alpha" >}}

This feature requires the GenericEphemeralVolume feature gate to be
enabled. Because this is an alpha feature, it is disabled by default.

Generic ephemeral volumes are similar to `emptyDir` volumes, just more
flexible:
- Storage can be local or network-attached.
- Volumes can have a fixed size that pods are not able to exceed.
- Volumes may have some initial data, depending on the driver and
  parameters.
- All of the normal volume operations
  ([snapshotting](/docs/concepts/storage/volume-snapshots/),
  [cloning](/docs/concepts/storage/volume-pvc-datasource/),
  [resizing](/docs/concepts/storage/persistent-volumes/#expanding-persistent-volumes-claims),
  [storage capacity tracking](/docs/concepts/storage-capacity/), etc.)
  are supported.

Example:

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: my-app
spec:
  containers:
    - name: my-frontend
      image: busybox
      volumeMounts:
      - mountPath: "/scratch"
        name: scratch-volume
      command: [ "sleep", "1000000" ]
  volumes:
    - name: scratch-volume
      ephemeral:
        volumeClaimTemplate:
          metadata:
            labels:
              type: my-frontend-volume
          spec:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: "scratch-storage-class"
            resources:
              requests:
                storage: 1Gi
```

### Lifecycle and PersistentVolumeClaim

The key design idea is that the [parameters for a
PersistentVolumeClaim](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#ephemeralvolumesource-v1alpha1-core)
are allowed inside a volume source of the pod. Labels, annotations and
the full PersistentVolumeClaimSpec are supported. When such a pod gets
created, a new controller then creates an actual PersistentVolumeClaim
object in the same namespace as the pod.

That triggers volume binding and/or provisioning, either immediately if
the storage class uses immediate volume binding or when the pod is
tentatively scheduled onto a node (`WaitForFirstConsumer` volume
binding mode). The latter is recommended for generic ephemeral volumes
because then the pod scheduler is free to choose a suitable node for
the pod. With immediate binding, it is forced to use a node that has
access to the volume once it is available.
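
A storage class for the pod example above could look roughly like the
following sketch. The class name matches the `storageClassName` used in
that example; the provisioner name is a placeholder and must be replaced
with the name of a driver that is actually installed in the cluster.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: scratch-storage-class
# Placeholder: use the name of a real provisioner or CSI driver.
provisioner: scratch.example.com
# Delay binding and provisioning until a pod using the PVC is scheduled.
volumeBindingMode: WaitForFirstConsumer
```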

These additional PVCs are owned by the pod. When the pod gets deleted,
the Kubernetes garbage collector deletes the PVC, which then usually
triggers deletion of the volume because the default reclaim policy of
storage classes is to delete volumes. If for some reason an ephemeral
volume is not meant to be deleted, a storage class with `Retain` as
its reclaim policy (`reclaimPolicy: Retain`) can be used.

Once these PVCs exist, they can be used like any other PVC. In
particular, they can be referenced as data source in volume cloning or
snapshotting. The PVC object also holds the current status of the
volume.
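
To make this more concrete, the PVC that gets created for the `my-app`
pod from the example above might look roughly like the sketch below. The
name follows the rule described under PVC Naming, and the labels and spec
are copied from the `volumeClaimTemplate`. The `uid` is a placeholder,
and the exact owner reference fields set by the controller are an
assumption made for illustration.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-scratch-volume        # <pod name>-<volume name>
  labels:
    type: my-frontend-volume         # copied from the template metadata
  ownerReferences:
  - apiVersion: v1
    kind: Pod
    name: my-app
    uid: 00000000-0000-0000-0000-000000000000   # placeholder for the pod's UID
    controller: true                 # assumed for illustration
    blockOwnerDeletion: true         # assumed for illustration
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: "scratch-storage-class"
  resources:
    requests:
      storage: 1Gi
```

Because the pod is the owner, deleting the pod lets the garbage collector
remove this PVC as described above.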

### PVC Naming

Naming of the additional PVCs is currently deterministic: the name is
a combination of pod name and volume name, with a hyphen (`-`) in the
middle. In the example above, the PVC name will be
`my-app-scratch-volume`. This deterministic naming makes it easier to
interact with the PVC because one does not have to search for it once
the pod name and volume name are known.

However, it also introduces a potential conflict between different
pods (a pod "pod-a" with volume "scratch" and another pod with name
"pod" and volume "a-scratch" both end up with the same PVC name
"pod-a-scratch") and between pods and manually created PVCs.

Such conflicts are detected: a PVC is only used for an ephemeral
volume if it was created for the pod. This check is based on the
ownership relationship. An existing PVC is not overwritten or
modified. But this does not resolve the conflict because without the
right PVC, the pod cannot start.
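
The conflict described above can be reproduced with two pods like the
ones in this sketch. Container names, image, and mount paths are
arbitrary; only the pod and volume names matter. Both pods need a PVC
called `pod-a-scratch`, so whichever pod is handled second cannot start.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-a
spec:
  containers:
  - name: app
    image: busybox
    command: [ "sleep", "1000000" ]
    volumeMounts:
    - mountPath: "/scratch"
      name: scratch
  volumes:
  - name: scratch          # PVC name: "pod-a" + "-" + "scratch" = pod-a-scratch
    ephemeral:
      volumeClaimTemplate:
        spec:
          accessModes: [ "ReadWriteOnce" ]
          storageClassName: "scratch-storage-class"
          resources:
            requests:
              storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pod
spec:
  containers:
  - name: app
    image: busybox
    command: [ "sleep", "1000000" ]
    volumeMounts:
    - mountPath: "/scratch"
      name: a-scratch
  volumes:
  - name: a-scratch        # PVC name: "pod" + "-" + "a-scratch" = pod-a-scratch
    ephemeral:
      volumeClaimTemplate:
        spec:
          accessModes: [ "ReadWriteOnce" ]
          storageClassName: "scratch-storage-class"
          resources:
            requests:
              storage: 1Gi
```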

Therefore, care must be taken when naming pods and volumes inside the
same namespace such that these conflicts cannot occur.

### Security

Enabling the GenericEphemeralVolume feature allows users to create
PVCs indirectly if they can create pods, even if they do not have
permission to create them directly. Cluster administrators must be
aware of this. If this does not fit their security model, they have
two choices:
- Explicitly disable the feature through the feature gate, to avoid
  being surprised when some future Kubernetes version enables it
  by default.
- Use a [Pod Security
  Policy](/docs/concepts/policy/pod-security-policy/) where the
  `volumes` list does not contain the `ephemeral` volume type.
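
For the second choice, a policy along these lines could be used. This is
only a sketch: the policy name is hypothetical, the `RunAsAny` rules are
permissive to keep it short, and the set of allowed volume types is an
arbitrary example. What matters is that `ephemeral` is not listed.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: no-generic-ephemeral-volumes   # hypothetical name
spec:
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Example allowlist of volume types; "ephemeral" is intentionally absent,
  # so pods with generic ephemeral volumes are not admitted.
  volumes:
  - configMap
  - downwardAPI
  - emptyDir
  - secret
  - persistentVolumeClaim
```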

The normal namespace quota for PVCs in a namespace still applies, so
even if users are allowed to use this new mechanism, they cannot use
it to circumvent other policies.
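
For example, a quota like the following sketch also limits the PVCs that
get created for generic ephemeral volumes. The quota name, namespace, and
limits are made up for illustration.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: pvc-quota            # hypothetical name
  namespace: my-namespace    # hypothetical namespace
spec:
  hard:
    # Maximum number of PVCs in the namespace, however they were created.
    persistentvolumeclaims: "10"
    # Sum of storage requests across all PVCs in the namespace.
    requests.storage: 50Gi
    # Sum of storage requests for PVCs of one specific storage class.
    scratch-storage-class.storageclass.storage.k8s.io/requests.storage: 20Gi
```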

## {{% heading "whatsnext" %}}

### CSI ephemeral volumes

- For more information on the design, see the [Ephemeral Inline CSI
volumes KEP](https://github.com/kubernetes/enhancements/blob/ad6021b3d61a49040a3f835e12c8bb5424db2bbb/keps/sig-storage/20190122-csi-inline-volumes.md).
- For more information on further development of this feature, see the [enhancement tracking issue #596](https://github.com/kubernetes/enhancements/issues/596).

### Generic ephemeral volumes

- For more information on the design, see the
[Generic ephemeral inline volumes KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1698-generic-ephemeral-volumes/README.md).
- For more information on further development of this feature, see the [enhancement tracking issue #1698](https://github.com/kubernetes/enhancements/issues/1698).
39 changes: 7 additions & 32 deletions content/en/docs/concepts/storage/volumes.md
@@ -1291,8 +1291,11 @@ Once a CSI compatible volume driver is deployed on a Kubernetes cluster, users
may use the `csi` volume type to attach, mount, etc. the volumes exposed by the
CSI driver.

The `csi` volume type does not support direct reference from Pod and may only be
referenced in a Pod via a `PersistentVolumeClaim` object.
A `csi` volume can be used in a pod in three different ways:
- through a reference to a [`persistentVolumeClaim`](#persistentvolumeclaim)
- with a [generic ephemeral volume](/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volumes)
- with a [CSI ephemeral volume](/docs/concepts/storage/ephemeral-volumes/#csi-ephemeral-volumes) if the driver
  supports that

The following fields are available to storage administrators to configure a CSI
persistent volume:
@@ -1355,37 +1358,9 @@ as usual, without any CSI specific changes.

{{< feature-state for_k8s_version="v1.16" state="beta" >}}

This feature allows CSI volumes to be directly embedded in the Pod specification instead of a PersistentVolume. Volumes specified in this way are ephemeral and do not persist across Pod restarts.
This feature allows CSI volumes to be directly embedded in the Pod specification instead of a PersistentVolume. Volumes specified in this way are ephemeral and do not persist across Pod restarts. See [the ephemeral volume page](/docs/concepts/storage/ephemeral-volumes/#csi-ephemeral-volumes) for more information.

Example:

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: my-csi-app
spec:
  containers:
    - name: my-frontend
      image: busybox
      volumeMounts:
      - mountPath: "/data"
        name: my-csi-inline-vol
      command: [ "sleep", "1000000" ]
  volumes:
    - name: my-csi-inline-vol
      csi:
        driver: inline.storage.kubernetes.io
        volumeAttributes:
          foo: bar
```

This feature requires CSIInlineVolume feature gate to be enabled. It
is enabled by default starting with Kubernetes 1.16.

CSI ephemeral volumes are only supported by a subset of CSI drivers. Please see the list of CSI drivers [here](https://kubernetes-csi.github.io/docs/drivers.html).

# Developer resources
#### Developer resources
For more information on how to develop a CSI driver, refer to the [kubernetes-csi
documentation](https://kubernetes-csi.github.io/docs/)

@@ -101,6 +101,7 @@ different Kubernetes components.
| `ExperimentalHostUserNamespaceDefaulting` | `false` | Beta | 1.5 | |
| `EvenPodsSpread` | `false` | Alpha | 1.16 | 1.17 |
| `EvenPodsSpread` | `true` | Beta | 1.18 | |
| `GenericEphemeralVolume` | `false` | Alpha | 1.19 | |
| `HPAScaleToZero` | `false` | Alpha | 1.16 | |
| `HugePageStorageMediumSize` | `false` | Alpha | 1.18 | 1.18 |
| `HugePageStorageMediumSize` | `true` | Beta | 1.19 | |
@@ -431,6 +432,7 @@ Each feature gate is designed for enabling/disabling a specific feature:
use EndpointSlices as the primary data source instead of Endpoints, enabling
scalability and performance improvements. See [Enabling Endpoint Slices](/docs/tasks/administer-cluster/enabling-endpointslices/).
- `GCERegionalPersistentDisk`: Enable the regional PD feature on GCE.
- `GenericEphemeralVolume`: Enables ephemeral, inline volumes that support all features of normal volumes (can be provided by third-party storage vendors, storage capacity tracking, restore from snapshot, etc.). See [Ephemeral Volumes](/docs/concepts/storage/ephemeral-volumes/).
- `HugePages`: Enable the allocation and consumption of pre-allocated [huge pages](/docs/tasks/manage-hugepages/scheduling-hugepages/).
- `HugePageStorageMediumSize`: Enable support for multiple sizes pre-allocated [huge pages](/docs/tasks/manage-hugepages/scheduling-hugepages/).
- `HyperVContainer`: Enable [Hyper-V isolation](https://docs.microsoft.com/en-us/virtualization/windowscontainers/manage-containers/hyperv-container) for Windows containers.