Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: GenericEphemeralVolume #22438

Merged
merged 3 commits into from
Jul 15, 2020
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
273 changes: 273 additions & 0 deletions content/en/docs/concepts/storage/ephemeral-volumes.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,273 @@
---
reviewers:
- jsafrane
- saad-ali
- msau42
- xing-yang
- pohly
title: Ephemeral Volumes
content_type: concept
weight: 50
---

<!-- overview -->

This document describes _ephemeral volumes_ in Kubernetes. Familiarity
with [volumes](/docs/concepts/storage/volumes/) is suggested, in
particular `PersistentVolumeClaim` and `PersistentVolume`.
pohly marked this conversation as resolved.
Show resolved Hide resolved

<!-- body -->

Some application need additional storage but don't care whether that
data is stored persistently across restarts. For example, caching
services are often limited by memory size and can move infrequently
used data into storage that is slower than memory with little impact
on overall performance.

Other applications expect some read-only input data to be present in
files, like configuration data or secret keys.

_Ephemeral volumes_ are designed for these use cases. Because volumes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important to mention somehwere in the intro that ephemeral volumes follow the pod's lifetime and get created and deleted along with the pod.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced "created anew for each pod" with "follow the pod's lifetime and get created and deleted along with the pod".

Or rather, Pod (capital case)... I also fixed that elsewhere.

follow the Pod's lifetime and get created and deleted along with the
Pod, Pods can be stopped and restarted without being limited to where
some persistent volume is available.

Ephemeral volumes are specified _inline_ in the Pod spec, which
simplifies application deployment and management.

pohly marked this conversation as resolved.
Show resolved Hide resolved
### Types of ephemeral volumes

Kubernetes supports several different kinds of ephemeral volumes for
different purposes:
- [emptyDir]((/docs/concepts/volumes/#emptydir): empty at Pod startup,
pohly marked this conversation as resolved.
Show resolved Hide resolved
with storage coming locally from the kubelet base directory (usually
the root disk) or RAM
- [configMap](/docs/concepts/volumes/#configmap),
[downwardAPI](/docs/concepts/volumes/#downwardapi),
[secret](/docs/concepts/storage/volumes/#secret): inject different
kinds of Kubernetes data into a Pod
- [CSI ephemeral
volumes](docs/concepts/storage/volumes/#csi-ephemeral-volumes):
similar to the previous volume kinds, but provided by special [CSI
drivers](https://github.com/container-storage-interface/spec/blob/master/spec.md)
which specifically [support this feature](https://kubernetes-csi.github.io/docs/drivers.html)
- [generic ephemeral volumes](#generic-ephemeral-volumes), which
can be provided by all storage drivers that also support persistent volumes

`emptyDir`, `configMap`, `downwardAPI`, `secret` are provided as
[local ephemeral
storage](/docs/concepts/configuration/manage-resources-containers/#local-ephemeral-storage).
They are managed by kubelet on each node.

CSI ephemeral volumes *must* be provided by third-party CSI storage
drivers.

Generic ephemeral volumes *can* be provided by third-party CSI storage
drivers, but also by any other storage driver that supports dynamic
provisioning. Some CSI drivers are written specifically for CSI
ephemeral volumes and do not support dynamic provisioning: those then
cannot be used for generic ephemeral volumes.

The advantage of using third-party drivers is that they can offer
functionality that Kubernetes itself does not support, for example
storage with different performance characteristics than the disk that
is managed by kubelet, or injecting different data.

### CSI ephemeral volumes

{{< feature-state for_k8s_version="v1.16" state="beta" >}}

This feature requires the `CSIInlineVolume` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to be enabled. It
is enabled by default starting with Kubernetes 1.16.

{{< note >}}
CSI ephemeral volumes are only supported by a subset of CSI drivers.
The Kubernetes CSI [Drivers list](https://kubernetes-csi.github.io/docs/drivers.html)
shows which drivers support ephemeral volumes.
{{< /note >}}

Conceptually, CSI ephemeral volumes are similar to `configMap`,
`downwardAPI` and `secret` volume types: the storage is managed locally on each
node and is created together with other local resources after a Pod has been
scheduled onto a node. Kubernetes has no concept of rescheduling Pods
anymore at this stage. Volume creation has to be unlikely to fail,
otherwise Pod startup gets stuck. In particular, [storage capacity
aware Pod scheduling](/docs/concepts/storage-capacity/) is *not*
supported for these volumes. They are currently also not covered by
the storage resource usage limits of a Pod, because that is something
that kubelet can only enforce for storage that it manages itself.


Here's an example manifest for a Pod that uses CSI ephemeral storage:

```yaml
kind: Pod
apiVersion: v1
metadata:
name: my-csi-app
spec:
containers:
- name: my-frontend
image: busybox
volumeMounts:
- mountPath: "/data"
name: my-csi-inline-vol
command: [ "sleep", "1000000" ]
volumes:
- name: my-csi-inline-vol
csi:
driver: inline.storage.kubernetes.io
volumeAttributes:
foo: bar
```

The `volumeAttributes` determine what volume is prepared by the
driver. These attributes are specific to each driver and not
standardized. See the documentation of each CSI driver for further
instructions.

As a cluster administrator, you can use a [PodSecurityPolicy](/docs/concepts/policy/pod-security-policy/) to control which CSI drivers can be used in a Pod, specified with the
[`allowedCSIDrivers` field](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podsecuritypolicyspec-v1beta1-policy).

### Generic ephemeral volumes

{{< feature-state for_k8s_version="v1.19" state="alpha" >}}

This feature requires the `GenericEphemeralVolume` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to be
enabled. Because this is an alpha feature, it is disabled by default.

Generic ephemeral volumes are similar to `emptyDir` volumes, just more
flexible:
- Storage can be local or network-attached.
- Volumes can have a fixed size that Pods are not able to exceed.
- Volumes may have some initial data, depending on the driver and
parameters.
- Typical operations on volumes are supported assuming that the driver
supports them, including
([snapshotting](/docs/concepts/storage/volume-snapshots/),
[cloning](/docs/concepts/storage/volume-pvc-datasource/),
[resizing](/docs/concepts/storage/persistent-volumes/#expanding-persistent-volumes-claims),
and [storage capacity tracking](/docs/concepts/storage-capacity/).

Example:

```yaml
kind: Pod
apiVersion: v1
metadata:
name: my-app
spec:
containers:
- name: my-frontend
image: busybox
volumeMounts:
- mountPath: "/scratch"
name: scratch-volume
command: [ "sleep", "1000000" ]
volumes:
- name: scratch-volume
ephemeral:
volumeClaimTemplate:
metadata:
labels:
type: my-frontend-volume
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: "scratch-storage-class"
resources:
requests:
storage: 1Gi
```

### Lifecycle and PersistentVolumeClaim

The key design idea is that the [parameters for a
volume claim](/docs/reference/generated/kubernetes-api/#ephemeralvolumesource-v1alpha1-core)
are allowed inside a volume source of the Pod. Labels, annotations and
the whole set of fields for a PersistentVolumeClaim are supported. When such a Pod gets
created, the ephemeral volume controller then creates an actual PersistentVolumeClaim
object in the same namespace as the Pod and ensures that the PersistentVolumeClaim
gets deleted when the Pod gets deleted.

That triggers volume binding and/or provisioning, either immediately if
the {{< glossary_tooltip text="StorageClass" term_id="storage-class" >}} uses immediate volume binding or when the Pod is
tentatively scheduled onto a node (`WaitForFirstConsumer` volume
binding mode). The latter is recommended for generic ephemeral volumes
because then the scheduler is free to choose a suitable node for
the Pod. With immediate binding, the scheduler is forced to select a node that has
access to the volume once it is available.

In terms of [resource ownership](/docs/concepts/workloads/controllers/garbage-collection/#owners-and-dependents),
a Pod that has generic ephemeral storage is the owner of the PersistentVolumeClaim(s)
that provide that ephemeral storage. When the Pod is deleted,
the Kubernetes garbage collector deletes the PVC, which then usually
triggers deletion of the volume because the default reclaim policy of
storage classes is to delete volumes. You can create quasi-ephemeral local storage
using a StorageClass with a reclaim policy of `retain`: the storage outlives the Pod,
and in this case you need to ensure that volume clean up happens separately.

While these PVCs exist, they can be used like any other PVC. In
particular, they can be referenced as data source in volume cloning or
snapshotting. The PVC object also holds the current status of the
volume.

### PersistentVolumeClaim naming

Naming of the automatically created PVCs is deterministic: the name is
a combination of Pod name and volume name, with a hyphen (`-`) in the
middle. In the example above, the PVC name will be
`my-app-scratch-volume`. This deterministic naming makes it easier to
interact with the PVC because one does not have to search for it once
the Pod name and volume name are known.

The deterministic naming also introduces a potential conflict between different
Pods (a Pod "pod-a" with volume "scratch" and another Pod with name
"pod" and volume "a-scratch" both end up with the same PVC name
"pod-a-scratch") and between Pods and manually created PVCs.

Such conflicts are detected: a PVC is only used for an ephemeral
volume if it was created for the Pod. This check is based on the
ownership relationship. An existing PVC is not overwritten or
modified. But this does not resolve the conflict because without the
right PVC, the Pod cannot start.

{{< caution >}}
Take care when naming Pods and volumes inside the
same namespace, so that these conflicts can't occur.
{{< /caution >}}

### Security

Enabling the GenericEphemeralVolume feature allows users to create
PVCs indirectly if they can create Pods, even if they do not have
permission to create PVCs directly. Cluster administrators must be
aware of this. If this does not fit their security model, they have
two choices:
- Explicitly disable the feature through the feature gate, to avoid
being surprised when some future Kubernetes version enables it
by default.
- Use a [Pod Security
Policy](/docs/concepts/policy/pod-security-policy/) where the
`volumes` list does not contain the `ephemeral` volume type.

The normal namespace quota for PVCs in a namespace still applies, so
even if users are allowed to use this new mechanism, they cannot use
it to circumvent other policies.

## {{% heading "whatsnext" %}}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added:

### Ephemeral volumes managed by kubelet

See [local ephemeral storage](/docs/concepts/configuration/manage-resources-containers/#local-ephemeral-storage).

### Ephemeral volumes managed by kubelet

See [local ephemeral storage](/docs/concepts/configuration/manage-resources-containers/#local-ephemeral-storage).

### CSI ephemeral volumes

- For more information on the design, see the [Ephemeral Inline CSI
volumes KEP](https://github.com/kubernetes/enhancements/blob/ad6021b3d61a49040a3f835e12c8bb5424db2bbb/keps/sig-storage/20190122-csi-inline-volumes.md).
- For more information on further development of this feature, see the [enhancement tracking issue #596](https://github.com/kubernetes/enhancements/issues/596).

### Generic ephemeral volumes

- For more information on the design, see the
[Generic ephemeral inline volumes KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1698-generic-ephemeral-volumes/README.md).
- For more information on further development of this feature, see the [enhancement tracking issue #1698](https://github.com/kubernetes/enhancements/issues/1698).
Comment on lines +259 to +273
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a little tidier (for example, we usually and intentionally don't link to KEPs other than for historical context), but I'm happy to log a separate issue to track tidying that up after this merges. It's a minor detail for sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that for alpha and beta features, easy access to both the KEP and the enhancement issue are important: the KEP explains "why" the feature is what it is and in particular, what other alternatives were considered. The enhancement issue answers the questions "when will this graduate" and "what is still changing with this feature".

I can remove it in a follow-up PR if this explanation is not convincing enough. But for now, let's merge.

42 changes: 11 additions & 31 deletions content/en/docs/concepts/storage/volumes.md
Original file line number Diff line number Diff line change
Expand Up @@ -1291,8 +1291,11 @@ Once a CSI compatible volume driver is deployed on a Kubernetes cluster, users
may use the `csi` volume type to attach, mount, etc. the volumes exposed by the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we link to the new ephemeral-volumes page somewhere at the top of this page? (and link to the persistent volumes page in the same area)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "background" section may need a more thorough rewrite. For example, it still says:

A Kubernetes volume, on the other hand, has an explicit lifetime - the same as
the Pod that encloses it.

That became obsolete when persistent volumes were introduced, which was a while ago...

Let me punt on that for now for the sake of getting this PR merged before the docs deadline. I've created #22505 to track it.

CSI driver.

The `csi` volume type does not support direct reference from Pod and may only be
referenced in a Pod via a `PersistentVolumeClaim` object.
A `csi` volume can be used in a pod in three different ways:
- through a reference to a [`persistentVolumeClaim`](#persistentvolumeclaim)
- with a [generic ephemeral volume](/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volume) (alpha feature)
- with a [CSI ephemeral volume](/docs/concepts/storage/ephemeral-volumes/#csi-ephemeral-volume) if the driver
supports that (beta feature)

The following fields are available to storage administrators to configure a CSI
persistent volume:
Expand Down Expand Up @@ -1355,37 +1358,14 @@ as usual, without any CSI specific changes.

{{< feature-state for_k8s_version="v1.16" state="beta" >}}

This feature allows CSI volumes to be directly embedded in the Pod specification instead of a PersistentVolume. Volumes specified in this way are ephemeral and do not persist across Pod restarts.
You can directly configure CSI volumes within the Pod
specification. Volumes specified in this way are ephemeral and do not
persist across Pod restarts. See [Ephemeral
Volumes](/docs/concepts/storage/ephemeral-volumes/#csi-ephemeral-volume)
for more information.

Example:
#### {{% heading "whatsnext" %}}

```yaml
kind: Pod
apiVersion: v1
metadata:
name: my-csi-app
spec:
containers:
- name: my-frontend
image: busybox
volumeMounts:
- mountPath: "/data"
name: my-csi-inline-vol
command: [ "sleep", "1000000" ]
volumes:
- name: my-csi-inline-vol
csi:
driver: inline.storage.kubernetes.io
volumeAttributes:
foo: bar
```

This feature requires CSIInlineVolume feature gate to be enabled. It
is enabled by default starting with Kubernetes 1.16.

CSI ephemeral volumes are only supported by a subset of CSI drivers. Please see the list of CSI drivers [here](https://kubernetes-csi.github.io/docs/drivers.html).

# Developer resources
For more information on how to develop a CSI driver, refer to the [kubernetes-csi
documentation](https://kubernetes-csi.github.io/docs/)

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,7 @@ different Kubernetes components.
| `ExperimentalHostUserNamespaceDefaulting` | `false` | Beta | 1.5 | |
| `EvenPodsSpread` | `false` | Alpha | 1.16 | 1.17 |
| `EvenPodsSpread` | `true` | Beta | 1.18 | |
| `GenericEphemeralVolume` | `false` | Alpha | 1.19 | |
| `HPAScaleToZero` | `false` | Alpha | 1.16 | |
| `HugePageStorageMediumSize` | `false` | Alpha | 1.18 | 1.18 |
| `HugePageStorageMediumSize` | `true` | Beta | 1.19 | |
Expand Down Expand Up @@ -431,6 +432,7 @@ Each feature gate is designed for enabling/disabling a specific feature:
use EndpointSlices as the primary data source instead of Endpoints, enabling
scalability and performance improvements. See [Enabling Endpoint Slices](/docs/tasks/administer-cluster/enabling-endpointslices/).
- `GCERegionalPersistentDisk`: Enable the regional PD feature on GCE.
- `GenericEphemeralVolume`: Enables ephemeral, inline volumes that support all features of normal volumes (can be provided by third-party storage vendors, storage capacity tracking, restore from snapshot, etc.). See [Ephemeral Volumes](/docs/concepts/storage/ephemeral-volumes/).
- `HugePages`: Enable the allocation and consumption of pre-allocated [huge pages](/docs/tasks/manage-hugepages/scheduling-hugepages/).
- `HugePageStorageMediumSize`: Enable support for multiple sizes pre-allocated [huge pages](/docs/tasks/manage-hugepages/scheduling-hugepages/).
- `HyperVContainer`: Enable [Hyper-V isolation](https://docs.microsoft.com/en-us/virtualization/windowscontainers/manage-containers/hyperv-container) for Windows containers.
Expand Down