The Doc update for ScheduleDaemonSetPods (#8842)
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
k82cn authored and Misty Linville committed Jun 27, 2018
1 parent e724168 commit e3e750c
Showing 3 changed files with 79 additions and 36 deletions.
34 changes: 16 additions & 18 deletions content/en/docs/concepts/configuration/taint-and-toleration.md
@@ -204,23 +204,23 @@ running on the node as follows
* pods that tolerate the taint with a specified `tolerationSeconds` remain
bound for the specified amount of time

In addition, Kubernetes 1.6 has alpha
support for representing node problems. In other words, the node controller
automatically taints a node when certain condition is true. The built-in taints
currently include:
In addition, Kubernetes 1.6 introduced alpha support for representing node
problems. In other words, the node controller automatically taints a node when
certain conditions are true. The following taints are built in:

* `node.kubernetes.io/not-ready`: Node is not ready. This corresponds to
the NodeCondition `Ready` being "`False`".
* `node.alpha.kubernetes.io/unreachable`: Node is unreachable from the node
* `node.kubernetes.io/unreachable`: Node is unreachable from the node
controller. This corresponds to the NodeCondition `Ready` being "`Unknown`".
* `node.kubernetes.io/out-of-disk`: Node has run out of disk space.
* `node.kubernetes.io/memory-pressure`: Node has memory pressure.
* `node.kubernetes.io/disk-pressure`: Node has disk pressure.
* `node.kubernetes.io/network-unavailable`: Node's network is unavailable.
* `node.cloudprovider.kubernetes.io/uninitialized`: When kubelet is started
with "external" cloud provider, it sets this taint on a node to mark it
as unusable. When a controller from the cloud-controller-manager initializes
this node, kubelet removes this taint.
* `node.kubernetes.io/unschedulable`: Node is unschedulable.
* `node.cloudprovider.kubernetes.io/uninitialized`: When the kubelet is started
with an "external" cloud provider, this taint is set on a node to mark it
as unusable. After a controller from the cloud-controller-manager initializes
this node, the kubelet removes this taint.
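As an illustration, a Pod could tolerate the built-in `node.kubernetes.io/unreachable` taint for a bounded time by pairing a `NoExecute` toleration with `tolerationSeconds`. This is a minimal sketch of a Pod spec fragment; the 6000-second value is an arbitrary example rather than a recommended setting, and whether the node controller applies this taint at all depends on the `TaintBasedEvictions` feature.

```yaml
# Pod spec fragment (illustrative): stay bound to an unreachable node
# for up to 6000 seconds, then be evicted.
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000
```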

When the `TaintBasedEvictions` alpha feature is enabled (you can do this by
including `TaintBasedEvictions=true` in `--feature-gates` for the Kubernetes controller manager,
@@ -277,17 +277,15 @@ Version 1.8 introduces an alpha feature that causes the node controller to creat
Node conditions. When this feature is enabled (you can do this by including `TaintNodesByCondition=true` in the `--feature-gates` command line flag to the scheduler, such as
`--feature-gates=FooBar=true,TaintNodesByCondition=true`), the scheduler does not check Node conditions; instead the scheduler checks taints. This ensures that Node conditions don't affect what's scheduled onto the Node. The user can choose to ignore some of the Node's problems (represented as Node conditions) by adding appropriate Pod tolerations.
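For example, with `TaintNodesByCondition` enabled, a Pod that should still be scheduled onto nodes reporting memory pressure might carry a toleration like the one below. This is a hedged sketch of a Pod spec fragment; `operator: Exists` is chosen so that no taint value needs to match.

```yaml
# Pod spec fragment (illustrative): ignore the memory-pressure
# condition, which TaintNodesByCondition surfaces as a NoSchedule taint.
tolerations:
- key: "node.kubernetes.io/memory-pressure"
  operator: "Exists"
  effect: "NoSchedule"
```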

To make sure that turning on this feature doesn't break DaemonSets, starting in version 1.8, the DaemonSet controller automatically adds the following `NoSchedule` tolerations to all daemons:
Starting in Kubernetes 1.8, the DaemonSet controller automatically adds the
following `NoSchedule` tolerations to all daemons, to prevent this feature from
breaking DaemonSets.

* `node.kubernetes.io/memory-pressure`
* `node.kubernetes.io/disk-pressure`
* `node.kubernetes.io/out-of-disk` (*only for critical pods*)
* `node.kubernetes.io/unschedulable` (1.10 or later)
* `node.kubernetes.io/network-unavailable` (*host network only*)

The above settings ensure backward compatibility, but we understand they may not fit all user's needs, which is why
cluster admin may choose to add arbitrary tolerations to DaemonSets.

{{% /capture %}}

{{% capture whatsnext %}}

{{% /capture %}}
Adding these tolerations ensures backward compatibility. You can also add
arbitrary tolerations to DaemonSets.
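As a sketch of adding an arbitrary toleration, a DaemonSet manifest might declare its own tolerations in the Pod template. Everything here is illustrative: `example-daemon`, the `example.com/agent:1.0` image, and the `example.com/dedicated` taint key are hypothetical names, not part of Kubernetes.

```yaml
# Illustrative DaemonSet; the names, image, and the
# example.com/dedicated taint key are hypothetical.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-daemon
spec:
  selector:
    matchLabels:
      name: example-daemon
  template:
    metadata:
      labels:
        name: example-daemon
    spec:
      tolerations:
      # Extra toleration chosen by the cluster admin, on top of the
      # ones the DaemonSet controller adds automatically.
      - key: "example.com/dedicated"
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: agent
        image: example.com/agent:1.0
```

Since the controller adds its automatic tolerations itself, the template in this sketch lists only the extra one.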
79 changes: 62 additions & 17 deletions content/en/docs/concepts/workloads/controllers/daemonset.md
@@ -103,7 +103,9 @@ If you do not specify either, then the DaemonSet controller will create Pods on

## How Daemon Pods are Scheduled

Normally, the machine that a Pod runs on is selected by the Kubernetes scheduler. However, Pods
### Scheduled by DaemonSet controller (default)

Normally, the machine that a Pod runs on is selected by the Kubernetes scheduler. However, Pods
created by the DaemonSet controller have the machine already selected (`.spec.nodeName` is specified
when the Pod is created, so it is ignored by the scheduler). Therefore:

@@ -112,29 +114,72 @@ when the Pod is created, so it is ignored by the scheduler). Therefore:
- The DaemonSet controller can make Pods even when the scheduler has not been started, which can help cluster
bootstrap.
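To make the default behavior concrete, a Pod created by the DaemonSet controller already carries its node name, roughly as in the fragment below. This is a sketch: `target-host-name` is a placeholder, and the other fields the controller fills in are omitted.

```yaml
# Daemon Pod fragment (illustrative): the node was chosen by the
# DaemonSet controller, so the scheduler ignores this Pod.
spec:
  nodeName: target-host-name
```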

Daemon Pods do respect [taints and tolerations](/docs/concepts/configuration/taint-and-toleration),
but they are created with `NoExecute` tolerations for the following taints with no `tolerationSeconds`:

- `node.kubernetes.io/not-ready`
- `node.alpha.kubernetes.io/unreachable`
### Scheduled by default scheduler

{{< feature-state state="alpha" for-kubernetes-version="1.11" >}}

A DaemonSet ensures that all eligible nodes run a copy of a Pod. Normally, the
node that a Pod runs on is selected by the Kubernetes scheduler. However,
DaemonSet pods are created and scheduled by the DaemonSet controller instead.
That introduces the following issues:

* Inconsistent Pod behavior: Normal Pods waiting to be scheduled are created
in the `Pending` state, but DaemonSet pods are not created in the `Pending`
state. This is confusing to the user.
* [Pod preemption](/docs/concepts/configuration/pod-priority-preemption/)
is handled by the default scheduler. When preemption is enabled, the DaemonSet
controller makes scheduling decisions without considering pod priority and
preemption.

The `ScheduleDaemonSetPods` feature gate lets you schedule DaemonSets with the
default scheduler instead of the DaemonSet controller. With it enabled, the
DaemonSet controller adds a `NodeAffinity` term to each DaemonSet pod instead of
setting `.spec.nodeName`, and the default scheduler then binds the pod to the
target host. If the DaemonSet pod already has node affinity, it is replaced. The
DaemonSet controller only performs these operations when creating or modifying
DaemonSet pods; no changes are made to the `.spec.template` of the DaemonSet.

```yaml
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
    - matchFields:
      - key: metadata.name
        operator: In
        values:
        - target-host-name
```

In addition, a `node.kubernetes.io/unschedulable:NoSchedule` toleration is added
automatically to DaemonSet Pods. The DaemonSet controller ignores the
`unschedulable` attribute of Nodes when scheduling DaemonSet Pods. You must enable
`TaintNodesByCondition` to ensure that the default scheduler behaves the same
way and schedules DaemonSet pods on `unschedulable` nodes.
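The automatically added toleration corresponds to a Pod spec entry like the one below. This is a sketch of what the controller adds, assuming it uses `operator: Exists`; inspect a running DaemonSet Pod to confirm the exact form in your version.

```yaml
# Toleration on DaemonSet Pods (illustrative form) that lets the
# default scheduler place them on cordoned (unschedulable) nodes.
tolerations:
- key: "node.kubernetes.io/unschedulable"
  operator: "Exists"
  effect: "NoSchedule"
```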

When this feature and `TaintNodesByCondition` are enabled together, if the
DaemonSet uses the host network, you must also add the
`node.kubernetes.io/network-unavailable:NoSchedule` toleration.
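If you add that toleration yourself, the Pod template of a host-network DaemonSet might include something like the fragment below. This is a minimal sketch; only the fields relevant to this point are shown, and `operator: Exists` is a choice that matches the taint regardless of its value.

```yaml
# DaemonSet Pod template fragment (illustrative) for a daemon that
# uses the host network and must tolerate network-unavailable nodes.
template:
  spec:
    hostNetwork: true
    tolerations:
    - key: "node.kubernetes.io/network-unavailable"
      operator: "Exists"
      effect: "NoSchedule"
```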


This ensures that when the `TaintBasedEvictions` alpha feature is enabled,
they will not be evicted when there are node problems such as a network partition. (When the
`TaintBasedEvictions` feature is not enabled, they are also not evicted in these scenarios, but
due to hard-coded behavior of the NodeController rather than due to tolerations).
### Taints and Tolerations

They also tolerate following `NoSchedule` taints:
Although Daemon Pods respect
[taints and tolerations](/docs/concepts/configuration/taint-and-toleration),
the following tolerations are added to DaemonSet Pods automatically, depending
on which related features are enabled.

- `node.kubernetes.io/memory-pressure`
- `node.kubernetes.io/disk-pressure`
| Toleration Key | Effect | Alpha Features | Version | Description |
| ---------------------------------------- | ---------- | ------------------------------------------------------------ | ------- | ------------------------------------------------------------ |
| `node.kubernetes.io/not-ready` | NoExecute | `TaintBasedEvictions` | 1.8+ | When `TaintBasedEvictions` is enabled, DaemonSet pods are not evicted when there are node problems such as a network partition. |
| `node.kubernetes.io/unreachable` | NoExecute | `TaintBasedEvictions` | 1.8+ | When `TaintBasedEvictions` is enabled, DaemonSet pods are not evicted when there are node problems such as a network partition. |
| `node.kubernetes.io/disk-pressure` | NoSchedule | `TaintNodesByCondition` | 1.8+ | |
| `node.kubernetes.io/memory-pressure` | NoSchedule | `TaintNodesByCondition` | 1.8+ | |
| `node.kubernetes.io/unschedulable` | NoSchedule | `ScheduleDaemonSetPods`, `TaintNodesByCondition` | 1.11+ | When `ScheduleDaemonSetPods` is enabled, `TaintNodesByCondition` is necessary so that the default scheduler lets DaemonSet pods tolerate the `unschedulable` attribute. |
| `node.kubernetes.io/network-unavailable` | NoSchedule | `ScheduleDaemonSetPods`, `TaintNodesByCondition`, host network | 1.11+ | When `ScheduleDaemonSetPods` is enabled, `TaintNodesByCondition` is necessary so that the default scheduler lets DaemonSet pods that use the host network tolerate the `network-unavailable` attribute. |
| `node.kubernetes.io/out-of-disk` | NoSchedule | `ExperimentalCriticalPodAnnotation` (critical pod only), `TaintNodesByCondition` | 1.8+ | |

When support for critical pods is enabled and the pods in a DaemonSet are
marked as critical, the Daemon pods are created with an additional
`NoSchedule` toleration for the `node.kubernetes.io/out-of-disk` taint.
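For reference, a DaemonSet might mark its Pods as critical with the alpha `scheduler.alpha.kubernetes.io/critical-pod` annotation on the Pod template, which is read when the `ExperimentalCriticalPodAnnotation` feature gate is enabled. This is a sketch under that assumption; as far as we know, the annotation is only honored for Pods in the `kube-system` namespace, so the DaemonSet is assumed to live there.

```yaml
# DaemonSet fragment (illustrative): critical-pod annotation on the
# Pod template; assumes the DaemonSet is in kube-system.
metadata:
  namespace: kube-system
spec:
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
```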

Note that all of the `NoSchedule` taints above are created only in version 1.8 or later, and only if the alpha feature `TaintNodesByCondition` is enabled.

Also note that the `node-role.kubernetes.io/master` `NoSchedule` toleration specified in the above example is needed on 1.6 or later to schedule on *master* nodes as this is not a default toleration.

## Communicating with Daemon Pods

@@ -81,7 +81,6 @@ different Kubernetes components.
| `RotateKubeletClientCertificate` | `true` | Beta | 1.7 | |
| `RotateKubeletServerCertificate` | `false` | Alpha | 1.7 | |
| `RunAsGroup` | `false` | Alpha | 1.10 | |
| `ScheduleDaemonSetPods` | `false` | Alpha | 1.10 | |
| `ServiceNodeExclusion` | `false` | Alpha | 1.8 | |
| `StorageObjectInUseProtection` | `true` | Beta | 1.10 | 1.10 |
| `StorageObjectInUseProtection` | `true` | GA | 1.11 | |
@@ -98,6 +97,7 @@ different Kubernetes components.
| `VolumeScheduling` | `false` | Alpha | 1.9 | 1.9 |
| `VolumeScheduling` | `true` | Beta | 1.10 | |
| `VolumeSubpathEnvExpansion` | `false` | Alpha | 1.11 | |
| `ScheduleDaemonSetPods` | `false` | Alpha | 1.11 | |

## Using a Feature
