The Doc update for ScheduleDaemonSetPods (#8842)

Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
kubernetes · Jun 27, 2018 · e3e750c
1 parent e724168
commit e3e750c

Showing 3 changed files with 79 additions and 36 deletions.
diff --git a/content/en/docs/concepts/configuration/taint-and-toleration.md b/content/en/docs/concepts/configuration/taint-and-toleration.md
@@ -204,23 +204,23 @@ running on the node as follows
  * pods that tolerate the taint with a specified `tolerationSeconds` remain
    bound for the specified amount of time
 
-In addition, Kubernetes 1.6 has alpha
-support for representing node problems. In other words, the node controller
-automatically taints a node when certain condition is true. The built-in taints
-currently include:
+In addition, Kubernetes 1.6 introduced alpha support for representing node
+problems. In other words, the node controller automatically taints a node when
+certain conditions are true. The following taints are built in:
 
  * `node.kubernetes.io/not-ready`: Node is not ready. This corresponds to
    the NodeCondition `Ready` being "`False`".
- * `node.alpha.kubernetes.io/unreachable`: Node is unreachable from the node
+ * `node.kubernetes.io/unreachable`: Node is unreachable from the node
    controller. This corresponds to the NodeCondition `Ready` being "`Unknown`".
  * `node.kubernetes.io/out-of-disk`: Node becomes out of disk.
  * `node.kubernetes.io/memory-pressure`: Node has memory pressure.
  * `node.kubernetes.io/disk-pressure`: Node has disk pressure.
  * `node.kubernetes.io/network-unavailable`: Node's network is unavailable.
- * `node.cloudprovider.kubernetes.io/uninitialized`: When kubelet is started
-   with "external" cloud provider, it sets this taint on a node to mark it
-   as unusable. When a controller from the cloud-controller-manager initializes
-   this node, kubelet removes this taint.
+ * `node.kubernetes.io/unschedulable`: Node is unschedulable.
+ * `node.cloudprovider.kubernetes.io/uninitialized`: When the kubelet is started
+    with an "external" cloud provider, this taint is set on a node to mark it
+    as unusable. After a controller from the cloud-controller-manager initializes
+    this node, the kubelet removes this taint.
 
 When the `TaintBasedEvictions` alpha feature is enabled (you can do this by
 including `TaintBasedEvictions=true` in `--feature-gates` for Kubernetes controller manager,
@@ -277,17 +277,15 @@ Version 1.8 introduces an alpha feature that causes the node controller to creat
 Node conditions. When this feature is enabled (you can do this by including `TaintNodesByCondition=true` in the `--feature-gates` command line flag to the scheduler, such as
 `--feature-gates=FooBar=true,TaintNodesByCondition=true`), the scheduler does not check Node conditions; instead the scheduler checks taints. This assures that Node conditions don't affect what's scheduled onto the Node. The user can choose to ignore some of the Node's problems (represented as Node conditions) by adding appropriate Pod tolerations.
 
-To make sure that turning on this feature doesn't break DaemonSets, starting in version 1.8, the  DaemonSet controller automatically adds the following `NoSchedule` tolerations to all daemons:
+Starting in Kubernetes 1.8, the DaemonSet controller automatically adds the
+following `NoSchedule` tolerations to all daemons, to prevent DaemonSets from
+breaking.
 
   * `node.kubernetes.io/memory-pressure`
   * `node.kubernetes.io/disk-pressure`
   * `node.kubernetes.io/out-of-disk` (*only for critical pods*)
+  * `node.kubernetes.io/unschedulable` (1.10 or later)
+  * `node.kubernetes.io/network-unavailable` (*host network only*)
 
-The above settings ensure backward compatibility, but we understand they may not fit all user's needs, which is why
-cluster admin may choose to add arbitrary tolerations to DaemonSets.
-
-{{% /capture %}}
-
-{{% capture whatsnext %}}
-
-{{% /capture %}}
+Adding these tolerations ensures backward compatibility. You can also add
+arbitrary tolerations to DaemonSets.
diff --git a/content/en/docs/concepts/workloads/controllers/daemonset.md b/content/en/docs/concepts/workloads/controllers/daemonset.md
@@ -103,7 +103,9 @@ If you do not specify either, then the DaemonSet controller will create Pods on
 
 ## How Daemon Pods are Scheduled
 
-Normally, the machine that a Pod runs on is selected by the Kubernetes scheduler.  However, Pods
+### Scheduled by DaemonSet controller (default)
+
+Normally, the machine that a Pod runs on is selected by the Kubernetes scheduler. However, Pods
 created by the DaemonSet controller have the machine already selected (`.spec.nodeName` is specified
 when the Pod is created, so it is ignored by the scheduler).  Therefore:
 
@@ -112,29 +114,72 @@ when the Pod is created, so it is ignored by the scheduler).  Therefore:
  - The DaemonSet controller can make Pods even when the scheduler has not been started, which can help cluster
    bootstrap.
 
-Daemon Pods do respect [taints and tolerations](/docs/concepts/configuration/taint-and-toleration),
-but they are created with `NoExecute` tolerations for the following taints with no `tolerationSeconds`:
 
- - `node.kubernetes.io/not-ready`
- - `node.alpha.kubernetes.io/unreachable`
+### Scheduled by default scheduler
+
+{{< feature-state state="alpha" for-kubernetes-version="1.11" >}}
+
+A DaemonSet ensures that all eligible nodes run a copy of a Pod. Normally, the
+node that a Pod runs on is selected by the Kubernetes scheduler. However,
+DaemonSet pods are created and scheduled by the DaemonSet controller instead.
+That introduces the following issues:
+
+ * Inconsistent Pod behavior: Normal Pods waiting to be scheduled are created
+   and remain in the `Pending` state, but DaemonSet pods are never created in
+   the `Pending` state. This is confusing to the user.
+ * [Pod preemption](/docs/concepts/configuration/pod-priority-preemption/)
+   is handled by the default scheduler. When preemption is enabled, the DaemonSet
+   controller makes scheduling decisions without considering pod priority and preemption.
+
+`ScheduleDaemonSetPods` allows you to schedule DaemonSets using the default
+scheduler instead of the DaemonSet controller, by adding the `NodeAffinity` term
+to the DaemonSet pods, instead of the `.spec.nodeName` term. The default
+scheduler is then used to bind the pod to the target host. If node affinity of
+the DaemonSet pod already exists, it is replaced. The DaemonSet controller only
+performs these operations when creating or modifying DaemonSet pods, and no
+changes are made to the `spec.template` of the DaemonSet.
+
+```yaml
+nodeAffinity:
+  requiredDuringSchedulingIgnoredDuringExecution:
+    nodeSelectorTerms:
+    - matchFields:
+      - key: metadata.name
+        operator: In
+        values:
+        - target-host-name
+```
+
+In addition, the `node.kubernetes.io/unschedulable:NoSchedule` toleration is added
+automatically to DaemonSet Pods. The DaemonSet controller ignores a Node's
+`unschedulable` status when scheduling DaemonSet Pods. You must enable
+`TaintNodesByCondition` to ensure that the default scheduler behaves the same
+way and schedules DaemonSet pods on `unschedulable` nodes.
+
+When this feature and `TaintNodesByCondition` are enabled together, if the DaemonSet
+uses the host network, you must also add the
+`node.kubernetes.io/network-unavailable:NoSchedule` toleration.
+
 
-This ensures that when the `TaintBasedEvictions` alpha feature is enabled,
-they will not be evicted when there are node problems such as a network partition. (When the
-`TaintBasedEvictions` feature is not enabled, they are also not evicted in these scenarios, but
-due to hard-coded behavior of the NodeController rather than due to tolerations).
+### Taints and Tolerations
 
- They also tolerate following `NoSchedule` taints:
+Although Daemon Pods respect
+[taints and tolerations](/docs/concepts/configuration/taint-and-toleration),
+the following tolerations are added to DaemonSet Pods automatically according to
+the related features.
 
- - `node.kubernetes.io/memory-pressure`
- - `node.kubernetes.io/disk-pressure`
+| Toleration Key                           | Effect     | Alpha Features                                               | Version | Description                                                  |
+| ---------------------------------------- | ---------- | ------------------------------------------------------------ | ------- | ------------------------------------------------------------ |
+| `node.kubernetes.io/not-ready`           | NoExecute  | `TaintBasedEvictions`                                        | 1.8+    | When `TaintBasedEvictions` is enabled, DaemonSet Pods are not evicted when there are node problems such as a network partition. |
+| `node.kubernetes.io/unreachable`         | NoExecute  | `TaintBasedEvictions`                                        | 1.8+    | When `TaintBasedEvictions` is enabled, DaemonSet Pods are not evicted when there are node problems such as a network partition. |
+| `node.kubernetes.io/disk-pressure`       | NoSchedule | `TaintNodesByCondition`                                      | 1.8+    |                                                              |
+| `node.kubernetes.io/memory-pressure`     | NoSchedule | `TaintNodesByCondition`                                      | 1.8+    |                                                              |
+| `node.kubernetes.io/unschedulable`       | NoSchedule | `ScheduleDaemonSetPods`, `TaintNodesByCondition`             | 1.11+   | When `ScheduleDaemonSetPods` is enabled, `TaintNodesByCondition` is required so that DaemonSet Pods scheduled by the default scheduler tolerate the unschedulable attribute. |
+| `node.kubernetes.io/network-unavailable` | NoSchedule | `ScheduleDaemonSetPods`, `TaintNodesByCondition`, host network | 1.11+   | When `ScheduleDaemonSetPods` is enabled, `TaintNodesByCondition` is required so that DaemonSet Pods that use the host network and are scheduled by the default scheduler tolerate the network-unavailable attribute. |
+| `node.kubernetes.io/out-of-disk`         | NoSchedule | `ExperimentalCriticalPodAnnotation` (critical pod only), `TaintNodesByCondition` | 1.8+    |                                                              |
 
-When the support to critical pods is enabled and the pods in a DaemonSet are
-labeled as critical, the Daemon pods are created with an additional
-`NoSchedule` toleration for the `node.kubernetes.io/out-of-disk` taint.
 
-Note that all above `NoSchedule` taints above are created only in version 1.8 or later if the alpha feature `TaintNodesByCondition` is enabled.
 
-Also note that the `node-role.kubernetes.io/master` `NoSchedule` toleration specified in the above example is needed on 1.6 or later to schedule on *master* nodes as this is not a default toleration.
 
 ## Communicating with Daemon Pods
 

diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
@@ -81,7 +81,6 @@ different Kubernetes components.
 | `RotateKubeletClientCertificate` | `true` | Beta | 1.7 | |
 | `RotateKubeletServerCertificate` | `false` | Alpha | 1.7 | |
 | `RunAsGroup` | `false` | Alpha | 1.10 | |
-| `ScheduleDaemonSetPods` | `false` | Alpha | 1.10 | |
 | `ServiceNodeExclusion` | `false` | Alpha | 1.8 | |
 | `StorageObjectInUseProtection` | `true` | Beta | 1.10 | 1.10 |
 | `StorageObjectInUseProtection` | `true` | GA | 1.11 | |
@@ -98,6 +97,7 @@ different Kubernetes components.
 | `VolumeScheduling` | `false` | Alpha | 1.9 | 1.9 |
 | `VolumeScheduling` | `true` | Beta | 1.10 | |
 | `VolumeSubpathEnvExpansion` | `false` | Alpha | 1.11 | |
+| `ScheduleDaemonSetPods` | `false` | Alpha | 1.11 | |
 
 ## Using a Feature
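
The daemonset.md change above says that a DaemonSet using the host network must carry the `node.kubernetes.io/network-unavailable:NoSchedule` toleration when `ScheduleDaemonSetPods` and `TaintNodesByCondition` are both enabled. A minimal sketch of what that might look like in a manifest follows. It is not part of this commit, and the DaemonSet name, labels, and container image are hypothetical placeholders.

```yaml
# Illustrative sketch only; the name, labels, and image are hypothetical.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-agent
spec:
  selector:
    matchLabels:
      name: example-agent
  template:
    metadata:
      labels:
        name: example-agent
    spec:
      # The Pod shares the node's network namespace.
      hostNetwork: true
      tolerations:
      # Toleration named in the daemonset.md change for host-network DaemonSets.
      - key: node.kubernetes.io/network-unavailable
        operator: Exists
        effect: NoSchedule
      containers:
      - name: agent
        image: example.com/agent:1.0
```

`operator: Exists` matches the taint regardless of its value, which is the usual choice for the built-in node condition taints.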