Make DNS pod autoscale
DNS is a critical service in the Kubernetes world, even though it is
not part of Kubernetes itself. It should therefore run with more than
one replica, spread across different nodes, for high availability.
Otherwise, services running on the cluster will break if the node
hosting the DNS pod goes down. Similarly, during a cluster upgrade,
services will break while the node hosting the DNS pod is being
replaced. There has been a lot of discussion about this; see [1], [2]
and [3].

[1] kubernetes/kubeadm#128
[2] kubernetes/kubernetes#40063
[3] kubernetes/kops#2693

Closes-Bug: #1757554

Change-Id: Ic64569d4bdcf367955398d5badef70e7afe33bbb
openstacker committed Apr 19, 2018
1 parent 79f4cc0 commit 54a4ac9
Showing 3 changed files with 123 additions and 2 deletions.
21 changes: 21 additions & 0 deletions doc/source/user/index.rst
@@ -1242,6 +1242,27 @@ _`ingress_controller_role`

kubectl label node <node-name> role=ingress

DNS
---

CoreDNS is a critical service for service discovery in a Kubernetes cluster. To
make CoreDNS highly available, Magnum now autoscales it with the
`cluster-proportional-autoscaler
<https://github.com/kubernetes-incubator/cluster-proportional-autoscaler>`_.
With cluster-proportional-autoscaler, the number of CoreDNS replicas is scaled
with the number of nodes and cores in the cluster, avoiding a single point of
failure.

The scaling parameters and data points are provided to the autoscaler via a
ConfigMap, and the autoscaler refreshes its parameters every poll interval so
that it always uses the latest desired values. Because a ConfigMap is used,
parameters (including the control mode) can be changed on the fly, without
rebuilding or restarting the autoscaler containers/pods. See `Autoscale the DNS
Service in a Cluster
<https://kubernetes.io/docs/tasks/administer-cluster/dns-horizontal-autoscaling/#tuning-autoscaling-parameters>`_
for more information.
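The replica count produced by the autoscaler's ``linear`` control mode can be
sketched as follows. This is an illustrative script, not Magnum code; the
function name is hypothetical, and the parameter values (256 cores per replica,
16 nodes per replica, ``preventSinglePointFailure``) match the defaults Magnum
passes to the autoscaler:

.. code-block:: shell

    #!/bin/sh
    # Sketch of cluster-proportional-autoscaler's "linear" mode:
    # replicas = max(ceil(cores/coresPerReplica), ceil(nodes/nodesPerReplica))
    # ceil(a/b) is computed with integer math as (a + b - 1) / b.
    dns_replicas() {
        cores=$1
        nodes=$2
        cores_per_replica=256
        nodes_per_replica=16
        by_cores=$(( (cores + cores_per_replica - 1) / cores_per_replica ))
        by_nodes=$(( (nodes + nodes_per_replica - 1) / nodes_per_replica ))
        if [ "$by_cores" -gt "$by_nodes" ]; then
            replicas=$by_cores
        else
            replicas=$by_nodes
        fi
        # preventSinglePointFailure: at least 2 replicas on multi-node clusters
        if [ "$nodes" -gt 1 ] && [ "$replicas" -lt 2 ]; then
            replicas=2
        fi
        echo "$replicas"
    }

    dns_replicas 12 3     # small 3-node cluster: HA floor gives 2
    dns_replicas 600 40   # ceil(600/256)=3 vs ceil(40/16)=3, so 3

With small nodes the ``nodesPerReplica`` term dominates; with large,
many-core nodes the ``coresPerReplica`` term dominates.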


Swarm
=====

Expand Down
@@ -2,7 +2,9 @@

. /etc/sysconfig/heat-params

_prefix=${CONTAINER_INFRA_PREFIX:-docker.io/coredns/}
_dns_prefix=${CONTAINER_INFRA_PREFIX:-docker.io/coredns/}
_autoscaler_prefix=${CONTAINER_INFRA_PREFIX:-docker.io/googlecontainer/}

CORE_DNS=/etc/kubernetes/manifests/kube-coredns.yaml
[ -f ${CORE_DNS} ] || {
echo "Writing File: $CORE_DNS"
@@ -93,7 +95,7 @@ spec:
operator: "Exists"
containers:
- name: coredns
image: ${_prefix}coredns:1.0.1
image: ${_dns_prefix}coredns:1.0.1
imagePullPolicy: Always
args: [ "-conf", "/etc/coredns/Corefile" ]
volumeMounts:
@@ -150,6 +152,96 @@ spec:
- name: metrics
port: 9153
protocol: TCP
---
kind: ServiceAccount
apiVersion: v1
metadata:
name: kube-dns-autoscaler
namespace: kube-system
labels:
addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: system:kube-dns-autoscaler
labels:
addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups: [""]
resources: ["nodes"]
verbs: ["list"]
- apiGroups: [""]
resources: ["replicationcontrollers/scale"]
verbs: ["get", "update"]
- apiGroups: ["extensions"]
resources: ["deployments/scale", "replicasets/scale"]
verbs: ["get", "update"]
# Remove the configmaps rule once the issue below is fixed:
# kubernetes-incubator/cluster-proportional-autoscaler#16
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["get", "create"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: system:kube-dns-autoscaler
labels:
addonmanager.kubernetes.io/mode: Reconcile
subjects:
- kind: ServiceAccount
name: kube-dns-autoscaler
namespace: kube-system
roleRef:
kind: ClusterRole
name: system:kube-dns-autoscaler
apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: kube-dns-autoscaler
namespace: kube-system
labels:
k8s-app: kube-dns-autoscaler
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
spec:
selector:
matchLabels:
k8s-app: kube-dns-autoscaler
template:
metadata:
labels:
k8s-app: kube-dns-autoscaler
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
priorityClassName: system-cluster-critical
containers:
- name: autoscaler
image: ${_autoscaler_prefix}cluster-proportional-autoscaler-amd64:1.1.2
resources:
requests:
cpu: "20m"
memory: "10Mi"
command:
- /cluster-proportional-autoscaler
- --namespace=kube-system
- --configmap=kube-dns-autoscaler
# Keep the target in sync with the coredns deployment name above
- --target=Deployment/coredns
# When the cluster uses large nodes (with more cores), "coresPerReplica" should dominate.
# When it uses small nodes, "nodesPerReplica" should dominate.
- --default-params={"linear":{"coresPerReplica":256,"nodesPerReplica":16,"preventSinglePointFailure":true}}
- --logtostderr=true
- --v=2
tolerations:
- key: "CriticalAddonsOnly"
operator: "Exists"
serviceAccountName: kube-dns-autoscaler
EOF
}
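The ``--default-params`` flag above seeds a ConfigMap named
``kube-dns-autoscaler`` in ``kube-system``; tuning the autoscaler at runtime
means editing that ConfigMap (for example with ``kubectl -n kube-system edit
configmap kube-dns-autoscaler``). A sketch of how that ConfigMap looks under
the ``linear`` control mode, assuming the layout documented for
cluster-proportional-autoscaler:

```yaml
# Illustrative rendering only; the actual object is created by the
# autoscaler from --default-params on first run.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns-autoscaler
  namespace: kube-system
data:
  linear: |-
    {
      "coresPerReplica": 256,
      "nodesPerReplica": 16,
      "preventSinglePointFailure": true
    }
```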

8 changes: 8 additions & 0 deletions releasenotes/notes/dns-autoscale-90b63e3d71d7794e.yaml
@@ -0,0 +1,8 @@
---
issues:
- |
    Currently, the number of replicas of the CoreDNS pod is hardcoded to 1,
    which is not reasonable for such a critical service: without DNS, most
    workloads running on the Kubernetes cluster would break. Magnum now
    autoscales the CoreDNS pod based on the number of nodes and cores in the
    cluster.
