Fix CNI issue related to picking up wrong CNI #10985

prezha · 2021-04-05T01:59:16Z

fixes #10984

and also:

helps #7538
fixes #8055
fixes #8480
fixes #8949
fixes #9825
fixes #10044
fixes #10969
(and maybe others...)

examples

given in the original issue description #10984 and also in other issues' description listed above

problem

dns resolution doesn't work in multinode clusters with kindnet cni in pods on nodes where CoreDNS is not present

explanation

as also discussed in #8480, #10384, containers/podman#2370 => cri-o/cri-o#2121, cri-o/cri-o#2411, and specifically in k8s documentation:

Network Plugin Requirements:

The CNI plugin is selected by passing Kubelet the --network-plugin=cni command-line option.
Kubelet reads a file from --cni-conf-dir (default /etc/cni/net.d) and uses the CNI configuration from that file to set up each pod's network.
The CNI configuration file must match the CNI specification, and any required CNI plugins referenced by the configuration must be present in --cni-bin-dir (default /opt/cni/bin).
If there are multiple CNI configuration files in the directory, the kubelet uses the configuration file that comes first by name in lexicographic order.

and also because of:
kubeadm "Init workflow" - step 8:

Installs a DNS server (CoreDNS) and the kube-proxy addon components via the API server. ... Please note that although the DNS server is deployed, it will not be scheduled until CNI is installed.

and so, yes, because eg 87-podman-bridge.conflist already exists in /etc/cni/net.d, it gets picked up by kubelet and during kubeadm init phase addon brings CoreDNS immediately up within the 'wrong' (ie, 10.88.0.x) network
note: other pods (created after cluster is up and the requested cni addon is deployed) will comply and get net from cni and defined Pod CIDR range of 10.244.0.0/24, but will not be able to reach CoreDNS (residing in another network) on the other node (usually - on master node, but can 'migrate' to a minion node after restart)

proposal

idea with this pr is to test alternative to empty/restore of /etc/cni/net.d before/after kubeadm init, proposing to use a separate (empty) directory for cni configuration and thus forcing CoreDNS to wait for cni to initialise first and then get the 'right' ip for itself (btw, while waiting, CoreDNS deployment will be rescaled to 1, so it'll start with one instance only)

this can be achived by adapting cni deployment just by changing the volumes / name: cni-cfg - hostPath / path from default /etc/cni/net.d to eg /etc/cni/net.mk (configurable via cni.CustomCNIConfDir and/or --extra-config=kubelet.cni-conf-dir minikube flag), and supplying this custom path also to kubelet via --cni-conf-dir flag

note: type: DirectoryOrCreate should also be added to cni deployment for this hostPath - if not already there (as is the case with kindnet, but some other cnis have it)

also, in this pr there are a couple of other smaller fixes related to the TestMultiNode/serial/DeployApp2Nodes dns test so it works as expected

multinode-demo

note: i've also updated multinode-demo deployment yaml so that it actually created pods on different nodes (as intended) and updated the example on the website to reflect it

❯ minikube start --nodes 2 -p multinode-demo                                           
😄  [multinode-demo] minikube v1.18.1 on Opensuse-Tumbleweed 
✨  Automatically selected the docker driver
❗  docker is currently using the btrfs storage driver, consider switching to overlay2 for better performance
👍  Starting control plane node multinode-demo in cluster multinode-demo
🔥  Creating docker container (CPUs=2, Memory=8000MB) ...
🐳  Preparing Kubernetes v1.20.2 on Docker 20.10.3 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔗  Configuring CNI (Container Networking Interface) ...
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: storage-provisioner, default-storageclass

👍  Starting node multinode-demo-m02 in cluster multinode-demo
🔥  Creating docker container (CPUs=2, Memory=8000MB) ...
🌐  Found network options:
    ▪ NO_PROXY=192.168.49.2
🐳  Preparing Kubernetes v1.20.2 on Docker 20.10.3 ...
    ▪ env NO_PROXY=192.168.49.2
🔎  Verifying Kubernetes components...
🏄  Done! kubectl is now configured to use "multinode-demo" cluster and "default" namespace by default

❯ kubectl get nodes
NAME                 STATUS   ROLES                  AGE   VERSION
multinode-demo       Ready    control-plane,master   99s   v1.20.2
multinode-demo-m02   Ready    <none>                 73s   v1.20.2

❯ minikube status -p multinode-demo
multinode-demo
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured

multinode-demo-m02
type: Worker
host: Running
kubelet: Running

❯ kubectl apply -f hello-deployment.yaml
deployment.apps/hello created

❯ kubectl rollout status deployment/hello
deployment "hello" successfully rolled out

❯ kubectl apply -f hello-svc.yaml
service/hello created

❯ kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP           NODE                 NOMINATED NODE   READINESS GATES
hello-695c67cf9c-bzrzk   1/1     Running   0          22s   10.244.1.2   multinode-demo-m02   <none>           <none>
hello-695c67cf9c-frcvw   1/1     Running   0          22s   10.244.0.3   multinode-demo       <none>           <none>

❯ minikube service list -p multinode-demo
|-------------|------------|--------------|---------------------------|
|  NAMESPACE  |    NAME    | TARGET PORT  |            URL            |
|-------------|------------|--------------|---------------------------|
| default     | hello      |           80 | http://192.168.49.2:31000 |
| default     | kubernetes | No node port |
| kube-system | kube-dns   | No node port |
|-------------|------------|--------------|---------------------------|

❯ curl  http://192.168.49.2:31000
Hello from hello-695c67cf9c-frcvw (10.244.0.3)%
❯ curl  http://192.168.49.2:31000
Hello from hello-695c67cf9c-bzrzk (10.244.1.2)%
❯ curl  http://192.168.49.2:31000
Hello from hello-695c67cf9c-bzrzk (10.244.1.2)%
❯ curl  http://192.168.49.2:31000
Hello from hello-695c67cf9c-frcvw (10.244.0.3)%
❯ curl  http://192.168.49.2:31000
Hello from hello-695c67cf9c-bzrzk (10.244.1.2)%
❯ curl  http://192.168.49.2:31000
Hello from hello-695c67cf9c-frcvw (10.244.0.3)%

medyagh · 2021-04-05T18:32:50Z

/ok-to-test

minikube-pr-bot · 2021-04-05T19:00:49Z

kvm2 Driver
Times for minikube: 47.8s 47.7s 47.8s
Average time for minikube: 47.8s

Times for Minikube (PR 10985): 49.8s 48.6s 48.5s
Average time for Minikube (PR 10985): 49.0s

Averages Time Per Log

+--------------------------------------------+----------+---------------------+
|                    LOG                     | MINIKUBE | MINIKUBE (PR 10985) |
+--------------------------------------------+----------+---------------------+
| * minikube v1.18.1 on Debian               | 0.0s     | 0.0s                |
| 9.11 (kvm/amd64)                           |          |                     |
| * Using the kvm2 driver based              | 0.0s     | 0.0s                |
| on user configuration                      |          |                     |
| * Starting control plane node              | 0.0s     | 0.0s                |
| minikube in cluster minikube               |          |                     |
| * Creating kvm2 VM (CPUs=2,                | 30.6s    | 30.7s               |
| Memory=4000MB, Disk=20000MB)               |          |                     |
| ...                                        |          |                     |
| * Preparing Kubernetes v1.20.2             | 6.6s     | 1.4s                |
| on Docker 20.10.4 ...                      |          |                     |
|   - Generating certificates                | 0.8s     | 3.3s                |
| and keys ...                               |          |                     |
|   - Booting up control plane               | 7.4s     | 11.0s               |
| ...                                        |          |                     |
|   - Configuring RBAC rules ...             | 1.2s     | 1.3s                |
| * Verifying Kubernetes                     | 0.0s     | 0.0s                |
| components...                              |          |                     |
|   - Using image                            | 0.8s     | 0.9s                |
| gcr.io/k8s-minikube/storage-provisioner:v5 |          |                     |
| * Enabled addons:                          | 0.2s     | 0.3s                |
| default-storageclass,                      |          |                     |
| storage-provisioner                        |          |                     |
| * Done! kubectl is now                     | 0.0s     | 0.0s                |
| configured to use "minikube"               |          |                     |
| cluster and "default"                      |          |                     |
| namespace by default                       |          |                     |
+--------------------------------------------+----------+---------------------+

docker Driver
Times for minikube: 18.3s 17.2s 17.3s
Average time for minikube: 17.6s

Times for Minikube (PR 10985): 18.0s 17.8s 18.0s
Average time for Minikube (PR 10985): 17.9s

Averages Time Per Log

+--------------------------------------------+----------+---------------------+
|                    LOG                     | MINIKUBE | MINIKUBE (PR 10985) |
+--------------------------------------------+----------+---------------------+
| * minikube v1.18.1 on Debian               | 0.1s     | 0.1s                |
| 9.11 (kvm/amd64)                           |          |                     |
| * Using the docker driver                  | 0.1s     | 0.1s                |
| based on user configuration                |          |                     |
| * Starting control plane node              | 0.0s     | 0.0s                |
| minikube in cluster minikube               |          |                     |
| * Creating docker container                | 6.5s     | 6.6s                |
| (CPUs=2, Memory=4000MB) ...                |          |                     |
| * Preparing Kubernetes v1.20.2             | 10.1s    | 10.2s               |
| on Docker 20.10.3 ...                      |          |                     |
| * Verifying Kubernetes                     | 0.1s     | 0.1s                |
| components...                              |          |                     |
|   - Using image                            | 0.6s     | 0.7s                |
| gcr.io/k8s-minikube/storage-provisioner:v5 |          |                     |
| * Enabled addons:                          | 0.0s     | 0.1s                |
| storage-provisioner,                       |          |                     |
| default-storageclass                       |          |                     |
| * Done! kubectl is now                     | 0.0s     | 0.0s                |
| configured to use "minikube"               |          |                     |
| cluster and "default"                      |          |                     |
| namespace by default                       |          |                     |
+--------------------------------------------+----------+---------------------+

medyagh

thank you for this PR This seems to be fixing the issue

k8s-ci-robot · 2021-04-05T20:13:43Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: medyagh, prezha

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [medyagh]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

michaelhenkel · 2021-04-13T00:19:05Z

this only solves the problem for docker runtime. From kubelet:

  --cni-conf-dir string                                      The full path of the directory in which to search for CNI config files. This docker-specific flag only works when container-runtime is set to docker. (default "/etc/cni/net.d")

I still think the fix provided in #10384 is more generic and solves the problem for all CNIs and container runtimes

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 5, 2021

k8s-ci-robot requested review from RA489 and sharifelgamal April 5, 2021 01:59

k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 5, 2021

prezha mentioned this pull request Apr 5, 2021

multinode fails conformance test with kindnet (but calico works) #9825

Closed

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 5, 2021

prezha added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 5, 2021

prezha force-pushed the multinode-cni-dns branch from 551473f to a9afc9f Compare April 5, 2021 15:13

prezha added 2 commits April 5, 2021 16:18

multinode: fix kindnet and dns issues

387ad77

fix unrelated "Docs are not updated" issue via generate-docs

d69e24d

prezha force-pushed the multinode-cni-dns branch from bf853eb to d69e24d Compare April 5, 2021 15:18

prezha removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 5, 2021

k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Apr 5, 2021

kubernetes deleted a comment from minikube-pr-bot Apr 5, 2021

medyagh changed the title ~~multinode: fix kindnet and dns issues~~ Fix CNI issue related to kubeadm Apr 5, 2021

medyagh changed the title ~~Fix CNI issue related to kubeadm~~ Fix CNI issue related to picking up wrong CNI Apr 5, 2021

medyagh approved these changes Apr 5, 2021

View reviewed changes

medyagh merged commit d8b5e28 into kubernetes:master Apr 5, 2021

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 5, 2021

spowelljr mentioned this pull request Apr 14, 2021

minikube failing to start with containerd and --wait=all flag #11101

Closed

afbjorklund mentioned this pull request Apr 25, 2021

improve how cni and cruntimes work together #11184

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix CNI issue related to picking up wrong CNI #10985

Fix CNI issue related to picking up wrong CNI #10985

prezha commented Apr 5, 2021 •

edited

Loading

medyagh commented Apr 5, 2021

minikube-pr-bot commented Apr 5, 2021

medyagh left a comment

k8s-ci-robot commented Apr 5, 2021

michaelhenkel commented Apr 13, 2021

Fix CNI issue related to picking up wrong CNI #10985

Fix CNI issue related to picking up wrong CNI #10985

Conversation

prezha commented Apr 5, 2021 • edited Loading

examples

problem

explanation

proposal

multinode-demo

medyagh commented Apr 5, 2021

minikube-pr-bot commented Apr 5, 2021

medyagh left a comment

Choose a reason for hiding this comment

k8s-ci-robot commented Apr 5, 2021

michaelhenkel commented Apr 13, 2021

prezha commented Apr 5, 2021 •

edited

Loading