Kubernetes cluster become unresponsive after one node goes down #31553
Labels: `kind/support`, `needs-triage`
We set up a k8s cluster in our datacenter with 2 master and 5 worker nodes, using kubeadm to initialize the cluster. Cluster config added for reference:
```yaml
kind: InitConfiguration
apiVersion: kubeadm.k8s.io/v1beta3
localAPIEndpoint:
  advertiseAddress: VIP_ADDRESS
---
kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1beta3
kubernetesVersion: v1.22.0
controlPlaneEndpoint: "MANAGEMENT_VIP_ADDRESS:6444"
networking:
  podSubnet: 192.168.0.0/16
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
address: 0.0.0.0
cgroupDriver: ADVERTISED_CGROUP_DRIVER_NAME
shutdownGracePeriod: 6m
shutdownGracePeriodCriticalPods: 4m
```
Versions used for the different components:

- etcd: 3.5.0-0
- kube-apiserver: v1.22
- kube-controller-manager: v1.22
- kubelet: 1.22.1
The issue we have is that once we shut down one node, the whole cluster starts misbehaving and becomes read-only in most cases. We are not even able to run kubectl commands.

We are getting this exception in the kube-apiserver log:
```
W0128 11:23:25.351294       1 clientconn.go:1326] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 0 }. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0128 11:23:26.347009       1 clientconn.go:1326] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 0 }. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0128 11:23:26.352155       1 clientconn.go:1
```
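For context on the error above: the apiserver cannot reach its local etcd member, and with kubeadm's stacked topology each control-plane node runs one etcd member. etcd needs a strict majority (quorum) of members to serve writes, so a two-member cluster tolerates zero failures. A minimal sketch of that arithmetic (an illustration of standard etcd quorum rules, not anything from the issue itself):

```python
def quorum(members: int) -> int:
    """Smallest strict majority of an etcd cluster of the given size."""
    return members // 2 + 1

def failure_tolerance(members: int) -> int:
    """How many members can be lost while writes still succeed."""
    return members - quorum(members)

# 2 control-plane nodes => 2 stacked etcd members => no tolerated failures;
# losing either node makes etcd (and hence the apiserver) unavailable.
print(failure_tolerance(2))  # prints 0
print(failure_tolerance(3))  # prints 1
```

This is why odd-sized control planes (3 or 5 nodes) are the usual recommendation: going from 2 to 3 members raises the failure tolerance from 0 to 1.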