Kubernetes cluster become unresponsive after one node goes down #31553
Labels: `kind/support`, `needs-triage`
We set up a k8s cluster in our datacenter with 2 master and 5 worker nodes, using kubeadm to initialize the cluster. Cluster config added for reference:
```yaml
kind: InitConfiguration
apiVersion: kubeadm.k8s.io/v1beta3
localAPIEndpoint:
  advertiseAddress: VIP_ADDRESS
---
kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1beta3
kubernetesVersion: v1.22.0
controlPlaneEndpoint: "MANAGEMENT_VIP_ADDRESS:6444"
networking:
  podSubnet: 192.168.0.0/16
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
address: 0.0.0.0
cgroupDriver: ADVERTISED_CGROUP_DRIVER_NAME
shutdownGracePeriod: 6m
shutdownGracePeriodCriticalPods: 4m
```
Versions used for the different components:

- etcd: 3.5.0-0
- kube-apiserver: v1.22
- kube-controller-manager: v1.22
- kubelet: 1.22.1
The issue we have is that once we shut down one node, the whole cluster starts misbehaving and becomes read-only in most cases. We are not even able to run kubectl commands.

We are getting this exception in the kube-apiserver log:
```
W0128 11:23:25.351294       1 clientconn.go:1326] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 0 }. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0128 11:23:26.347009       1 clientconn.go:1326] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 0 }. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0128 11:23:26.352155       1 clientconn.go:1
```
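For context on the error above: the apiserver cannot reach its local etcd member, and with kubeadm's stacked topology each control-plane node runs one etcd member. etcd needs a strict majority (quorum) of members to serve writes, so a two-member cluster tolerates zero failures. A minimal sketch of that arithmetic (an illustration of standard etcd quorum rules, not anything from the issue itself):

```python
def quorum(members: int) -> int:
    """Smallest strict majority of an etcd cluster of the given size."""
    return members // 2 + 1

def failure_tolerance(members: int) -> int:
    """How many members can be lost while writes still succeed."""
    return members - quorum(members)

# 2 control-plane nodes => 2 stacked etcd members => no tolerated failures;
# losing either node makes etcd (and hence the apiserver) unavailable.
print(failure_tolerance(2))  # prints 0
print(failure_tolerance(3))  # prints 1
```

This is why odd-sized control planes (3 or 5 nodes) are the usual recommendation: going from 2 to 3 members raises the failure tolerance from 0 to 1.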