This is a backport issue for #4145, automatically created via rancherbot by @rbrtbnfgl
Original issue description:
Environmental Info:
RKE2 Version: v1.26.4+rke2r1
Node(s) CPU architecture, OS, and Version: Linux k8s-agent16 5.15.0-70-generic #77-Ubuntu SMP Tue Mar 21 14:02:37 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:
2 servers, 16 agents all running Ubuntu 22.04
Describe the bug:
rke2-canal pods on some agents are not starting. The pod logs contain the following.
2023-04-27 12:01:53.719 [WARNING][2437501] felix/ipsets.go 319: Failed to resync with dataplane error=exit status 1 family="inet"
2023-04-27 12:01:53.752 [INFO][2437501] felix/ipsets.go 309: Retrying after an ipsets update failure... family="inet"
2023-04-27 12:01:53.753 [ERROR][2437501] felix/ipsets.go 569: Bad return code from 'ipset list'. error=exit status 1 family="inet" stderr="ipset v6.36: Kernel support protocol versions 6-7 while userspace supports protocol versions 6-6\nKernel and userspace incompatible: settype hash:net with revision 7 not supported by userspace.\n"
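To reproduce the failing call Felix makes, one can run the same ipset list inside the calico-node container of an affected pod:
# pod name below is a placeholder; pick one from `kubectl -n kube-system get pods`
kubectl -n kube-system exec <rke2-canal-pod-name> -c calico-node -- ipset list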
There was a similar issue reported at projectcalico/calico#5011, but that report says the problem only occurs when kube-proxy runs in ipvs mode and should not affect iptables mode. I have confirmed that the proxy mode here is iptables. Here are the logs from the kube-proxy pod.
I0424 19:10:54.248089 1 server.go:224] "Warning, all flags other than --config, --write-config-to, and --cleanup are deprecated, please begin using a config file ASAP"
I0424 19:10:54.257660 1 node.go:163] Successfully retrieved node IP: 192.168.39.77
I0424 19:10:54.257687 1 server_others.go:109] "Detected node IP" address="192.168.39.77"
I0424 19:10:54.294553 1 server_others.go:176] "Using iptables Proxier"
I0424 19:10:54.294622 1 server_others.go:183] "kube-proxy running in dual-stack mode" ipFamily=IPv4
I0424 19:10:54.294646 1 server_others.go:184] "Creating dualStackProxier for iptables"
I0424 19:10:54.294680 1 server_others.go:465] "Detect-local-mode set to ClusterCIDR, but no IPv6 cluster CIDR defined, , defaulting to no-op detect-local for IPv6"
I0424 19:10:54.294748 1 proxier.go:242] "Setting route_localnet=1 to allow node-ports on localhost; to change this either disable iptables.localhostNodePorts (--iptables-localhost-nodeports) or set nodePortAddresses (--nodeport-addresses) to filter loopback addresses"
I0424 19:10:54.295311 1 server.go:655] "Version info" version="v1.26.4+rke2r1"
I0424 19:10:54.295341 1 server.go:657] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0424 19:10:54.296172 1 config.go:226] "Starting endpoint slice config controller"
I0424 19:10:54.296195 1 shared_informer.go:270] Waiting for caches to sync for endpoint slice config
I0424 19:10:54.296235 1 config.go:444] "Starting node config controller"
I0424 19:10:54.296255 1 shared_informer.go:270] Waiting for caches to sync for node config
I0424 19:10:54.296254 1 config.go:317] "Starting service config controller"
I0424 19:10:54.296275 1 shared_informer.go:270] Waiting for caches to sync for service config
I0424 19:10:54.397015 1 shared_informer.go:277] Caches are synced for node config
I0424 19:10:54.397063 1 shared_informer.go:277] Caches are synced for endpoint slice config
I0424 19:10:54.397175 1 shared_informer.go:277] Caches are synced for service config
E0425 16:30:24.197986 1 service_health.go:187] "Healthcheck closed" err="accept tcp [::]:32666: use of closed network connection" service="istio-system/istio-ingressgateway"
E0425 16:30:24.198068 1 service_health.go:187] "Healthcheck closed" err="accept tcp [::]:32675: use of closed network connection" service="istio-system/istio-internal-ingressgateway"
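For reference, the "Using iptables Proxier" line above is the relevant confirmation; it can be pulled straight from the kube-proxy pod log, e.g.:
# pod name is a placeholder for the static kube-proxy pod on the node
kubectl -n kube-system logs kube-proxy-<node-name> | grep Proxier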
Steps To Reproduce:
I installed RKE2 using the following steps:
sudo swapoff -a
hostnamectl set-hostname k8s-master01
# add the master node details in every node
vi /etc/hosts
192.168.39.5 k8s-master01
# kubectl install on Debian-based distributions
sudo apt update
sudo apt install -y apt-transport-https ca-certificates curl
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt update
sudo apt install -y kubectl
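# optional sanity check (not in the original steps): confirm the kubectl client installed
kubectl version --client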
# network bridge sysctls (applied on every node)
sudo tee -a /etc/sysctl.d/99-kubernetes.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
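# optional sanity check (not in the original steps): confirm the bridge sysctls took effect
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables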
curl -sfL https://get.rke2.io | INSTALL_RKE2_CHANNEL=latest sh -
# first server node
systemctl enable rke2-server
systemctl start rke2-server
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml PATH=$PATH:/var/lib/rancher/rke2/bin
cat /var/lib/rancher/rke2/server/node-token
# second server node
mkdir -p /etc/rancher/rke2
vi /etc/rancher/rke2/config.yaml
server: https://192.168.39.2:9345
token: <TOKEN_FROM_THE_ABOVE_CAT_COMMAND>
systemctl enable rke2-server
systemctl start rke2-server
# agent node
mkdir -p /etc/rancher/rke2
vi /etc/rancher/rke2/config.yaml
server: https://192.168.39.2:9345
token: <TOKEN_FROM_THE_ABOVE_CAT_COMMAND>
systemctl enable rke2-agent
systemctl start rke2-agent
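# optional (not in the original steps): from the first server, confirm the nodes joined and canal pods came up
# (the k8s-app=canal label is assumed from the upstream canal chart; adjust if it differs)
kubectl get nodes
kubectl -n kube-system get pods -l k8s-app=canal -o wide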
Expected behavior:
Running kubectl get pod -n kube-system should show all rke2-canal pods as Running.
Actual behavior:
Some of the rke2-canal pods are stuck at 1/2 Ready.
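The affected pods and the nodes they run on can be listed with, for example:
kubectl -n kube-system get pods -o wide | grep rke2-canal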
Additional context / logs:
On host:
# ipset version
ipset v7.15, protocol version: 7
On rke2-canal pod and calico-node container running in the same host:
sh-4.4# ipset version
ipset v6.36, protocol version: 6
Note that this behavior is observed only on one server node and one agent node; all other nodes are working fine. The one thing these two nodes have in common is that their ipset list output contains sets with Revision: 7.
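A quick way to spot the affected sets is to print each set's name and revision from the terse listing:
# -t (terse) prints set headers only; supported by both ipset versions seen here
sudo ipset list -t | awk '/^Name:/{n=$2} /^Revision:/{print n": revision "$2}'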
Output of ipset list from the problematic agent node:
Name: cali40all-ipam-pools
Type: hash:net
Revision: 7
Header: family inet hashsize 1024 maxelem 1048576 bucketsize 12 initval 0x54a33cf9
Size in memory: 504
References: 0
Number of entries: 1
Members:
10.42.0.0/16
Name: cali40masq-ipam-pools
Type: hash:net
Revision: 7
Header: family inet hashsize 1024 maxelem 1048576 bucketsize 12 initval 0xdd3fa3ae
Size in memory: 504
References: 0
Number of entries: 1
Members:
10.42.0.0/16
Name: cali40this-host
Type: hash:ip
Revision: 5
Header: family inet hashsize 1024 maxelem 1048576 bucketsize 12 initval 0x2d605c42
Size in memory: 360
References: 0
Number of entries: 4
Members:
192.168.39.77
10.42.180.64
127.0.0.1
127.0.0.0
Name: cali40all-vxlan-net
Type: hash:net
Revision: 7
Header: family inet hashsize 1024 maxelem 1048576 bucketsize 12 initval 0x17e7d094
Size in memory: 1320
References: 0
Number of entries: 18
Members:
192.168.39.99
192.168.39.91
192.168.39.154
192.168.39.72
192.168.39.3
192.168.39.79
192.168.39.98
192.168.39.74
192.168.39.78
192.168.39.96
192.168.39.92
192.168.39.94
192.168.39.151
192.168.39.93
192.168.39.2
192.168.39.1
192.168.39.97
192.168.39.95
Output of ipset list from the node that is working fine:
Name: cali40all-ipam-pools
Type: hash:net
Revision: 6
Header: family inet hashsize 1024 maxelem 1048576
Size in memory: 504
References: 1
Number of entries: 1
Members:
172.16.0.0/16
Name: cali40masq-ipam-pools
Type: hash:net
Revision: 6
Header: family inet hashsize 1024 maxelem 1048576
Size in memory: 504
References: 1
Number of entries: 1
Members:
172.16.0.0/16
Name: cali40this-host
Type: hash:ip
Revision: 4
Header: family inet hashsize 1024 maxelem 1048576
Size in memory: 360
References: 0
Number of entries: 4
Members:
10.42.30.0
127.0.0.1
192.168.39.96
127.0.0.0