Agent losing connection when there is a leader change and disk pressure #6304
Comments
The apiserver does not have a leader, only controllers have leaders. etcd also has a leader, but this is transparent to etcd clients. Which specific controller or lease are you seeing associated with this behavior?
@brandond Bad phrasing on my part: the agent loses connection to the (leading? not sure on that point) proxy and does not reconnect to another one (also note the typo "reconecting" in the log message). It keeps retrying...
Are the logs from the .239 server? If it is refusing connections, then the rke2-server service on that node is not running for some reason. Is it crashing?
@brandond Sadly, we did not have debug enabled on .239. I suspect that the API server ran out of memory and restarted, leading to a server change (from kube-vip), but only temporarily, as when we intervened all the servers were healthy.
The logs from this agent only go back a few minutes prior to when it disconnected from the server. Do you have logs going back to the previous start of the service?
@brandond The full logs are here: agent-lost.log.zip.
What was the actual sequence of events here, on the other servers? I see the agent getting disconnected from that server here:
However that does not trigger any failover of the apiserver load-balancer, as there were no active connections to that node when it failed. The load-balancer had failed over to a different server almost 8 days earlier:
There was a bunch of thrashing for a few minutes before that, which I can't really make heads or tails of without knowing what was going on with these servers at the time.
@brandond Thanks for the insight. The restart was a manual intervention on our part. The failover was likely caused by the API server running out of memory. Let me try to get the logs for the whole cluster on the next occurrence.
What made you decide to restart it at that point?
If your server nodes are under memory pressure, you might consider adding some resource reservations for your critical pods via the server configuration. We don't set these by default, as the required resources are highly environment specific. You should baseline your current utilization, and then set the appropriate requests and limits.
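As an illustration only, a minimal sketch of what such reservations could look like in an RKE2 server config.yaml; the option names and values below are assumptions to be checked against the RKE2 configuration reference and your own baseline, not recommendations:

```yaml
# /etc/rancher/rke2/config.yaml (sketch; names and values are illustrative assumptions)
# Reserve resources for the control-plane static pods so they are not starved under pressure.
control-plane-resource-requests: kube-apiserver-cpu=500m,kube-apiserver-memory=2Gi,etcd-memory=1Gi
# Optionally cap them so a runaway component cannot take the whole node down.
control-plane-resource-limits: kube-apiserver-memory=4Gi
# Keep headroom for system daemons at the kubelet level.
kubelet-arg:
  - system-reserved=cpu=500m,memory=1Gi
```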
Also, if possible - please update to v1.30.2 when you get a chance, it is possible you're running into k3s-io/k3s#10279
@zifeo based on the logs it looks like you're using a load-balancer (192.168.42.4) as the fixed registration address (--server address) for your nodes. Is that correct? How are you hosting this endpoint? Are you by any chance using kube-vip or metallb to expose a Kubernetes service at this address?
OK. So far kube-vip appears to be the common denominator between this issue and #6208 - so I think we're running into the same thing. As discussed at #6208 (comment) it seems like perhaps kube-proxy's iptables rules may be interfering with connections to the VIP and preventing failover to a new endpoint.
@brandond One more item: we are using Cilium in strict kube-proxy replacement mode.
I believe cilium will do the same thing as kube-proxy with regards to locally redirecting loadbalancer service IP traffic. I added some comments to the other issue regarding a beta Kubernetes feature that can be used to disable it when using kube-proxy. I don't know if there is any similar way to disable that when using cilium's kube-proxy replacement.
@brandond Is there a name for this in kube-proxy so I can investigate on the Cilium side? Note that this also never happened before 1.30.
LoadBalancerIPMode is the feature gate; here is the KEP: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/1860-kube-proxy-IP-node-binding
@brandond Thanks for the insights. After looking into it, I am not quite sure I follow. As far as I understand, kube-vip manages its VIP outside of Kubernetes with ARP broadcasts, assigning an IP outside of the cluster ranges that points at the leading server, and that leader then forwards the request using IPVS. Can you explain a bit more how that could be linked to KEP-1860?
kube-vip and other LoadBalancer controllers put the VIP address in the Service's status.loadBalancer.ingress field, and kube-proxy then short-circuits in-cluster traffic to that address by redirecting it directly to the service endpoints instead of sending it out to the VIP. That KEP allows for disabling that kube-proxy behavior with a new field in the Service's load-balancer ingress status.
tl;dr the arp broadcast for the VIP doesn't matter to cluster members, because kube-proxy or cilium's kube-proxy replacement bypass the VIP entirely. The VIP is only used outside the cluster, or before kube-proxy or cilium are running.
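For reference, a sketch of where that new field lives on a LoadBalancer Service once the LoadBalancerIPMode feature gate is enabled. The field is written by the load-balancer controller, not by hand, and the names and addresses here are placeholders:

```yaml
# Sketch of a LoadBalancer Service whose controller sets KEP-1860's ipMode field.
apiVersion: v1
kind: Service
metadata:
  name: example-lb          # placeholder name
spec:
  type: LoadBalancer
  selector:
    app: example
  ports:
    - port: 443
      targetPort: 6443
status:
  loadBalancer:
    ingress:
      - ip: 192.168.42.4
        # Default is "VIP": kube-proxy short-circuits in-cluster traffic to this IP.
        # "Proxy" tells kube-proxy to leave traffic to this IP alone, so it really
        # goes out through the external load balancer / VIP.
        ipMode: Proxy
```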
@brandond Oh, I see. That should only happen when you use kube-vip to manage the load-balancing of Services. That is actually disabled on our end; we only use the "service-less" control-plane load-balancing, so the VIP sits outside of the cluster.
Can you provide more details on your kube-vip deployment, including the YAML spec of the service that is hosting that VIP? See the info provided at #6208 (comment) as an example. So far, kube-vip is the primary thing common to these two environments that we do not generally use when testing RKE2.
@brandond There is no Service, only the following static pod on each server, which is moved at startup time into the static pod manifests directory.
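The attached manifest does not survive here, so purely as an illustration (not the reporter's actual manifest), a minimal sketch of a kube-vip control-plane static pod with Service load-balancing disabled; the image tag, interface, and paths are placeholders:

```yaml
# Illustrative kube-vip control-plane static pod; not the reporter's actual manifest.
apiVersion: v1
kind: Pod
metadata:
  name: kube-vip
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
    - name: kube-vip
      image: ghcr.io/kube-vip/kube-vip:v0.8.0   # placeholder tag
      args: ["manager"]
      env:
        - name: address
          value: "192.168.42.4"     # the fixed registration VIP
        - name: vip_interface
          value: "eth0"             # placeholder interface
        - name: vip_arp
          value: "true"             # ARP-based VIP announcement
        - name: cp_enable
          value: "true"             # control-plane load balancing
        - name: svc_enable
          value: "false"            # Service load balancing disabled, per the discussion
        - name: vip_leaderelection
          value: "true"             # only the elected leader announces the VIP
      securityContext:
        capabilities:
          add: ["NET_ADMIN", "NET_RAW"]
      volumeMounts:
        - name: kubeconfig
          mountPath: /etc/kubernetes/admin.conf  # kube-vip needs API access for leader election
  volumes:
    - name: kubeconfig
      hostPath:
        path: /etc/rancher/rke2/rke2.yaml        # placeholder path to a server kubeconfig
```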
OK. That's interesting. I'll have to try with that as well. So far kube-vip is the only thing common to both environments, regardless of configuration.
OK, so with that kube-vip manifest I was able to find the issue. It is not kube-vip's fault, but for some reason I was able to reproduce the issue while using kube-vip, when I previously had not been able to do so. It is the same thing as #6208, so I am going to close this out and follow up there.
Environmental Info:
RKE2 Version: v1.30.1+rke2r1
Node(s) CPU architecture, OS, and Version: x86, Ubuntu Jammy
Cluster Configuration: 3 servers, 3 agents
Describe the bug:
This is likely similar to #5949. While troubleshooting further, it seems that an agent under disk pressure might not rebalance or reconnect to the new leading API server correctly.
Steps To Reproduce:
1. Set up a cluster with 3 servers and 3 agents.
2. Check which IP is listed first in rke2-agent-load-balancer.json.
3. Create some disk pressure on the agent (e.g. too many images scheduled on that node).
4. Remove the leading server and ensure the replacement node has a different IP.

The agent will then lose its connection.
Expected behavior:
The agent load-balances correctly and reconnects to the new leader once it recovers from the disk pressure.
Actual behavior:
The agent needs to be manually restarted for the connection to be restored.
Additional context / logs:
log.zip