Upgrading etcd to v3.3.15 doesn't fix the API server crash issue #11078
I tested the etcd 3.4 release and hit the same issue:
For Kubernetes, a workaround we found (assuming every apiserver replica has a collocated etcd cluster member) is to customise the endpoint list that each apiserver uses to connect to the etcd cluster: always put the collocated etcd member first in the list. Regarding 3.4, I saw "Improvements to client balancer failover logic" in the announcement. Perhaps the k8s apiserver needs to be upgraded to embed the new 3.4 client library in order to fix this. Upgrading the server is not enough, because the client is mostly responsible for this behaviour.
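The reordering described in this workaround could be scripted roughly as follows. This is only a sketch under stated assumptions: the member URLs are made up, the helper names `order_endpoints` and `etcd_servers_flag` are hypothetical, and the result is assumed to be passed to the apiserver through its `--etcd-servers` flag.

```python
def order_endpoints(local, endpoints):
    """Return the endpoints with `local` first and the rest in their original order."""
    if local not in endpoints:
        raise ValueError(f"{local} is not in the endpoint list")
    return [local] + [e for e in endpoints if e != local]


def etcd_servers_flag(local, endpoints):
    """Build the comma-separated --etcd-servers value for one apiserver replica."""
    return "--etcd-servers=" + ",".join(order_endpoints(local, endpoints))


# Hypothetical three-member cluster; replace with the real member URLs.
ENDPOINTS = [
    "https://10.0.0.1:2379",
    "https://10.0.0.2:2379",
    "https://10.0.0.3:2379",
]

if __name__ == "__main__":
    # Each apiserver replica gets its own collocated member listed first.
    for local in ENDPOINTS:
        print(etcd_servers_flag(local, ENDPOINTS))
```

Every replica ends up with the same set of members, only in a different order, so the full list remains available to the client and just the first-listed member changes.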
The fix was backported to v3.3.14+. Kubernetes 1.16 will include etcd client v3.3.15 [1]. The decision was made not to backport the fix to earlier Kubernetes versions. If you need a hot fix, try [2]. [1] kubernetes/kubernetes#81434 Closing because the issue was already addressed in kubernetes/kubernetes#72102.
We are experiencing the same issue: the API server crashes every time the first etcd server goes down. We were using etcd v3.3.13 and upgraded to v3.3.15 after coming across this PR and kubernetes/kubernetes#72102.
But the apiserver still crashes. After spending some time making sure I'm using the proper versions, I decided to ask the other folks here for help. I believe I might be missing something. Please correct me if so. Thanks!
Below are the details:
As you can see, I expected an 'OK' response in the 3rd step as well. Do I need to regenerate the certificates in any specific way? The same certs work fine when the first server is up. Please point me in the right direction.
Any help is highly appreciated. Thanks.
Originally posted by @pariminaresh in #10911 (comment)