
Add retries for kubeadm join / MarkControlPlane #2093

Closed
fabriziopandini opened this issue Mar 30, 2020 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue or PR as related to a new feature. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@fabriziopandini
Member

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version: v1.17.*

What happened?

While running Cluster API tests, we occasionally observed kubeadm join failures when adding the master label to the joining node.

xref kubernetes-sigs/cluster-api#2769

What you expected to happen?

The mark-control-plane step should be made more resilient by adding a retry loop to this operation.

How to reproduce it (as minimally and precisely as possible)?

This error happens only intermittently, most probably when there is a temporary blackout of the load balancer in front of the API servers (e.g. HAProxy reloading its configuration).
The error might also happen when the new API server enters the load-balancing pool but the underlying etcd member is not yet available, either because slow network or slow I/O delays etcd coming online or, in some cases, because of a change of the etcd leader.

Anything else we need to know?

Important: if possible, the change should be kept as small as possible and backported.

@neolit123 neolit123 added this to the v1.19 milestone Mar 30, 2020
@neolit123 neolit123 added kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue or PR as related to a new feature. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Mar 30, 2020
@RA489
Contributor

RA489 commented Apr 2, 2020

/assign

@neolit123
Member

@RA489 I'm going to take this ticket, as I have some time later today and tomorrow.
/assign

@fabriziopandini
Member Author

@neolit123 thanks for pointing this out.
I think two minutes is OK unless we get more reports of failures.
/close

@k8s-ci-robot
Contributor

@fabriziopandini: Closing this issue.



4 participants