This repository has been archived by the owner on Jul 30, 2021. It is now read-only.

Prevent the bootstrapper from running kubeadm init on two nodes #99

Closed

fabriziopandini opened this issue Aug 8, 2019 · 20 comments · Fixed by #122
Assignees
ykakarap
Labels
kind/feature Categorizes issue or PR as related to a new feature.
lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor.
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Milestone
v0.1

Comments

@fabriziopandini
Contributor

/kind feature

Describe the solution you'd like
As discussed on #92

While the control plane is not ready, the controller should requeue the second bootstrap control plane and try to join it after the control plane becomes ready.

That means that if the second bootstrap control plane happens to have a JoinConfiguration, it will join (InitConfiguration and ClusterConfiguration are ignored at this point); if it has no JoinConfiguration, the controller will error out and eventually update ErrorMessage/ErrorReason.

Anything else you would like to add:
The init-locker is already in the codebase (see #92); what is missing is the plumbing into the controller.
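For illustration only, a rough sketch of the reconcile branch described above; the helper names (controlPlaneIsReady, reconcileJoin) are hypothetical and this is not the actual CABPK code:

```go
// Sketch of the behavior described in this issue, not the real controller code.
func (r *KubeadmConfigReconciler) reconcileSecondControlPlane(ctx context.Context, cluster *clusterv1.Cluster, config *bootstrapv1.KubeadmConfig) (ctrl.Result, error) {
	if !controlPlaneIsReady(cluster) {
		// The first control plane is not up yet: requeue instead of running a second kubeadm init.
		return ctrl.Result{Requeue: true}, nil
	}
	if config.Spec.JoinConfiguration == nil {
		// The control plane is already initialized and there is nothing to join with:
		// error out so that ErrorReason/ErrorMessage eventually get updated on the status.
		return ctrl.Result{}, errors.New("control plane already initialized but no JoinConfiguration provided")
	}
	// JoinConfiguration present: join; InitConfiguration/ClusterConfiguration are ignored here.
	return r.reconcileJoin(ctx, config)
}
```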

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Aug 8, 2019
@ykakarap

ykakarap commented Aug 9, 2019

I would like to work on this issue.

@ykakarap

ykakarap commented Aug 9, 2019

To use the init-locker here I will need to create it with newControlPlaneInitLocker(log logr.Logger, configMapClient corev1.ConfigMapsGetter).
Any suggestions on where I can get an instance of corev1.ConfigMapsGetter inside the Reconcile function of kubeadmconfig_controller.go?

Thank you.

@chuckha
Contributor

chuckha commented Aug 9, 2019

@ykakarap That would come from the static client. I think what you'd want to do in this case, as we aren't using the static clients, is to change that signature to the client found on the reconciler. I'll meet up with you and talk through it.
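For reference, a hedged sketch of what that signature change could look like, assuming the locker only needs to read and create ConfigMaps and can do so through the controller-runtime client.Client the reconciler already holds (the actual code from #92 may differ):

```go
import (
	"github.com/go-logr/logr"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// controlPlaneInitLocker sketched with the reconciler's generic client instead of a
// typed corev1.ConfigMapsGetter; field names here are illustrative.
type controlPlaneInitLocker struct {
	log    logr.Logger
	client client.Client
}

func newControlPlaneInitLocker(log logr.Logger, c client.Client) *controlPlaneInitLocker {
	return &controlPlaneInitLocker{log: log, client: c}
}
```

The reconciler would then pass in the same client it already uses for everything else (e.g. the Client field on KubeadmConfigReconciler) when constructing the locker.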

@ykakarap

ykakarap commented Aug 9, 2019

@chuckha Thanks for the walkthrough. 😄
Working on it now.

@ykakarap

ykakarap commented Aug 9, 2019

@fabriziopandini @chuckha I am acquiring the lock right before we start creating the bootstrap data for the init control plane. Ideally, we should release the lock after the cluster.x-k8s.io/control-plane-ready annotation is set.
Since we won't know when that happens from inside this Reconcile function, I am planning to have the lock release call every time we verify that annotation is set. Basically, perform the lock release operation here.
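For illustration, a minimal sketch of that placement, with hypothetical Acquire/Release methods on the locker (the real method names and signatures may differ):

```go
// Sketch of the lock lifecycle described above; helper names are hypothetical.
if clusterHasControlPlaneReadyAnnotation(cluster) {
	// The first control plane is up: release the init lock (a no-op if this config never held it)
	// and continue down the join path.
	r.initLocker.Release(ctx, cluster)
	return r.reconcileJoin(ctx, config)
}

// No ready control plane yet: only the config that wins the lock may generate kubeadm init data.
if !r.initLocker.Acquire(ctx, cluster, machine) {
	return ctrl.Result{Requeue: true}, nil
}
// ... generate the kubeadm init bootstrap data here ...
```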

Thoughts?

@ykakarap

ykakarap commented Aug 9, 2019

/assign @ykakarap
/lifecycle active

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Aug 9, 2019
@fabriziopandini
Contributor Author

@ykakarap

I am planning to have the lock release call every time we verify that annotation is set.

I think this is acceptable (there should also be a comment in the code about this a few lines below), but I leave @chuckha the final word on this.

@chuckha
Contributor

chuckha commented Aug 12, 2019

We should not be using an annotation here. We should be using the cluster.Status.InfrastructureReady field.

@chuckha
Contributor

chuckha commented Aug 12, 2019

Actually cluster.Status.InfrastructureReady doesn't seem to be right.

@vincepri @detiber I'd almost expect a field on the CAPI Cluster.Status telling us whether we're ready for joining nodes to a cluster, wdyt? nvm

@fabriziopandini
Contributor Author

AFAIK

  • cluster.Status.InfrastructureReady tells us when we can start joining nodes
  • cluster.x-k8s.io/control-plane-ready annotation tells us when the first control plane is up and running, and so we can start joining nodes

@chuckha
Contributor

chuckha commented Aug 12, 2019

Oh I see, that is defined in CAPI. OK, missed that. The TODO should be removed and we should import v1alpha2.ClusterAnnotationControlPlaneReady. Thanks!

Rest of this issue sounds good to me.
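For reference, a small sketch of checking that annotation through the imported constant rather than a hard-coded string; the import path and helper name are assumptions, not the actual CABPK code:

```go
import "sigs.k8s.io/cluster-api/api/v1alpha2"

// controlPlaneReady reports whether the cluster.x-k8s.io/control-plane-ready
// annotation has been set on the Cluster.
func controlPlaneReady(cluster *v1alpha2.Cluster) bool {
	_, found := cluster.Annotations[v1alpha2.ClusterAnnotationControlPlaneReady]
	return found
}
```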

@detiber
Contributor

detiber commented Aug 12, 2019

I'm definitely not a fan of relying on annotations rather than fields; maybe we need to add a Cluster.Status.ControlPlaneReady field, even if it is just set by the Cluster controller when the annotation is set?
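Purely to illustrate the proposal, the field would look roughly like this on the CAPI ClusterStatus (a hypothetical sketch, not the actual cluster-api code):

```go
// Hypothetical sketch of the proposed field on the Cluster API types.
type ClusterStatus struct {
	// ... existing fields such as InfrastructureReady ...

	// ControlPlaneReady would mirror the cluster.x-k8s.io/control-plane-ready
	// annotation, set by the Cluster controller once the first control plane is up.
	// +optional
	ControlPlaneReady bool `json:"controlPlaneReady,omitempty"`
}
```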

@fabriziopandini
Contributor Author

I'm +1 on adding a Cluster.Status.ControlPlaneReady field.

@chuckha
Contributor

chuckha commented Aug 12, 2019

I opened the linked issue above.

@ykakarap

ykakarap commented Aug 12, 2019

I am wondering how the following situation needs to be handled:

Let's say there are 2 control planes that are both trying to init. One of them will get the lock and will try to create the machine and do the init operation. Let's say that while creating the machine it encounters an error because of some bad value in the YAML. This machine will try to reconcile over and over again, but because of the bad YAML value it will never be created. Because of this, the annotation will never be set.

Since the first control plane was not created, should we allow the second control plane to try to init?
Since the annotation is not set, the second control plane will try to init. But since the init lock is not released, it will simply requeue.

Both control planes (and any subsequent nodes) will be stuck in the reconcile loop with no progress.

Now the question is: should we consider this behavior "working as expected", since the mistake was in a value in the YAML, or should we have a mechanism that allows the second control plane to try to init if the first one fails?

@chuckha @fabriziopandini

@fabriziopandini
Contributor Author

@ykakarap the scenario you are describing is correct, but unless @chuckha wants to block on this case, I think it is fine to postpone it to a follow-up PR.
For this iteration, let's make sure to release the lock if something fails while preparing the cloud-init for the bootstrap node.
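A possible shape for that follow-up, again with hypothetical locker methods and a hypothetical generateInitCloudInit helper:

```go
// Sketch: don't keep the init lock if generating the init cloud-init data fails.
if !r.initLocker.Acquire(ctx, cluster, machine) {
	return ctrl.Result{Requeue: true}, nil
}
cloudInitData, err := generateInitCloudInit(config) // hypothetical helper
if err != nil {
	// Release so another control plane can attempt kubeadm init instead of deadlocking.
	r.initLocker.Release(ctx, cluster)
	return ctrl.Result{}, err
}
// ... store cloudInitData on the KubeadmConfig status ...
```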

@amy
Contributor

amy commented Aug 12, 2019

/milestone 0.1
/priority important-soon

@k8s-ci-robot
Contributor

@amy: You must be a member of the kubernetes-sigs/cluster-api-bootstrap-provider-kubeadm-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your Cluster API Bootstrap Provider Kubeadm Maintainers and have them propose you as an additional delegate for this responsibility.

In response to this:

/milestone 0.1
/priority important-soon

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Aug 12, 2019
@chuckha
Contributor

chuckha commented Aug 13, 2019

No interest in blocking @fabriziopandini. Follow on is fine.

/milestone v0.1

@k8s-ci-robot k8s-ci-robot added this to the v0.1 milestone Aug 13, 2019
@ykakarap

No interest in blocking @fabriziopandini. Follow on is fine.

Ok then 👍 . @chuckha @fabriziopandini I will raise a PR for this issue later today or tomorrow morning.
