Design doc for multiple worker node groups support #757
Conversation
This looks very good
For each group, we will append these three fields corresponding to that group in the CAPI spec.

Right now, the CLI assumes that there will be only one group and treats the worker node group configuration array as a collection with a single element. As a result, the controller refers only to the first element of this array in different places in the code. We therefore need to perform the same operations in loops, including CAPI spec creation, cluster spec validation, etc. Once a CAPI spec is created with this approach, the workload cluster will be created with multiple worker node groups.
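The loop-based refactor described above could look roughly like this; the struct and field names below are simplified stand-ins for the real EKS-A and CAPI types, not the actual API:

```go
package main

import "fmt"

// WorkerNodeGroupConfiguration is a simplified stand-in for the EKS-A
// cluster spec type (field names are illustrative).
type WorkerNodeGroupConfiguration struct {
	Name  string
	Count int
}

type ClusterSpec struct {
	WorkerNodeGroupConfigurations []WorkerNodeGroupConfiguration
}

// buildWorkerSpecs iterates over every worker node group instead of
// assuming a single element at index 0, as the current CLI does.
func buildWorkerSpecs(spec ClusterSpec) []string {
	var capiSpecs []string
	for _, group := range spec.WorkerNodeGroupConfigurations {
		capiSpecs = append(capiSpecs,
			fmt.Sprintf("MachineDeployment %s (replicas=%d)", group.Name, group.Count))
	}
	return capiSpecs
}

func main() {
	spec := ClusterSpec{WorkerNodeGroupConfigurations: []WorkerNodeGroupConfiguration{
		{Name: "md-0", Count: 3},
		{Name: "md-1", Count: 2},
	}}
	for _, s := range buildWorkerSpecs(spec) {
		fmt.Println(s)
	}
}
```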
It doesn't need to be part of this doc at all, but it might be a good idea to document all the places where this assumption is being made and how deeply they go (at least before starting the execution). Maybe something we can do in parallel to this review.
I fear this refactor might be a bit more complex and lengthy than it seems.
We can discuss this separately.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereMachineTemplate
metadata:
  name: eksa-test-worker-node-template-1638469395669
Should we be mapping the VSphereMachineTemplate to the worker node group, which means we would have an eksa-test-1-worker-node-template-* and an eksa-test-2-worker-node-template-*? I think it gets a little complicated when we think about maintaining a mapping from worker node groups and machine config objects to the CAPI template, especially when introducing or removing a machine config. Unless we just say that every worker node group configuration warrants a new CAPI template spec, regardless of whether it references the same machine config or not.
Whichever it is, I would like to see a note about it as a sentence or two in the design doc here.
Curious about this as well. In particular, "especially when introducing/removing a machine config" is a super interesting problem I didn't think about.
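The per-group naming scheme discussed in this thread could be produced by a small helper like the following; `machineTemplateName` is a hypothetical function for illustration, not part of the codebase:

```go
package main

import "fmt"

// machineTemplateName is a hypothetical helper illustrating the per-group
// naming suggested in the review: one VSphereMachineTemplate per worker
// node group, e.g. eksa-test-1-worker-node-template-<timestamp>.
func machineTemplateName(clusterName, groupName string, timestamp int64) string {
	return fmt.Sprintf("%s-%s-worker-node-template-%d", clusterName, groupName, timestamp)
}

func main() {
	// One template name per worker node group, distinguished by group name.
	fmt.Println(machineTemplateName("eksa-test", "1", 1638469395669))
	fmt.Println(machineTemplateName("eksa-test", "2", 1638469395669))
}
```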
For each group, we will append these three fields corresponding to that group in the CAPI spec.

Right now, the CLI assumes that there will be only one group and treats the worker node group configuration array as a collection with a single element. As a result, the controller refers only to the first element of this array in different places in the code. We therefore need to perform the same operations in loops, including CAPI spec creation, cluster spec validation, etc. Once a CAPI spec is created with this approach, the workload cluster will be created with multiple worker node groups.
When we introduce multiple machine configs, are we going to use Go array templating to loop, or maintain a default CAPI spec containing the worker node configuration and append to the resulting CAPI spec depending on how many worker node groups we have? I would prefer the latter, so that we control generating new CAPI worker node specs based on the number of machine config objects we have configured.
Third option: use the CAPI API (Go) structs.
This one has my vote
We will use Go structs.
Also, we need to make sure that at least one of the worker node groups does not have a `NoExecute` or `NoSchedule` taint. This validation will be done at the preflight validation stage.

The examples in this design are for the vSphere provider, but the same strategy applies to other providers as well.
Does this mean that for docker, only changing the taints for each worker node group would warrant a new CAPI template spec, or that each worker node group corresponds to a separate CAPI template spec even if the values are exactly the same?
Yes, for docker as well, we will be adding a KubeadmConfigTemplate, a MachineDeployment, and the provider's machine template (DockerMachineTemplate for docker, VSphereMachineTemplate for vSphere) to the CAPI spec file for each worker node group.
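The preflight taint validation mentioned earlier (at least one worker node group must be free of `NoExecute` and `NoSchedule` taints) could be sketched as follows; all type and function names here are illustrative, with `Taint` standing in for the Kubernetes `corev1.Taint` type:

```go
package main

import (
	"errors"
	"fmt"
)

// Taint is a simplified stand-in for corev1.Taint.
type Taint struct {
	Key    string
	Effect string
}

type WorkerNodeGroup struct {
	Name   string
	Taints []Taint
}

// validateTaints is a hypothetical preflight check: at least one worker
// node group must carry no NoExecute or NoSchedule taint, so ordinary
// workloads always have somewhere they can be scheduled.
func validateTaints(groups []WorkerNodeGroup) error {
	for _, g := range groups {
		schedulable := true
		for _, t := range g.Taints {
			if t.Effect == "NoExecute" || t.Effect == "NoSchedule" {
				schedulable = false
				break
			}
		}
		if schedulable {
			return nil
		}
	}
	return errors.New("at least one worker node group must have no NoExecute or NoSchedule taints")
}

func main() {
	groups := []WorkerNodeGroup{
		{Name: "md-0", Taints: []Taint{{Key: "dedicated", Effect: "NoSchedule"}}},
		{Name: "md-1"}, // untainted, so validation passes
	}
	fmt.Println(validateTaints(groups))
}
```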
Right now, the CLI assumes that there will be only one group and treats the worker node group configuration array as a collection with a single element. As a result, the controller refers only to the first element of this array in different places in the code. We therefore need to perform the same operations in loops, including CAPI spec creation, cluster spec validation, etc. Once a CAPI spec is created with this approach, the workload cluster will be created with multiple worker node groups. We will use an array of CAPI objects to store the worker node group configurations and then generate the CAPI spec file using that array.
Can you mention that one element of the array of worker node group configurations corresponds to a set of CAPI objects consisting of a KubeadmConfig, a MachineDeployment, and whatever else is there? It just gives us an understanding of what to expect, even if there are repeated VSphereMachineConfigs for each of the worker node groups.
I am not sure if there is a data structure encompassing all three, but these three types are well defined in the CAPI and CAPV code bases. What I plan to do is create a structure of these three elements and then create an array of that structure. I will update the design doc.
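The plan described above, a struct bundling the three per-group CAPI objects plus an array of that struct, could be sketched like this; the type names are stand-ins for illustration, not the real CAPI/CAPV types:

```go
package main

import "fmt"

// Stand-ins for the three per-group types; real code would use the types
// from the cluster-api and cluster-api-provider-vsphere modules.
type KubeadmConfigTemplate struct{ Name string }
type MachineDeployment struct{ Name string }
type VSphereMachineTemplate struct{ Name string }

// WorkerGroupCAPIObjects bundles the three CAPI objects that one worker
// node group configuration maps to; the struct name is illustrative.
type WorkerGroupCAPIObjects struct {
	ConfigTemplate    KubeadmConfigTemplate
	MachineDeployment MachineDeployment
	MachineTemplate   VSphereMachineTemplate
}

func main() {
	groupNames := []string{"md-0", "md-1"}

	// One array element per worker node group, each holding its own
	// set of three CAPI objects.
	var groups []WorkerGroupCAPIObjects
	for _, name := range groupNames {
		groups = append(groups, WorkerGroupCAPIObjects{
			ConfigTemplate:    KubeadmConfigTemplate{Name: name},
			MachineDeployment: MachineDeployment{Name: name},
			MachineTemplate:   VSphereMachineTemplate{Name: name + "-worker-node-template"},
		})
	}
	fmt.Println(len(groups), groups[1].MachineTemplate.Name)
}
```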
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: bnrjee, g-gaston, vivek-koppuru.