UPSTREAM: <carry>: cluster-api: use node.Spec.ProviderID for identifying nodegroups #14
Conversation
Force-pushed 9e16280 to 3e44d91
Force-pushed 3e44d91 to ec43a97
Force-pushed ec43a97 to 8b89e4a
objs, err := machineIndexer.ByIndex(nodeNameIndexKey, node.Name)

func (c *machineController) findMachine(id string) (*v1alpha1.Machine, error) {
	store := c.machineInformer.Informer().GetStore()
	item, exists, err := store.GetByKey(id)
Given the store is used only here, it's more transparent to do `c.machineInformer.Informer().GetStore().GetByKey(id)` here.
Done.
return machineSet.DeepCopy(), nil
}

// run starts shared informers and waits for the shared informer cache
// to synchronize.
// Run starts shared informers and waits for the informer cache to
s/Run/run
Done.
// machineSet. For each machine in the set a DeepCopy() of the object
// is returned.
func (c *machineController) MachinesInMachineSet(machineSet *v1alpha1.MachineSet) ([]*v1alpha1.Machine, error) {
	machines, err := c.machineInformer.Lister().Machines(machineSet.Namespace).List(labels.Everything())
This is fine with not so many machines. Though, with hundreds of machines in the same namespace (which would be insane, but still) this can be slow. What's the benefit of listing all machines instead of listing only those whose labels match the MachineSet's label selector?
Done.
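The reviewer's suggestion above can be sketched without the client-go machinery. The `matchesSelector` helper below is hypothetical and models the MachineSet's label selector as a plain map of required labels; the real fix would convert `machineSet.Spec.Selector` with `metav1.LabelSelectorAsSelector` and pass the result to `Lister().Machines(ns).List(selector)` instead of `labels.Everything()`.

```go
package main

import "fmt"

// matchesSelector reports whether all required labels in selector are
// present with matching values in labels. (Simplified stand-in for a
// real labels.Selector; illustrative only.)
func matchesSelector(labels, selector map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	// Only machines matching the MachineSet's selector are kept,
	// instead of listing every machine in the namespace.
	selector := map[string]string{"machineset": "workers"}
	machines := []map[string]string{
		{"machineset": "workers", "name": "m1"},
		{"machineset": "masters", "name": "m2"},
	}
	var matched []string
	for _, m := range machines {
		if matchesSelector(m, selector) {
			matched = append(matched, m["name"])
		}
	}
	fmt.Println(matched)
}
```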
}

if node != nil {
	nodeNames = append(nodeNames, node.Spec.ProviderID)
Why do you append the ProviderID into the list of node names?
The variable is badly named now.
Renamed.
@ingvagabund addressed your issues. @tnozicka had some feedback that I should also be checking UUID values, which may be better to do in a follow-up PR as it's important that we switch to indexing via node.Spec.ProviderID ASAP. Thoughts? @derekwaynecarr
if err != nil {
	return nil, err
}
if len(objs) != 1 {
Should this return an error if the length is greater than one? However unlikely, it would be good to know.
Done.
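The agreed behaviour can be sketched with a plain map standing in for the informer indexer (`findMachineByNodeName` is a hypothetical helper, not the real code): zero hits is not an error, while more than one hit for the same node name is surfaced explicitly, as the reviewer suggested.

```go
package main

import "fmt"

// findMachineByNodeName looks up machines keyed by node name. Zero
// matches returns no error; multiple matches is treated as an error,
// however unlikely, so the condition is visible rather than silent.
func findMachineByNodeName(index map[string][]string, nodeName string) (string, error) {
	objs := index[nodeName]
	switch {
	case len(objs) == 0:
		return "", nil // no matching machine; not an error
	case len(objs) > 1:
		return "", fmt.Errorf("expected one machine for node %q, got %d", nodeName, len(objs))
	}
	return objs[0], nil
}

func main() {
	index := map[string][]string{
		"node-a": {"machine-a"},
		"node-b": {"machine-b1", "machine-b2"},
	}
	m, err := findMachineByNodeName(index, "node-a")
	fmt.Println(m, err)
	_, err = findMachineByNodeName(index, "node-b")
	fmt.Println(err != nil)
}
```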
}

return nil

machineName, found := node.Annotations["machine"]
Seems like this should probably try the newer `cluster.k8s.io/machine` annotation first, then fall back to `machine`.
Done. Thanks for the link.
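The agreed lookup order can be sketched in isolation, assuming only the two annotation keys mentioned in this thread (`machineNameForNode` is a hypothetical helper operating on a plain annotations map):

```go
package main

import "fmt"

// machineNameForNode prefers the newer "cluster.k8s.io/machine"
// annotation and falls back to the legacy "machine" annotation, per the
// review suggestion. Annotation keys come from the thread above.
func machineNameForNode(annotations map[string]string) (string, bool) {
	if name, ok := annotations["cluster.k8s.io/machine"]; ok {
		return name, true
	}
	name, ok := annotations["machine"]
	return name, ok
}

func main() {
	legacy := map[string]string{"machine": "ns/worker-0"}
	both := map[string]string{"machine": "ns/old", "cluster.k8s.io/machine": "ns/new"}
	n1, _ := machineNameForNode(legacy)
	n2, _ := machineNameForNode(both)
	fmt.Println(n1, n2)
}
```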
This is now done too.
/retest
This switches the lookup of `node` objects to using `node.Spec.ProviderID`. The previous index was `node.Name`, but this is wrong as it leads to nodes going unregistered.
Force-pushed 4e26565 to 8247728
just a question, generally looks good.
/lgtm
/approve
machineName, found := node.Annotations["cluster.k8s.io/machine"]
if !found {
	machineName, found = node.Annotations["machine"]
q: we seem to fall back on this always currently, is there a separate pr that updates our annotation usage?
Our nodelink-controller is only adding "machine" at the moment, so no other PR right now but will add a Jira ticket.
This change switches from using `node.Name` to `node.Spec.ProviderID` when building a `cloudprovider.NodeGroup`. Using `node.Name` has been a historical mistake.

There have been several suggestions for adding the provider ID to the machine object at the cluster-api level (kubernetes-sigs/cluster-api#565) and by an actuator implementation (openshift/cluster-api-provider-aws#86). For the moment we can make this mapping without either of those PRs, assuming the "machine" annotation is present on the node object. This annotation is currently added by the nodelink-controller (https://github.com/openshift/machine-api-operator).
Switching `cloudprovider.NodeGroupForNode()` to indexing on `node.Spec.ProviderID`, and also returning provider ID values in `cloudprovider.Nodes()`, means we no longer experience the case where the nodegroup/node becomes unregistered, which ultimately leads the autoscaler to stop both scale-up and scale-down operations as it deems the state of the cluster to be unhealthy.

We may want to revisit how this is done in the longer term because it does assume that the nodelink-controller is running on the cluster. But this change requires no additional changes from the cluster-api (so no revendoring), nor does it require a change to the actuators that we use (AWS, libvirt).