
MachineSet: support for defining specific machines when scaling down #75

Closed
rsdcastro opened this issue Apr 17, 2018 · 18 comments · Fixed by #726
Labels: help wanted, priority/important-longterm

@rsdcastro

This issue tracks the ability to specify which machines should be removed from a MachineSet when it is scaling down. This is a requirement for supporting Cluster Autoscaling use cases.

An initial strawman using annotations to mark which machines should be removed would work, but there are concerns about the lack of an atomic operation for this kind of change.

cc @krousey @mwielgus @krzysztof-jastrzebski @MaciekPytel

@rsdcastro rsdcastro added this to the cluster-api-beta-implementation milestone Apr 17, 2018
@roberthbailey
Contributor

Also see #45 (comment)

@hardikdr
Member

hardikdr commented Aug 29, 2018

I see the following annotation-based approach as low-hanging fruit.

  • By default, all machines get priority:3 via an annotation – the MachineDeployment adds this annotation to every machine at creation.
  • When scaling down the MachineDeployment, the autoscaler lowers the priority of the specific machines it wants removed by updating the annotation to priority:1.
    • Then, in a separate call, it scales down the MachineDeployment. That avoids the risk of the change not being atomic.
  • MachineDeployment/MachineSet reconciliation always prefers to delete the machines with the lowest priority first, so the purpose is served (a rough sketch follows below).
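
A minimal Go sketch of that reconciliation preference, assuming an illustrative priority annotation key and the v1alpha1 Machine type; nothing here is an agreed-upon API, and the import path is assumed to match the repo layout of the time:

package deletepolicy

import (
	"sort"
	"strconv"

	"sigs.k8s.io/cluster-api/pkg/apis/cluster/v1alpha1"
)

const (
	// Assumed annotation key and default; a lower value means delete sooner.
	priorityAnnotation = "cluster.k8s.io/delete-priority"
	defaultPriority    = 3
)

func priorityOf(m *v1alpha1.Machine) int {
	if v, ok := m.Annotations[priorityAnnotation]; ok {
		if p, err := strconv.Atoi(v); err == nil {
			return p
		}
	}
	return defaultPriority
}

// machinesToDelete returns the diff lowest-priority machines, so machines the
// autoscaler marked with priority:1 go before the default priority:3 ones.
func machinesToDelete(machines []*v1alpha1.Machine, diff int) []*v1alpha1.Machine {
	sorted := append([]*v1alpha1.Machine{}, machines...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return priorityOf(sorted[i]) < priorityOf(sorted[j])
	})
	if diff > len(sorted) {
		diff = len(sorted)
	}
	return sorted[:diff]
}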

For a long-term solution, though, we might want to learn from pod priority and PriorityClass: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass

Also fyi: kubernetes/enhancements#609

@ingvagabund
Contributor

We can have multiple delete strategies, analogous to the image pull policy, e.g.:

---
apiVersion: cluster.k8s.io/v1alpha1
kind: MachineSet
metadata:
  name: <NAME>
  namespace: <NAMESPACE>
  labels:
    ...
spec:
  replicas: 2
  deletePolicy: <POLICY>
  selector:
    ...
  template:
    ...

where the deletePolicy could be:

  • Random
  • Delete the oldest replica
  • Delete a replica with lower priority
    ..

It can default to Random. The goal is to unblock the integration of the cluster API into the autoscaler in case we don't reach agreement in a reasonable time.

The code in question: https://github.com/kubernetes-sigs/cluster-api/blob/master/pkg/controller/machineset/controller.go#L325-L329
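
For illustration, a rough Go sketch of how such a deletePolicy field might be honored when choosing scale-down victims; the policy names, annotation key, and overall shape are assumptions for this discussion, not the actual controller code linked above:

package deletepolicy

import (
	"math"
	"math/rand"
	"sort"
	"strconv"

	"sigs.k8s.io/cluster-api/pkg/apis/cluster/v1alpha1"
)

// DeletePolicy is a hypothetical MachineSetSpec field; the names are illustrative.
type DeletePolicy string

const (
	RandomPolicy         DeletePolicy = "Random"
	OldestPolicy         DeletePolicy = "Oldest"
	LowestPriorityPolicy DeletePolicy = "LowestPriority"
)

// pickMachinesToDelete selects diff machines according to the configured
// policy, defaulting to Random when the field is unset.
func pickMachinesToDelete(policy DeletePolicy, machines []*v1alpha1.Machine, diff int) []*v1alpha1.Machine {
	if diff > len(machines) {
		diff = len(machines)
	}
	sorted := append([]*v1alpha1.Machine{}, machines...)
	switch policy {
	case OldestPolicy:
		sort.SliceStable(sorted, func(i, j int) bool {
			return sorted[i].CreationTimestamp.Time.Before(sorted[j].CreationTimestamp.Time)
		})
	case LowestPriorityPolicy:
		prio := func(m *v1alpha1.Machine) int {
			// Assumed annotation; machines without it sort last.
			if p, err := strconv.Atoi(m.Annotations["cluster.k8s.io/delete-priority"]); err == nil {
				return p
			}
			return math.MaxInt32
		}
		sort.SliceStable(sorted, func(i, j int) bool { return prio(sorted[i]) < prio(sorted[j]) })
	default: // "" or RandomPolicy keeps today's behavior
		rand.Shuffle(len(sorted), func(i, j int) { sorted[i], sorted[j] = sorted[j], sorted[i] })
	}
	return sorted[:diff]
}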

@roberthbailey
Contributor

The autoscaler would only work with the lowest priority policy though, right? Otherwise the autoscaler would try to scale down by a specific machine (by changing the priority) and when the machine set was resized a random or oldest machine would instead be deleted.

@ingvagabund
Contributor

ingvagabund commented Sep 25, 2018

An initial strawman using annotations to mark which machines should be removed would work, but there are concerns about the lack of an atomic operation for this kind of change.

Is there any solution that provides an atomic operation?

The MachineSet controller should not change the number of replicas, since that is up to the MachineSet's consumers. The controller's responsibility is to create new machines or delete existing ones (updating the MachineSet's status at the same time). Given that, there is no mechanism protecting a MachineSet from rapidly alternating changes to the replicas field. When a specific machine is annotated, the MachineSet controller still lists all potential machines to be deleted; the annotation only changes the lottery into an archery contest – hit the bull's eye first. The following can easily happen:

  1. Annotate machine(s).
  2. Decrease the number of replicas.
  3. Start deleting machines; in the meantime someone removes the annotation (e.g. to annotate different machines instead).
  4. The MachineSet controller removes machines that were not supposed to be deleted.
  5. Oops.

So, OK, we can have another routine that listens for machines and, every time a machine gets annotated, removes it and decreases the replicas:

  1. A machine is annotated.
  2. The routine removes the machine.
  3. Oops, the MachineSet controller notices a machine is gone and recreates a new one, since the replicas field has not changed yet.
  4. Oops, the routine decreases the number of replicas (assuming there was no error).
  5. The MachineSet controller deletes a machine to compensate for the changed replicas. Oh, a different machine gets deleted, since the list of machines to be deleted is not sorted chronologically.

The replicas logic and deletion of a specific machine are tightly connected.

@roberthbailey
Contributor

@maisem has been thinking about this as well, so adding him in here.

@ntfrnzn
Contributor

ntfrnzn commented Dec 19, 2018

As per the discussion on Slack, the atomicity requirement seems unclear, and maybe we can move forward with a deletePolicy setup that is just:

(a) deletePolicy = simple|newest|oldest
(b) the simple policy attends to a known “delete-me” annotation on the machine, if it is present
(c) the newest|oldest policies preferentially remove machines based on their age (measured from some timestamp, probably ObjectMeta.CreationTimestamp)

An external oracle (if one exists) can annotate machines based on external knowledge (if it chooses to) and thus affect the simple delete policy.
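
A tiny sketch of the check the simple policy might perform, using the sigs.k8s.io/cluster-api-delete-machine annotation that comes up later in this thread; the helper name is made up:

package deletepolicy

import "sigs.k8s.io/cluster-api/pkg/apis/cluster/v1alpha1"

// Annotation an external oracle sets on machines it wants removed first.
const deleteMachineAnnotation = "sigs.k8s.io/cluster-api-delete-machine"

// markedForDeletion reports whether the machine carries the delete-me
// annotation that the simple policy prefers when scaling down.
func markedForDeletion(m *v1alpha1.Machine) bool {
	_, ok := m.Annotations[deleteMachineAnnotation]
	return ok
}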

@timothysc timothysc added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Jan 10, 2019
@timothysc timothysc modified the milestones: cluster-api-beta-implementation, v1alpha1 Jan 10, 2019
@timothysc timothysc added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Jan 10, 2019
@ingvagabund
Contributor

ingvagabund commented Jan 14, 2019

The current simple deletion policy [1] breaks machines into 3 classes:

  • must delete: the deletion timestamp is non-zero
  • better delete: machines with a reported error
  • could delete: the remaining machines

Our current implementation of the cluster-api based cluster autoscaler [2] targets the must delete class (which gets removed first) with the sigs.k8s.io/cluster-api-delete-machine machine annotation [3].
The process of scaling down then reduces to:

  1. the cluster autoscaler annotates all candidate machines for scaling down
  2. the cluster autoscaler reduces the MachineSet's number of replicas
  3. the MachineSet controller starts removing all machines with the sigs.k8s.io/cluster-api-delete-machine annotation before any others

So the process covers case (b); a condensed sketch follows below.
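
Condensed into Go, the classification in [1] plus the annotation from [3] amounts to something like the following sketch; the numeric priorities and status field checks are approximations, not necessarily the exact code in delete_policy.go:

package deletepolicy

import "sigs.k8s.io/cluster-api/pkg/apis/cluster/v1alpha1"

type deletePriority int

const (
	couldDelete  deletePriority = 20  // remaining machines
	betterDelete deletePriority = 50  // machines with a reported error
	mustDelete   deletePriority = 100 // already deleting, or explicitly annotated
)

const deleteMachineAnnotation = "sigs.k8s.io/cluster-api-delete-machine"

// classify buckets a machine into one of the three classes described above;
// higher values are deleted first when the MachineSet scales down.
func classify(m *v1alpha1.Machine) deletePriority {
	if !m.DeletionTimestamp.IsZero() {
		return mustDelete
	}
	if _, ok := m.Annotations[deleteMachineAnnotation]; ok {
		return mustDelete
	}
	if m.Status.ErrorReason != nil || m.Status.ErrorMessage != nil {
		return betterDelete
	}
	return couldDelete
}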

[1] https://github.com/kubernetes-sigs/cluster-api/blob/master/pkg/controller/machineset/delete_policy.go#L34-L42
[2] https://github.com/openshift/kubernetes-autoscaler/tree/master/cluster-autoscaler/cloudprovider/clusterapi
[3] #513

@ntfrnzn would that be applicable for your use case?

@MaciekPytel

I'm still not convinced this approach is safe to use with the Cluster Autoscaler. Sure, it will work most of the time, but without an atomic way to delete a specific machine you have all sorts of races that may lead to an additional, random machine being removed. At that point all the guarantees CA gives about respecting things like PodDisruptionBudgets and various no-scale-down annotations are gone.

For scale-down, CA doesn't work on MachineSets; it looks at individual nodes and checks which ones can be deleted. Ultimately what it does is remove a specific machine, NOT change the desired number of machines. If someone else removes the machine while it's being drained, the delete operation must fail.
In the extreme case CA can even race with itself by scaling up the MachineSet while it's removing a VM from it.

All those races are the reason for requesting an atomic implementation in the initial issue description.

@ntfrnzn
Contributor

ntfrnzn commented Jan 14, 2019

From my perspective, yes, this is what I pictured for deletePolicy: simple.

In the meantime, I wrote a few lines of code for the time-based policies that use a sigmoid-like function, deletePriority(creationTimeStamp, now) -> (0.0, 100.0), which maps a machine into a deletion-priority range (float64). I hesitated over where to define the deletePolicy string (in the MachineSet or MachineDeployment types), and haven't gotten back to it.

More importantly, I'm conflicted over the atomicity requirements; they complicate the types. One implementation would be to add a property like []string machineIdsMustNotExist to the MachineSet, so that the cluster autoscaler can update the MachineSet by changing (a) the size and (b) the to-be-deleted list on a single object atomically; when the resync happens, the controller removes the named machines and makes the set the correct size.
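
For the time-based part, here is a sketch in the spirit of that sigmoid-like mapping; the curve shape and time constant are placeholders, not the code actually written:

package deletepolicy

import (
	"math"
	"time"
)

// Assumed time constant: how quickly a machine's deletion priority approaches
// the maximum as it ages.
const ageScale = 10 * 24 * time.Hour

// deletePriority maps a machine's age (now minus its creation timestamp) into
// the (0.0, 100.0) range: brand-new machines score near 0 and old machines
// approach 100. Sorting by this value descending deletes the oldest machines
// first; sorting ascending deletes the newest first.
func deletePriority(creationTimestamp, now time.Time) float64 {
	age := now.Sub(creationTimestamp)
	if age < 0 {
		age = 0
	}
	return 100.0 * (1.0 - math.Exp(-age.Seconds()/ageScale.Seconds()))
}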

@roberthbailey
Contributor

/assign @maisem

@k8s-ci-robot
Contributor

@roberthbailey: GitHub didn't allow me to assign the following users: maisem.

Note that only kubernetes-sigs members and repo collaborators can be assigned and that issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @maisem

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hardikdr
Member

hardikdr commented Feb 19, 2019

Ping.
There have been multiple efforts and discussions on this topic. Let's please discuss this and try to settle on an approach in the next meeting.
Achieving atomicity seems really hard; I'm not sure there is a way to do so.
I suggest we decide on the simple priority-based approach for now and unblock the autoscaler integration.
Ref:

@maisem @roberthbailey @ingvagabund @ntfrnzn @erstapples @MaciekPytel

@ncdc
Contributor

ncdc commented Mar 5, 2019

This has priority/important-longterm. Do we think we'll be able to resolve any outstanding questions/issues in time for v1alpha1?

@ncdc
Contributor

ncdc commented Mar 6, 2019

We didn't discuss this during today's meeting. I propose that this isn't required for v1alpha1 and we can defer it to a future milestone. WDYT @hardikdr @justinsb @detiber @rsdcastro?

@detiber
Member

detiber commented Mar 6, 2019

Considering there is an open PR that could resolve this (https://github.com/kubernetes-sigs/cluster-api/pull/726/files), I don't see a need to bump it unless we cannot get that PR merged in time.

@ncdc
Contributor

ncdc commented Mar 6, 2019

There is also #513.

@vincepri
Member

/assign
