MachineSet: support for defining specific machines when scaling down #75
Also see #45 (comment) |
I can see the following annotation-based approach as low-hanging fruit.
Though, for a long-term solution, we might want to learn from Pod priority and PriorityClass: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass. Also fyi: kubernetes/enhancements#609 |
We can have multiple delete strategies, parallel to the image pull policy. E.g.:

---
apiVersion: cluster.k8s.io/v1alpha1
kind: MachineSet
metadata:
  name: <NAME>
  namespace: <NAMESPACE>
  labels:
    ...
spec:
  replicas: 2
  deletePolicy: <POLICY>
  selector:
    ...
  template:
    ...

where the deletePolicy field selects the strategy. It can default to Random. The goal is to unblock the integration of the Cluster API into the autoscaler in case we don't reach agreement in a reasonable time. The code in question: https://github.com/kubernetes-sigs/cluster-api/blob/master/pkg/controller/machineset/controller.go#L325-L329 |
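For illustration, here is a minimal sketch of how such a deletePolicy could drive the machine-selection step referenced above. The policy names, the getMachinesToDelete helper, and the import path are assumptions made for the example, not the actual controller code.

```go
package machineset

import (
	"sort"

	"sigs.k8s.io/cluster-api/pkg/apis/cluster/v1alpha1"
)

// getMachinesToDelete orders the machines according to the MachineSet's
// delete policy and returns the first `diff` of them as deletion candidates.
func getMachinesToDelete(machines []*v1alpha1.Machine, diff int, policy string) []*v1alpha1.Machine {
	switch policy {
	case "Oldest":
		sort.Slice(machines, func(i, j int) bool {
			return machines[i].CreationTimestamp.Time.Before(machines[j].CreationTimestamp.Time)
		})
	case "Newest":
		sort.Slice(machines, func(i, j int) bool {
			return machines[j].CreationTimestamp.Time.Before(machines[i].CreationTimestamp.Time)
		})
	default:
		// "Random": keep whatever order the lister returned.
	}
	if diff > len(machines) {
		diff = len(machines)
	}
	return machines[:diff]
}
```

With a default of Random, this behaves like today's controller; Oldest/Newest give deterministic behaviour without any external input. |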
The autoscaler would only work with the lowest-priority policy though, right? Otherwise the autoscaler would try to scale down a specific machine (by changing its priority), and when the machine set was resized, a random or oldest machine would be deleted instead. |
Is there any solution that provides an atomic operation? The MachineSet controller should not change the number of replicas since it's up to
So ok, we can have another routine that will listen for machines and, every time a machine gets annotated, remove it and decrease the replica count. |
@maisem has been thinking about this as well, so adding him in here. |
As per discussion on Slack, the atomicity requirement seems unclear, and maybe we can move forward with a deletePolicy setup that is just deletePolicy = simple|newest|oldest. An external oracle (if it exists) can annotate the machines based on external knowledge (if it chooses to) and thus affect the simple delete policy. |
The current simple deletion policy [1] breaks machines into 3 classes:
Our current implementation of the cluster-api-based cluster-autoscaler [2] targets the
So the process covers the b) case.

[1] https://github.com/kubernetes-sigs/cluster-api/blob/master/pkg/controller/machineset/delete_policy.go#L34-L42

@ntfrnzn would that be applicable for your use case? |
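For reference, here is a rough sketch of what a class-based simple policy along those lines might look like. The class names, priority values, annotation key, and status field are assumptions reconstructed from this discussion, not a verbatim copy of delete_policy.go.

```go
package machineset

import (
	"sigs.k8s.io/cluster-api/pkg/apis/cluster/v1alpha1"
)

type deletePriority float64

const (
	mustDelete   deletePriority = 100.0 // explicitly marked for deletion
	betterDelete deletePriority = 50.0  // e.g. machines reporting an error
	couldDelete  deletePriority = 20.0  // everything else
)

// deleteMachineAnnotation is an assumed annotation key an external actor such
// as the cluster-autoscaler could set to request deletion of a specific machine.
const deleteMachineAnnotation = "cluster.k8s.io/delete-machine"

func simpleDeletePriority(machine *v1alpha1.Machine) deletePriority {
	if !machine.ObjectMeta.DeletionTimestamp.IsZero() {
		return mustDelete
	}
	if _, ok := machine.ObjectMeta.Annotations[deleteMachineAnnotation]; ok {
		return mustDelete
	}
	if machine.Status.ErrorReason != nil {
		return betterDelete
	}
	return couldDelete
}
``` |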
I'm still not convinced this approach is safe to use with Cluster Autoscaler. Sure, it will work most of the time, but without an atomic way to delete a specific machine you have all sorts of races that may lead to an additional, random machine being removed. At that point all guarantees given by CA about respecting things like PodDisruptionBudgets and various no-scale-down annotations are gone. For scale-down, CA doesn't work on machine sets; it looks at individual nodes and checks which ones can be deleted. Ultimately what it does is remove a specific machine, NOT change the desired number of machines. If someone else removes the machine while it's being drained, the delete operation must fail. All those races are the reason for requesting an atomic implementation in the initial issue description. |
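One way to express the "delete exactly this machine, and fail if someone else already removed it" semantics is a UID-preconditioned delete against the Machine object itself. This is only a sketch of the requirement using the controller-runtime client, not how the Cluster Autoscaler performs scale-down today.

```go
package main

import (
	"context"

	"sigs.k8s.io/cluster-api/pkg/apis/cluster/v1alpha1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// deleteSpecificMachine deletes one concrete Machine. The UID precondition
// makes the API call fail if the object was deleted (and possibly recreated
// under the same name) in the meantime, instead of silently deleting a
// different machine.
func deleteSpecificMachine(ctx context.Context, c client.Client, m *v1alpha1.Machine) error {
	uid := m.GetUID()
	return c.Delete(ctx, m, client.Preconditions{UID: &uid})
}
``` |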
From my perspective, yes, this is what I pictured for deletePolicy: simple. In the meantime, I wrote a few lines of code for the time-based policies, which use a sigmoid-like function deletePriority(creationTimeStamp, now) -> (0.0, 100) that maps a machine into a deletion-priority range (float64). I hesitated over where to define the string deletePolicy (in the MachineSet or MachineDeployment types), and haven't gotten back to it. More importantly, I'm conflicted over the atomicity requirements. It complicates the types. One implementation would be to add a property like |
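A minimal sketch of such a time-based priority, assuming an exponential-saturation curve and an arbitrary time constant; the exact shape and constants in the actual patch may differ.

```go
package machineset

import (
	"math"
	"time"
)

type deletePriority float64

const (
	maxPriority = 100.0
	// timeScale controls how quickly a machine's priority approaches the
	// maximum as it ages (an arbitrary value for illustration).
	timeScale = 24 * time.Hour
)

// oldestDeletePriority grows smoothly with machine age and asymptotically
// approaches maxPriority, so older machines are preferred for deletion.
func oldestDeletePriority(creationTimestamp, now time.Time) deletePriority {
	age := now.Sub(creationTimestamp)
	if age <= 0 {
		return 0
	}
	return deletePriority(maxPriority * (1.0 - math.Exp(-float64(age)/float64(timeScale))))
}

// newestDeletePriority inverts the curve so that the most recently created
// machines score highest.
func newestDeletePriority(creationTimestamp, now time.Time) deletePriority {
	return deletePriority(maxPriority) - oldestDeletePriority(creationTimestamp, now)
}
``` |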
/assign @maisem |
@roberthbailey: GitHub didn't allow me to assign the following users: maisem. Note that only kubernetes-sigs members and repo collaborators can be assigned and that issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Ping.
@maisem @roberthbailey @ingvagabund @ntfrnzn @erstapples @MaciekPytel
We didn't discuss this during today's meeting. I propose that this isn't required for v1alpha1 and we can defer it to a future milestone. WDYT @hardikdr @justinsb @detiber @rsdcastro? |
Considering there is an open PR that could resolve this: https://github.com/kubernetes-sigs/cluster-api/pull/726/files I don't see a need to bump it unless we cannot get that PR merged in time. |
There is also #513. |
/assign |
This issue is to track the ability to specify which machines should be removed from a MachineSet when it is scaling down. This is a requirement to support cluster autoscaling use cases.
The initial strawman of using annotations to mark which machines should be removed would work, but there are concerns about not having an atomic operation for this kind of change.
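To make the atomicity concern concrete, here is a minimal sketch of the annotation-based strawman, assuming a hypothetical delete-machine annotation key and the controller-runtime client. The two separate API calls are exactly why the operation is not atomic: the MachineSet controller can reconcile in between and remove a different machine.

```go
package main

import (
	"context"

	"sigs.k8s.io/cluster-api/pkg/apis/cluster/v1alpha1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// deleteMachineAnnotation is a hypothetical key used to mark the machine
// the caller wants removed when the MachineSet scales down.
const deleteMachineAnnotation = "cluster.k8s.io/delete-machine"

// markAndScaleDown annotates one machine and then lowers the MachineSet's
// replica count. Step 1 and step 2 are independent requests, so nothing
// guarantees the controller will pick the annotated machine.
func markAndScaleDown(ctx context.Context, c client.Client, ms *v1alpha1.MachineSet, m *v1alpha1.Machine) error {
	// 1. Mark the specific machine.
	if m.Annotations == nil {
		m.Annotations = map[string]string{}
	}
	m.Annotations[deleteMachineAnnotation] = "true"
	if err := c.Update(ctx, m); err != nil {
		return err
	}

	// 2. Decrease the desired replica count so the controller removes one machine.
	replicas := *ms.Spec.Replicas - 1
	ms.Spec.Replicas = &replicas
	return c.Update(ctx, ms)
}
```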
cc @krousey @mwielgus @krzysztof-jastrzebski @MaciekPytel