Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Kubernetes cluster-autoscaler #1299

Merged
merged 5 commits into from
Apr 21, 2022
Merged

Conversation

c3y1huang
Copy link
Contributor

@c3y1huang c3y1huang force-pushed the ca-support branch 2 times, most recently from b967b78 to c51e9ac Compare April 14, 2022 11:54
@c3y1huang c3y1huang marked this pull request as ready for review April 14, 2022 12:21
@c3y1huang c3y1huang self-assigned this Apr 15, 2022
@c3y1huang c3y1huang force-pushed the ca-support branch 2 times, most recently from 6e35949 to 96811e9 Compare April 15, 2022 08:01
@@ -430,9 +430,35 @@ func (s *DataStore) IsKubeNodeUnschedulable(nodeName string) (bool, error) {
if err != nil {
return false, err
}

clusterAutoscalerEnabled, err := s.GetSettingAsBool(types.SettingNameKubernetesClusterAutoscalerEnabled)
Copy link
Member

@innobead innobead Apr 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just allow the drain w/ cordon case triggered by cluster autoscaler to avoid unexpected following replica scheduling to the scaling down node, so that said we just keep the current function implementation as is instead of respecting ToBeDeletedByClusterAutoscaler which also include the drain w/o cordon case. WDYT?

note: this should be noticed in the doc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing with @c3y1huang , I think there are no obvious ways to let users configure cordon before drain in cloud provider cluster autoscaler.

Also, the specific taint added by CA actually means the node is marked to scale down, so it indirectly means it's unscheduled for Longhorn. Let's keep the change as is first.

ref: kubernetes/ingress-gce#595

Copy link
Contributor Author

@c3y1huang c3y1huang Apr 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After talking to @innobead ,

There is an option flag cordon-node-before-terminating. If this option is enabled, CA will mark the node spec unschedulable. However, regardless of the option flag, CA will taint the node with ToBeDeletedByClusterAutoscaler.

There could be a possible corner case when the user manually tolerates all taints and the replica could still rebuild onto the to be deleted node.

We think this can be tackled later if users have further concerns. For now, we can add this behavior to the doc.

types/types.go Outdated Show resolved Hide resolved
controller/instance_manager_controller.go Outdated Show resolved Hide resolved
controller/setting_controller.go Outdated Show resolved Hide resolved
controller/setting_controller.go Show resolved Hide resolved
@innobead
Copy link
Member

innobead commented Apr 19, 2022

Except for the comments above, LGTM

@c3y1huang c3y1huang force-pushed the ca-support branch 2 times, most recently from c96c737 to 9ff8de3 Compare April 19, 2022 11:05
@c3y1huang c3y1huang requested a review from innobead April 19, 2022 11:10
controller/instance_manager_controller.go Outdated Show resolved Hide resolved
controller/instance_manager_controller.go Outdated Show resolved Hide resolved
controller/instance_manager_controller.go Outdated Show resolved Hide resolved
controller/instance_manager_controller.go Outdated Show resolved Hide resolved
controller/instance_manager_controller.go Outdated Show resolved Hide resolved
controller/setting_controller.go Show resolved Hide resolved
@c3y1huang c3y1huang force-pushed the ca-support branch 3 times, most recently from 51e8a52 to 8b967a1 Compare April 20, 2022 10:49
@c3y1huang c3y1huang force-pushed the ca-support branch 2 times, most recently from 31fc0b4 to a5898dc Compare April 21, 2022 02:28
Copy link
Member

@innobead innobead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general LGTM, just a few questions left.

controller/instance_manager_controller.go Outdated Show resolved Hide resolved
controller/setting_controller.go Outdated Show resolved Hide resolved
controller/instance_manager_controller.go Show resolved Hide resolved
innobead
innobead previously approved these changes Apr 21, 2022
Copy link
Member

@innobead innobead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

shuo-wu
shuo-wu previously approved these changes Apr 21, 2022
Copy link
Contributor

@shuo-wu shuo-wu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

controller/instance_manager_controller.go Outdated Show resolved Hide resolved
controller/instance_manager_controller.go Outdated Show resolved Hide resolved
@c3y1huang c3y1huang dismissed stale reviews from shuo-wu and innobead via 329c38a April 21, 2022 12:05
Longhorn-2203

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
Longhorn-2203

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
Longhorn-2203

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
Longhorn-2203

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
@innobead innobead merged commit 4c2e9b2 into longhorn:master Apr 21, 2022
@innobead
Copy link
Member

innobead commented Apr 21, 2022

@c3y1huang well done & thanks @shuo-wu review 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants