Support Kubernetes cluster-autoscaler #1299

Merged

innobead merged 5 commits into longhorn:master from c3y1huang:ca-support

Apr 21, 2022

Contributor

c3y1huang commented Apr 13, 2022

longhorn/longhorn#2203

c3y1huang force-pushed the ca-support branch 2 times, most recently from b967b78 to c51e9ac

April 14, 2022 11:54

c3y1huang marked this pull request as ready for review

April 14, 2022 12:21

innobead requested review from shuo-wu, derekbit and PhanLe1010

April 14, 2022 16:31

c3y1huang self-assigned this

c3y1huang mentioned this pull request

[FEATURE] instance-manager compatibility with Cluster Autoscaler longhorn/longhorn#2203

Closed

c3y1huang force-pushed the ca-support branch 2 times, most recently from 6e35949 to 96811e9

April 15, 2022 08:01

innobead reviewed

datastore/kubernetes.go Outdated

               	if err != nil {
               		return false, err
               	}
+              	clusterAutoscalerEnabled, err := s.GetSettingAsBool(types.SettingNameKubernetesClusterAutoscalerEnabled)

Member

innobead Apr 18, 2022 (edited)

I think we should allow only the drain-with-cordon case triggered by the cluster autoscaler, so that replicas are not unexpectedly scheduled onto the node that is scaling down. In other words, we would keep the current function implementation as is instead of respecting ToBeDeletedByClusterAutoscaler, which also covers the drain-without-cordon case. WDYT?

Note: this should be mentioned in the doc.

Member

innobead Apr 19, 2022

After discussing with @c3y1huang, I think there is no obvious way to let users configure cordon-before-drain in a cloud provider's cluster autoscaler.

Also, the specific taint added by CA means the node is marked for scale-down, so it indirectly means the node is unschedulable for Longhorn. Let's keep the change as is for now.

ref: kubernetes/ingress-gce#595

Contributor Author

c3y1huang Apr 19, 2022 (edited)

After talking to @innobead:

CA has an option flag, cordon-node-before-terminating. If this option is enabled, CA marks the node as unschedulable in its spec. However, regardless of the flag, CA taints the node with ToBeDeletedByClusterAutoscaler.

There is a possible corner case: if the user manually tolerates all taints, a replica could still be rebuilt onto the to-be-deleted node.

We think this can be tackled later if users have further concerns. For now, we can add this behavior to the doc.
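
The check discussed above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `Node`, `Taint`, `hasTaint` and `isSchedulingBlocked` are hypothetical names, and the two structs are simplified local stand-ins for the Kubernetes node types so the snippet compiles without dependencies. Only the taint key ToBeDeletedByClusterAutoscaler and the idea of gating on a cluster-autoscaler setting come from this thread.

```go
package main

import "fmt"

// Taint and Node are simplified, hypothetical stand-ins for the Kubernetes
// core/v1 types, kept local so the sketch has no external dependencies.
type Taint struct {
	Key    string
	Effect string
}

type Node struct {
	Name          string
	Unschedulable bool // stands in for node.Spec.Unschedulable (set by cordon)
	Taints        []Taint
}

// Taint key named in this thread; CA adds it regardless of the cordon flag.
const toBeDeletedTaint = "ToBeDeletedByClusterAutoscaler"

// hasTaint reports whether the node carries a taint with the given key.
func hasTaint(n Node, key string) bool {
	for _, t := range n.Taints {
		if t.Key == key {
			return true
		}
	}
	return false
}

// isSchedulingBlocked (hypothetical helper) treats a node as unavailable for
// new replicas when it is cordoned, or when the cluster-autoscaler setting is
// enabled and CA has tainted the node for deletion.
func isSchedulingBlocked(n Node, clusterAutoscalerEnabled bool) bool {
	if n.Unschedulable {
		return true
	}
	return clusterAutoscalerEnabled && hasTaint(n, toBeDeletedTaint)
}

func main() {
	nodes := []Node{
		{Name: "cordoned", Unschedulable: true},
		{Name: "tainted", Taints: []Taint{{Key: toBeDeletedTaint, Effect: "NoSchedule"}}},
		{Name: "healthy"},
	}
	for _, n := range nodes {
		fmt.Printf("%s blocked=%v\n", n.Name, isSchedulingBlocked(n, true))
	}
}
```

Note that a check like this only covers what Longhorn itself decides. It does not address the corner case above, where a user-supplied toleration for all taints lets a pod land on the tainted node anyway.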

innobead reviewed

controller/instance_manager_controller.go Outdated, resolved

innobead reviewed

types/types.go Outdated, resolved

innobead reviewed

controller/setting_controller.go Resolved

c3y1huang force-pushed the ca-support branch from 96811e9 to 1de51c4

April 19, 2022 03:59

c3y1huang requested a review from innobead

April 19, 2022 04:18

innobead requested changes

controller/instance_manager_controller.go Outdated, resolved

controller/setting_controller.go Outdated, resolved

controller/setting_controller.go Resolved

Member

innobead commented Apr 19, 2022 (edited)

Except for the comments above, LGTM

c3y1huang force-pushed the ca-support branch 2 times, most recently from c96c737 to 9ff8de3

April 19, 2022 11:05

c3y1huang requested a review from innobead

April 19, 2022 11:10

shuo-wu reviewed

controller/instance_manager_controller.go Outdated, resolved

controller/instance_manager_controller.go Outdated, resolved

controller/instance_manager_controller.go Outdated, resolved

controller/instance_manager_controller.go Outdated, resolved

controller/instance_manager_controller.go Outdated, resolved

controller/setting_controller.go Resolved

c3y1huang force-pushed the ca-support branch from 9ff8de3 to e64ceea

April 20, 2022 00:56

shuo-wu reviewed

controller/instance_manager_controller.go Outdated, resolved

c3y1huang force-pushed the ca-support branch 3 times, most recently from 51e8a52 to 8b967a1

April 20, 2022 10:49

shuo-wu reviewed

controller/instance_manager_controller.go Outdated, resolved

c3y1huang force-pushed the ca-support branch 2 times, most recently from 31fc0b4 to a5898dc

April 21, 2022 02:28

c3y1huang requested a review from shuo-wu

April 21, 2022 02:32

c3y1huang force-pushed the ca-support branch from 51245fb to 416e231

April 21, 2022 07:05

innobead reviewed

Member

innobead left a comment (edited)

In general LGTM, just a few questions left.

controller/instance_manager_controller.go Outdated, resolved

controller/setting_controller.go Outdated, resolved

controller/instance_manager_controller.go Resolved

c3y1huang force-pushed the ca-support branch from 416e231 to 122cf52

April 21, 2022 09:39

c3y1huang requested a review from innobead

April 21, 2022 09:46

innobead previously approved these changes

Member

innobead left a comment

LGTM

shuo-wu previously approved these changes

Contributor

shuo-wu left a comment

LGTM

controller/instance_manager_controller.go Outdated, resolved

controller/instance_manager_controller.go Outdated, resolved

c3y1huang dismissed stale reviews from shuo-wu and innobead via 329c38a

April 21, 2022 12:05

c3y1huang force-pushed the ca-support branch from 122cf52 to 329c38a

April 21, 2022 12:05

c3y1huang added 2 commits

April 21, 2022 20:35


cluster-autoscaler: add global setting

c7d0aa7

Longhorn-2203

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>

Fix typo

34c0bf4

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>

c3y1huang force-pushed the ca-support branch from 329c38a to ce5d41d

April 21, 2022 12:35

c3y1huang requested review from innobead and shuo-wu

April 21, 2022 12:39

innobead reviewed

controller/instance_manager_controller.go Outdated, resolved

c3y1huang added 3 commits

April 21, 2022 21:59


cluster-autoscaler: implement PDB managment

3b4f090

Longhorn-2203

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>

cluster-autoscaler: annotate safe-to-evict to im pods

1febcf2

Longhorn-2203

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>

cluster-autoscaler: annotate safe-to-evict to deployments

9ca7758

Longhorn-2203

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>

c3y1huang force-pushed the ca-support branch from ce5d41d to 9ca7758

April 21, 2022 13:59

innobead approved these changes

innobead merged commit 4c2e9b2 into longhorn:master

Member

innobead commented Apr 21, 2022 (edited)

@c3y1huang well done, and thanks @shuo-wu for the review 👍

c3y1huang deleted the ca-support branch

April 21, 2022 14:45

weizhe0422 mentioned this pull request

[BUG] A stopped replica on a removed node should not be counted as a healthy replica for the drain setting longhorn/longhorn#2237

Closed
