Support Kubernetes cluster-autoscaler #1299
datastore/kubernetes.go
@@ -430,9 +430,35 @@ func (s *DataStore) IsKubeNodeUnschedulable(nodeName string) (bool, error) {
	if err != nil {
		return false, err
	}

	clusterAutoscalerEnabled, err := s.GetSettingAsBool(types.SettingNameKubernetesClusterAutoscalerEnabled)
I think we should only allow the drain-with-cordon case triggered by the cluster autoscaler, to avoid replicas being unexpectedly scheduled onto the node that is scaling down. In other words, we keep the current function implementation as is instead of respecting ToBeDeletedByClusterAutoscaler, which also includes the drain-without-cordon case. WDYT?
Note: this should be mentioned in the doc.
After discussing with @c3y1huang, I think there is no obvious way to let users configure cordon-before-drain in a cloud provider's cluster autoscaler.
Also, the specific taint added by CA means the node is marked for scale-down, so it indirectly means the node is unschedulable for Longhorn. Let's keep the change as is for now.
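For illustration, the check being discussed could look roughly like the sketch below. It assumes simplified stand-in types instead of the real Kubernetes API structs; `Node`, `Taint`, `isNodeUnschedulable`, and the `caEnabled` parameter are hypothetical names and are not taken from this PR. Only the taint key `ToBeDeletedByClusterAutoscaler` comes from the discussion.

```go
package main

// ToBeDeletedTaint is the taint key mentioned in this thread; the cluster
// autoscaler adds it to nodes it has marked for scale-down.
const ToBeDeletedTaint = "ToBeDeletedByClusterAutoscaler"

// Taint and Node are simplified, hypothetical stand-ins for the
// corresponding Kubernetes API types.
type Taint struct {
	Key    string
	Effect string
}

type Node struct {
	Unschedulable bool // mirrors the node spec's unschedulable (cordoned) flag
	Taints        []Taint
}

// isNodeUnschedulable reports whether new replicas should avoid the node.
// caEnabled stands in for the cluster-autoscaler setting read in the diff.
func isNodeUnschedulable(node Node, caEnabled bool) bool {
	// A cordoned node is always treated as unschedulable.
	if node.Unschedulable {
		return true
	}
	// The CA taint is only respected when the setting is enabled.
	if !caEnabled {
		return false
	}
	for _, t := range node.Taints {
		if t.Key == ToBeDeletedTaint {
			return true
		}
	}
	return false
}
```

With this shape, a node that is tainted but not cordoned is only skipped when the setting is on, which matches the "drain without cordon" case raised above.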
After talking to @innobead:
There is an option flag, cordon-node-before-terminating. If this option is enabled, CA marks the node spec as unschedulable. However, regardless of the flag, CA taints the node with ToBeDeletedByClusterAutoscaler.
There could be a corner case where the user manually tolerates all taints, so a replica could still be rebuilt onto the to-be-deleted node.
We think this can be tackled later if users have further concerns. For now, we can add this behavior to the doc.
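To make the corner case concrete: in Kubernetes, a toleration with an empty key, the `Exists` operator, and no effect tolerates every taint, including the one CA adds. The sketch below detects that pattern. It uses a simplified, hypothetical `Toleration` struct and a hypothetical helper `toleratesAllTaints`; neither is part of this PR or of Longhorn.

```go
package main

// Toleration is a simplified, hypothetical stand-in for the Kubernetes
// toleration type, keeping only the fields needed here.
type Toleration struct {
	Key      string
	Operator string // "Exists" or "Equal"
	Effect   string // empty means "match every effect"
}

// toleratesAllTaints reports whether any toleration is a catch-all:
// empty key + Exists operator + empty effect matches every taint.
func toleratesAllTaints(tolerations []Toleration) bool {
	for _, t := range tolerations {
		if t.Key == "" && t.Operator == "Exists" && t.Effect == "" {
			return true
		}
	}
	return false
}
```

A workload carrying such a toleration is not kept off the node by the taint alone, which is why the behavior is worth documenting.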
Except for the comments above, LGTM
In general LGTM, just a few questions left.
LGTM
LGTM
Longhorn-2203 Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
@c3y1huang well done, and thanks @shuo-wu for the review 👍
longhorn/longhorn#2203