Ingress controller should react to node scale down event from autoscaler #595
Comments
Hi, I would like to grab this issue, if no one is working on it already.
@freehan I might be looking at it wrong, but I think the problem is not in ingress-gce itself. However, over here the node is only tainted as NoSchedule. IMO the autoscaler should also mark the node as Unschedulable. I'm writing a few tests on the autoscaler for that.
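To illustrate the distinction being discussed: the autoscaler sets a NoSchedule taint on the node but leaves the node's `Unschedulable` field untouched. A minimal sketch, using stand-in structs rather than the real `k8s.io/api/core/v1` types:

```go
package main

import "fmt"

// Minimal stand-ins for the Kubernetes Node types (the real ones live in
// k8s.io/api/core/v1); the field names mirror that API, but this is only
// a sketch.
type Taint struct {
	Key    string
	Effect string // e.g. "NoSchedule"
}

type NodeSpec struct {
	Unschedulable bool
	Taints        []Taint
}

type Node struct {
	Name string
	Spec NodeSpec
}

// ToBeDeletedTaint is the taint key the cluster autoscaler places on a
// node it has selected for scale-down.
const ToBeDeletedTaint = "ToBeDeletedByClusterAutoscaler"

// hasScaleDownTaint reports whether the autoscaler has tainted the node.
func hasScaleDownTaint(n Node) bool {
	for _, t := range n.Spec.Taints {
		if t.Key == ToBeDeletedTaint {
			return true
		}
	}
	return false
}

func main() {
	n := Node{
		Name: "node-1",
		Spec: NodeSpec{
			// The autoscaler only adds the taint; it does not flip
			// Unschedulable, which is the gap discussed above.
			Unschedulable: false,
			Taints:        []Taint{{Key: ToBeDeletedTaint, Effect: "NoSchedule"}},
		},
	}
	fmt.Printf("tainted=%v unschedulable=%v\n", hasScaleDownTaint(n), n.Spec.Unschedulable)
}
```

This is why a controller that only checks `Spec.Unschedulable` never notices the scale-down.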
Thanks for the PR!
Yes. As you pointed out, the problem is the handshake between the autoscaler and ingress-gce (or any other load balancer controller). This requires a better design for synchronizing the cluster node lifecycle with the load balancer controller. But for now, as a stopgap, we just need to watch for the autoscaler's taint and react.
Would really like this fix in GKE - can anyone comment on how long it will take before it's available?
We will cherry-pick this into the 1.6 branch. It will then follow the GKE release pipeline and should be available in versions 1.13.8+.
The cluster autoscaler adds the ToBeDeletedByClusterAutoscaler taint to the candidate node, then evicts pods from it. After everything settles, it deletes the node.
The ingress controller should observe the taint and react by removing the instance from the instance group, so that connection draining is triggered. This helps prevent new connections from arriving at nodes that are being removed, which causes 502s.
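The reaction described above amounts to filtering tainted nodes out of the set synced to the instance group. A minimal sketch, again with stand-in types instead of the real client-go objects (a real controller would list nodes via an informer and call the GCE instance-group API):

```go
package main

import "fmt"

// Minimal stand-in for a node; only taint keys are kept, for brevity.
type Node struct {
	Name   string
	Taints []string
}

// ToBeDeletedTaint is the taint key the cluster autoscaler places on a
// node it has selected for scale-down.
const ToBeDeletedTaint = "ToBeDeletedByClusterAutoscaler"

// readyNodes returns the nodes that should remain in the load balancer's
// instance group: any node carrying the autoscaler's scale-down taint is
// excluded, so removing it from the group lets connection draining begin.
func readyNodes(nodes []Node) []string {
	var keep []string
	for _, n := range nodes {
		tainted := false
		for _, key := range n.Taints {
			if key == ToBeDeletedTaint {
				tainted = true
				break
			}
		}
		if !tainted {
			keep = append(keep, n.Name)
		}
	}
	return keep
}

func main() {
	nodes := []Node{
		{Name: "node-a"},
		{Name: "node-b", Taints: []string{ToBeDeletedTaint}}, // being scaled down
		{Name: "node-c"},
	}
	fmt.Println(readyNodes(nodes)) // node-b is excluded
}
```

On each sync, the controller would set the instance group membership to exactly this filtered set, so tainted nodes drain instead of receiving new connections.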