Stuck while scaling AWS spot instances in a "full" zone #2391

Closed
xrl opened this issue Sep 26, 2019 · 3 comments
Comments

xrl commented Sep 26, 2019

We spent 45 minutes today waiting for spot instances to come online. The AZ was fully committed and would not give out spot reservations. I manually changed the scaling order to a different ASG (i.e., different AZ) and we got our servers right away.

Here is what it looked like when the ASG was trying to scale in a fully committed AZ:

image

I do not have any volume/taint restrictions for these workloads. It's safe to schedule them on any host. Could we make it so ASGs are knocked to the back of the list after a few failed attempts at scaling up?
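
To make the request concrete, here is a minimal sketch of the "knock it to the back of the list" behaviour. It is only an illustration of the idea, not how the cluster autoscaler is implemented: the class `ScaleUpRotation`, its methods, the group names, and the threshold of 3 failures are all hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Iterable


class ScaleUpRotation:
    """Hypothetical helper: tries node groups in order and demotes
    any group that fails to scale up several times in a row."""

    def __init__(self, groups: Iterable[str], max_failures: int = 3) -> None:
        self.order: Deque[str] = deque(groups)
        self.max_failures = max_failures
        self.failures: Dict[str, int] = {g: 0 for g in self.order}

    def next_group(self) -> str:
        # The group at the front is the one to try next.
        return self.order[0]

    def record_success(self, group: str) -> None:
        # A successful scale-up clears the consecutive-failure count.
        self.failures[group] = 0

    def record_failure(self, group: str) -> None:
        self.failures[group] += 1
        if self.failures[group] >= self.max_failures:
            # Demote: move the group to the back and reset its counter,
            # so it is retried only after the other groups had a turn.
            self.order.remove(group)
            self.order.append(group)
            self.failures[group] = 0


rotation = ScaleUpRotation(["asg-az-a", "asg-az-b", "asg-az-c"], max_failures=3)
for _ in range(3):
    rotation.record_failure("asg-az-a")  # e.g. no spot capacity in that AZ
print(rotation.next_group())
```

With three consecutive failures recorded for the first group, the next attempt would go to the second group (a different AZ) instead of waiting on the full one.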

seh commented Oct 2, 2019

Does your version of the cluster autoscaler have #2235 included?

Contributor

Jeffwan commented Oct 8, 2019

@xrl Please provide your CA version; it would help us debug the issue.

Author

xrl commented Oct 10, 2019

I just built master and deployed it. I'm on EKS 1.13 but running the autoscaler from master... hopefully that isn't too problematic!

I'm going to close this ticket while I wait for another failure situation in our eu-central-1 region.

xrl closed this as completed Oct 10, 2019