ScaleOutFactor * number of jobs cannot be greater than MaxSize or no agents will be scaled out #34

Closed
nitrocode opened this issue Oct 6, 2020 · 3 comments · Fixed by #35 or #37

nitrocode commented Oct 6, 2020

I was wondering why I wasn't seeing any agents, so I checked the Lambda's CloudWatch logs and saw this:

2020-10-06T17:47:13.030-04:00 | 2020/10/06 21:47:13 👮‍️ Increasing scale-out of 8 by factor of 2.00
2020-10-06T17:47:13.030-04:00 | 2020/10/06 21:47:13 Scaling OUT 📈 to 16 instances (currently 0)
2020-10-06T17:47:13.142-04:00 | 2020/10/06 21:47:13 Scaling error: ValidationError: New SetDesiredCapacity value 16 is above max value 10 for the AutoScalingGroup.

This looks like an error in the Lambda scaler. When the computed desired capacity exceeds MaxSize, the scaler should scale out to the MaxSize set on the ASG instead of failing to scale out any agents.

log.Printf("Scaling OUT 📈 to %d instances (currently %d)", desired, current.DesiredCount)
if err := s.setDesiredCapacity(desired); err != nil {

yob commented Oct 6, 2020

Thanks for flagging this.

As of ~24 hours ago, there's a bug in master that will leave an instance running but with no agent if the terminate-instance exits non-zero. We hope to have this fixed in the next day or two.

yob transferred this issue from buildkite/elastic-ci-stack-for-aws on Oct 7, 2020

yob commented Oct 7, 2020

I've transferred this over to the scaler repo, where the fix would be required.

We think the solution might be to clamp the scale-up logic to min(MaxSize, newDesiredSize).
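
For illustration, the clamp could look roughly like the sketch below. This is not the scaler's actual code: `clampDesired` and its parameter names are hypothetical, and it assumes the ASG's MaxSize has already been looked up before the desired capacity is set.

```go
package main

import "fmt"

// clampDesired caps a requested capacity at the group's MaxSize.
// The function and parameter names are hypothetical, chosen to mirror
// the min(MaxSize, newDesiredSize) idea; they are not the scaler's API.
func clampDesired(newDesiredSize, maxSize int64) int64 {
	if newDesiredSize > maxSize {
		return maxSize
	}
	return newDesiredSize
}

func main() {
	// Values from the log in this issue: 8 jobs scaled by a factor of 2.00
	// gives 16, while the ASG's max value is 10.
	fmt.Println(clampDesired(16, 10)) // request above MaxSize is capped
	fmt.Println(clampDesired(6, 10))  // request below MaxSize is unchanged
}
```

With a clamp like this, the request in the log would ask for 10 instances rather than 16, so SetDesiredCapacity would not be rejected with a ValidationError.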

nitrocode commented Oct 7, 2020

That sounds like it would work.

One other issue I noticed is that the Lambda exited with a return code of 0 even though it hit an error. I've created a separate ticket, #36, for this.
