
[ECS] Mixed On-Demand/Spot tasks and services in an ECS cluster, with automatic scaling of On-Demand and Spot instances #391

Closed
coultn opened this issue Jun 28, 2019 · 10 comments
Labels
ECS Amazon Elastic Container Service

Comments

@coultn
coultn commented Jun 28, 2019

Customers would like the ability for tasks and services in an ECS cluster to run on a mix of on-demand and Spot instances, and have ECS automatically scale the number of on-demand and Spot instances according to the needs of the tasks and services.

For example, a replica service may request that the first 3 tasks run on On-Demand (OD) instances, and that any additional tasks be split 50/50 between OD and Spot instances. ECS will ensure that (a) the tasks are scheduled on the appropriate type of instance (OD or Spot), and (b) the cluster scales so that the required number of OD and Spot instances is available. Different services running in the same cluster can use different parameters for the desired mix of OD and Spot. The general framework will support running the first N tasks on OD (N >= 0) and P% of the additional tasks on OD instances, with the remaining (100-P)% running on Spot instances.
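The N + P% rule can be made concrete with a small calculation. The Python sketch below computes how many of a service's tasks would land on OD versus Spot for a given task count. The function name is hypothetical, and rounding the OD share of the additional tasks down is an assumption, since the proposal does not say how fractional tasks are handled.

```python
def split_tasks(total: int, base_od: int, od_percent: int) -> tuple[int, int]:
    """Return (on_demand, spot) task counts under the proposed N + P% rule.

    base_od:    N, the number of initial tasks that always run On-Demand (N >= 0)
    od_percent: P, the percentage (0-100) of the additional tasks placed On-Demand
    Rounding the On-Demand share down is an assumption, not part of the proposal.
    """
    if total < 0 or base_od < 0 or not 0 <= od_percent <= 100:
        raise ValueError("invalid parameters")
    first = min(total, base_od)            # the first N tasks go to On-Demand
    extra = total - first                  # tasks beyond the first N
    od_extra = extra * od_percent // 100   # P% of the additional tasks, rounded down
    return first + od_extra, extra - od_extra


# The example from the issue: first 3 tasks On-Demand, then a 50/50 split.
for total in (2, 3, 7, 8):
    print(total, split_tasks(total, base_od=3, od_percent=50))
```

With 8 tasks this gives 5 OD and 3 Spot: the 3 base tasks plus 2 of the 5 additional tasks on OD.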

For scaling purposes, the ECS cluster will allow two different EC2 Auto Scaling Groups (ASGs) to be used in the same cluster; one ASG will be used for OD instances and the other for Spot instances. ECS will scale each ASG as needed to meet the needs of all services and tasks running in the cluster (see #76).

Interested in this idea? Please let us know if you have questions or comments!

@coultn coultn added the ECS Amazon Elastic Container Service label Jun 28, 2019
@talawahtech
Sounds great, this was on my roadmap, so even better if you guys do the heavy lifting for me :)

The first N + percentage approach is exactly what I was thinking of as well.

@cat-turner
👏🏼

@shandrew
This sounds potentially useful for my use case, but I would want replacements for terminated Spot instances to be launched during the two-minute grace period after the termination notice is sent.

Our use case is:

  • Run ECS on a cluster of diverse instances, 100% Spot, with cores distributed roughly evenly across AZs
  • On a Spot termination notice, launch one or more Spot replacements in the same AZ (currently using our own tooling) and drain the instance being terminated
  • If no Spot capacity is available in the AZ, or the Spot request takes too long, launch an OD replacement in the AZ

For a service that can handle two minute draining, this should provide near 100% availability across zones while maximizing spot usage.
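The replacement decision in the workflow above can be sketched as a small function. This is a hypothetical illustration, not the commenter's tooling: the names, the 60-second Spot timeout, and the boolean "Spot capacity available" input are assumptions, and the actual EC2 launch calls are left out so the logic runs on its own.

```python
from dataclasses import dataclass

# Spot interruption notices arrive roughly two minutes before the instance is reclaimed.
GRACE_PERIOD_S = 120


@dataclass(frozen=True)
class Replacement:
    market: str  # "spot" or "on-demand"
    az: str      # always the Availability Zone of the interrupted instance


def plan_replacement(az: str, spot_capacity_in_az: bool, spot_wait_s: float,
                     spot_timeout_s: float = 60.0) -> Replacement:
    """Choose a replacement for an instance that received a Spot termination notice.

    Prefer Spot in the same AZ; fall back to On-Demand in that AZ if there is no
    Spot capacity or the Spot request has been pending longer than spot_timeout_s
    (a hypothetical tuning knob that must leave time inside the grace period).
    """
    if not 0 < spot_timeout_s < GRACE_PERIOD_S:
        raise ValueError("Spot timeout must fit inside the two-minute grace period")
    if spot_capacity_in_az and spot_wait_s <= spot_timeout_s:
        return Replacement("spot", az)
    return Replacement("on-demand", az)


print(plan_replacement("us-east-1a", spot_capacity_in_az=True, spot_wait_s=20))
print(plan_replacement("us-east-1a", spot_capacity_in_az=False, spot_wait_s=0))
```

Draining the interrupted instance (setting its ECS container instance state to DRAINING) would run alongside this, so the service scheduler moves its tasks onto the replacement capacity.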

@cc4i
cc4i commented Jun 29, 2019

I’ve set up an Auto Scaling group with mixed On-Demand and Spot nodes for an ECS cluster. If you could enable us to place tasks on specific On-Demand or Spot nodes, that would be highly appreciated!

@ACenterA
Can't this already be achieved? We already do it in our serverless app ECS solution using multiple ASGs and our user-data scripts: we know whether it is a Spot instance or not and add the appropriate ECS instance tags.

We can then use task placement strategies/constraints to spread across the custom instance-type attribute, instance types, and AZs to ensure availability...
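The approach described above can be sketched in Python. The attribute name `purchase-option` is hypothetical; `ECS_INSTANCE_ATTRIBUTES` is the ECS container agent setting for custom attributes, and the value would come from the instance metadata path `instance-life-cycle` (fetching it, which needs an IMDSv2 token on newer setups, is omitted so the sketch runs offline). The placement dictionaries use the `memberOf` constraint and `spread` strategy shapes that `ecs.create_service` accepts; no AWS call is made.

```python
import json

# Hypothetical custom attribute name; any name the team chooses works the same way.
ATTRIBUTE = "purchase-option"


def ecs_config_line(lifecycle: str) -> str:
    """Build the /etc/ecs/ecs.config line a user-data script could append.

    `lifecycle` is what http://169.254.169.254/latest/meta-data/instance-life-cycle
    returns ("spot" or "on-demand").
    """
    if lifecycle not in ("spot", "on-demand"):
        raise ValueError(f"unexpected lifecycle: {lifecycle!r}")
    return "ECS_INSTANCE_ATTRIBUTES=" + json.dumps({ATTRIBUTE: lifecycle})


def placement_for(lifecycle: str) -> dict:
    """Placement settings that pin a service to one lifecycle and spread its
    tasks evenly across Availability Zones."""
    return {
        "placementConstraints": [
            {"type": "memberOf", "expression": f"attribute:{ATTRIBUTE} == {lifecycle}"}
        ],
        "placementStrategy": [
            {"type": "spread", "field": "attribute:ecs.availability-zone"}
        ],
    }


print(ecs_config_line("spot"))
print(placement_for("on-demand"))
```

Each service is pinned to a single purchase option this way; as the reply below points out, this cannot hold a fixed OD/Spot split within one service or resize the ASGs to match.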

@coultn
Author

coultn commented Jun 29, 2019

> Can't this already be achieved? We already do it in our serverless app ECS solution using multiple ASGs and our user-data scripts: we know whether it is a Spot instance or not and add the appropriate ECS instance tags.
>
> We can then use task placement strategies/constraints to spread across the custom instance-type attribute, instance types, and AZs to ensure availability...

Not quite! You are right that you can already use Spot and OD instances in the same cluster. What you can't do today is have different services in the same cluster use different mixes of Spot and OD, and have the underlying ASGs scale automatically to the right size. For example, with this new feature you will be able to do things like: service A uses 50% Spot and 50% OD, service B uses 100% Spot, and service C runs its first 3 tasks on OD and splits any further tasks 25% OD/75% Spot, all in the same cluster. The scheduler will ensure that the right share of each service's tasks lands on the right type of instance, AND that the right number of Spot and OD instances is available as the services scale. You can't accomplish this with the existing custom instance attributes and task placement features, because the service scheduler can't maintain a desired split for a service that way; it can only spread tasks across the instances you already have running.

@sandeepboyapati
Desperately waiting for a feature that allows a service to have 50% Spot and 50% OD tasks.

@tomaszdudek7
Would be great to have.

@surajrathoresp
This feature is going to be a game changer for ECS.

@coultn
Author

coultn commented Dec 3, 2019

This has launched via a new ECS feature called Capacity Providers: https://aws.amazon.com/about-aws/whats-new/2019/12/amazon-ecs-capacity-providers-now-available/
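Capacity provider strategies express the "first N on OD, then P%" idea through a `base` (tasks that must run on one provider first) and relative `weight`s for the remaining tasks. The Python sketch below writes the three example services from this thread as `capacityProviderStrategy` lists. The provider names are hypothetical, each assumed to be attached to its own ASG, and `ideal_split` is an illustrative approximation of how tasks divide, not ECS's actual placement algorithm.

```python
# Hypothetical capacity provider names; each is assumed to be backed by its own ASG
# (one On-Demand, one Spot), matching the two-ASG design in the original proposal.
OD = "od-capacity-provider"
SPOT = "spot-capacity-provider"

# capacityProviderStrategy lists in the shape the ECS CreateService API accepts.
STRATEGIES = {
    "service-a": [  # 50% OD, 50% Spot
        {"capacityProvider": OD, "weight": 1},
        {"capacityProvider": SPOT, "weight": 1},
    ],
    "service-b": [  # 100% Spot
        {"capacityProvider": SPOT, "weight": 1},
    ],
    "service-c": [  # first 3 tasks on OD, then 25% OD / 75% Spot
        {"capacityProvider": OD, "base": 3, "weight": 1},
        {"capacityProvider": SPOT, "weight": 3},
    ],
}


def ideal_split(strategy: list[dict], total: int) -> dict[str, int]:
    """Idealized task count per provider for `total` tasks: satisfy any `base`
    first, then divide the rest in proportion to `weight`, rounding down and
    giving leftovers to the highest-weight provider (an assumption)."""
    counts = {item["capacityProvider"]: 0 for item in strategy}
    remaining = total
    for item in strategy:  # base tasks are placed before weights apply
        take = min(remaining, item.get("base", 0))
        counts[item["capacityProvider"]] += take
        remaining -= take
    total_weight = sum(item["weight"] for item in strategy)
    for item in strategy:  # proportional share of the remainder, rounded down
        counts[item["capacityProvider"]] += remaining * item["weight"] // total_weight
    leftover = total - sum(counts.values())
    heaviest = max(strategy, key=lambda item: item["weight"])
    counts[heaviest["capacityProvider"]] += leftover
    return counts


for name, strategy in STRATEGIES.items():
    print(name, ideal_split(strategy, 11))
```

In the ECS API only one capacity provider in a strategy may define a `base`, which lines up with the proposal's single N; a 25%/75% split maps to weights of 1 and 3.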

@coultn coultn closed this as completed Dec 3, 2019
9 participants