[ECS] Mixed On-Demand/Spot tasks and services in an ECS cluster, with automatic scaling of On-Demand and Spot instances #391

coultn · 2019-06-28T19:34:23Z

Customers would like tasks and services in an ECS cluster to run on a mix of On-Demand and Spot instances, and have ECS automatically scale the number of On-Demand and Spot instances according to the needs of those tasks and services.

For example, a replica service may request that the first 3 tasks run on On-Demand (OD) instances, and that any additional tasks be split 50/50 between OD and Spot instances. ECS will ensure that (a) the tasks are scheduled on the appropriate instance type, and (b) the cluster scales so that the required number of OD and Spot instances is available. Different services running in the same cluster can use different parameters for the desired mix of OD and Spot. The general framework will support the first N tasks running on OD (N >= 0) and P% of the additional tasks running on OD instances, with the remaining (100-P)% running on Spot instances.
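To make the "first N, then P%" rule concrete, here is a minimal Python sketch of the arithmetic. The names `split_tasks` and `Split` are made up for illustration, and rounding the OD share of the additional tasks up is an assumption; the proposal does not say how an odd remainder would be divided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    on_demand: int
    spot: int


def split_tasks(desired: int, base: int, od_percent: int) -> Split:
    """Split `desired` tasks into On-Demand and Spot counts.

    The first `base` tasks (N) go to On-Demand; `od_percent` (P) percent of
    the remaining tasks also go to On-Demand, and the rest go to Spot.
    """
    if desired < 0 or base < 0 or not 0 <= od_percent <= 100:
        raise ValueError("desired and base must be >= 0, od_percent in 0..100")
    od_base = min(desired, base)
    extra = desired - od_base
    # Assumption: round the On-Demand share up (integer ceiling division).
    od_extra = (extra * od_percent + 99) // 100
    return Split(on_demand=od_base + od_extra, spot=extra - od_extra)


# The example from the text: first 3 on OD, then a 50/50 split.
print(split_tasks(desired=7, base=3, od_percent=50))
```

With 7 desired tasks, 3 land on OD as the base and the remaining 4 are divided evenly, so the service ends up with 5 OD tasks and 2 Spot tasks under these assumptions.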

For scaling purposes, the ECS cluster will allow two different EC2 Auto Scaling Groups (ASG) to be used in the same cluster; one ASG will be used for OD instances and the other for Spot instances. ECS will scale each ASG as needed to meet the needs of all services and tasks running in the cluster (see #76).

Interested in this idea? Please let us know if you have questions or comments!

talawahtech · 2019-06-28T20:31:52Z

Sounds great, this was on my roadmap, so even better if you guys do the heavy lifting for me :)

The first N + percentage approach is exactly what I was thinking of as well.

cat-turner · 2019-06-29T00:23:23Z

👏🏼

shandrew · 2019-06-29T01:05:14Z

This sounds potentially useful for my use case, but I would want spot termination replacements to be launched during the two-minute grace period from when the termination notice is sent.

Our use case is:

Run ECS on a cluster of diverse instances, 100% Spot, with cores distributed roughly evenly across AZs.
On a Spot termination notice, launch one or more Spot replacements in the same AZ (currently using our own tooling) and drain the instance being terminated.
If no Spot capacity is available in the AZ, or the Spot request is taking too long, launch an OD replacement in the AZ.

For a service that can handle two-minute draining, this should provide near-100% availability across zones while maximizing Spot usage.
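The fallback flow above could be sketched roughly as follows. This is only an illustration: `launch_spot`, `launch_on_demand`, and `start_drain` are hypothetical callables standing in for the custom tooling, the timeout values are arbitrary, and the drain call is assumed to be non-blocking.

```python
import time
from typing import Callable, Optional, Tuple


def replace_interrupted_instance(
    az: str,
    launch_spot: Callable[[str], Optional[str]],
    launch_on_demand: Callable[[str], str],
    start_drain: Callable[[], None],
    spot_timeout_s: float = 60.0,
    poll_interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str]:
    """Return (purchase_type, instance_id) of the replacement instance."""
    # Assumption: draining is kicked off asynchronously and does not block.
    start_drain()
    deadline = clock() + spot_timeout_s
    while clock() < deadline:
        # Hypothetical helper: returns an instance id, or None if no Spot
        # capacity is currently available in this AZ.
        instance_id = launch_spot(az)
        if instance_id is not None:
            return ("spot", instance_id)
        sleep(poll_interval_s)
    # Spot was unavailable or too slow: fall back to On-Demand in the same AZ.
    return ("on-demand", launch_on_demand(az))
```

The Spot timeout would need to stay well under the two-minute notice so that the OD replacement still has time to join the cluster.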

cc4i · 2019-06-29T12:43:10Z

I’ve set up an Auto Scaling group with mixed On-Demand and Spot nodes for an ECS cluster. If you could enable us to place tasks on specific On-Demand or Spot nodes, that would be highly appreciated!

ACenterA · 2019-06-29T13:22:49Z

Can't this already be achieved? We already do it in our serverless app ECS solution using multiple ASGs: in our user-data scripts we know whether an instance is a Spot instance and add the appropriate ECS instance tags.

We can then use task placement strategies and constraints to spread on the custom instance-type attribute, as well as on instance types and AZs, to ensure availability.

coultn · 2019-06-29T18:03:36Z

Can't this already be achieved? We already do it in our serverless app ECS solution using multiple ASGs: in our user-data scripts we know whether an instance is a Spot instance and add the appropriate ECS instance tags.

We can then use task placement strategies and constraints to spread on the custom instance-type attribute, as well as on instance types and AZs, to ensure availability.

Not quite! You are right that you can already use Spot and OD instances in the same cluster. What you can't do today is have different services in the same cluster use different mixes of Spot and OD, and have the underlying ASGs scale automatically to the right size. For example, with this new feature you will be able to have service A use 50% Spot and 50% OD, service B use 100% Spot, and service C use OD for the first 3 tasks and 25% OD/75% Spot beyond the first 3, all in the same cluster. The scheduler will ensure that the right mix of each service's tasks lands on the right type of instance, AND that the right number of Spot and OD instances is available as the services scale. You can't accomplish this using the existing functionality of custom instance attributes and task placement, because the service scheduler can't maintain a desired split for a service in this way. It can only spread tasks across the instances you already have running.
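As a rough illustration of why the ASGs have to scale together with the services, the sketch below adds up the OD and Spot task demand for services A, B, and C. The desired task counts are made up, `Mix` and `cluster_demand` are hypothetical names, and rounding the OD share up is an assumption.

```python
from typing import Dict, NamedTuple


class Mix(NamedTuple):
    desired: int     # total tasks the service wants (illustrative values)
    base: int        # first N tasks that must run on On-Demand
    od_percent: int  # P% of the remaining tasks that run on On-Demand


def cluster_demand(services: Dict[str, Mix]) -> Dict[str, int]:
    """Total tasks that need On-Demand vs. Spot capacity across services."""
    totals = {"on_demand": 0, "spot": 0}
    for mix in services.values():
        od_base = min(mix.desired, mix.base)
        extra = mix.desired - od_base
        # Assumption: round the On-Demand share of the extra tasks up.
        od_extra = (extra * mix.od_percent + 99) // 100
        totals["on_demand"] += od_base + od_extra
        totals["spot"] += extra - od_extra
    return totals


services = {
    "A": Mix(desired=10, base=0, od_percent=50),  # 50% Spot / 50% OD
    "B": Mix(desired=6, base=0, od_percent=0),    # 100% Spot
    "C": Mix(desired=11, base=3, od_percent=25),  # first 3 OD, then 25/75
}
print(cluster_demand(services))
```

Whenever any one service scales, both totals can shift, which is the part that static instance attributes and placement constraints cannot track on their own.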

sandeepboyapati · 2019-07-15T12:44:33Z

Desperately waiting for a feature that allows a service to have 50% Spot and 50% OD tasks.

tomaszdudek7 · 2019-09-03T12:27:02Z

Would be great to have.

surajrathoresp · 2019-10-12T06:55:51Z

This feature is going to be a game changer for ECS.

coultn · 2019-12-03T23:49:29Z

This has launched, via a new ECS feature called Capacity Providers: https://aws.amazon.com/about-aws/whats-new/2019/12/amazon-ecs-capacity-providers-now-available/
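For readers mapping the original request onto the launched feature, a capacity provider strategy carries a `base` and a `weight` per provider. The sketch below builds such a strategy for the "first 3 on OD, then 50/50" example. The provider names `od-cp` and `spot-cp` and the helper `od_spot_strategy` are hypothetical, and treating the weights as the P/(100-P) ratio is an interpretation rather than something stated in this thread.

```python
from math import gcd
from typing import Dict, List, Union

StrategyItem = Dict[str, Union[str, int]]


def od_spot_strategy(
    od_provider: str, spot_provider: str, base: int, od_percent: int
) -> List[StrategyItem]:
    """Build a strategy list in the shape ECS uses for capacityProviderStrategy."""
    if base < 0 or not 0 <= od_percent <= 100:
        raise ValueError("base must be >= 0 and od_percent in 0..100")
    od_weight, spot_weight = od_percent, 100 - od_percent
    # Reduce the ratio so 50/50 becomes 1/1 and 25/75 becomes 1/3.
    divisor = gcd(od_weight, spot_weight)
    return [
        # Only the On-Demand provider carries the base (the "first N" tasks).
        {"capacityProvider": od_provider, "base": base, "weight": od_weight // divisor},
        {"capacityProvider": spot_provider, "base": 0, "weight": spot_weight // divisor},
    ]


# Hypothetical provider names; the resulting list could be passed as the
# capacityProviderStrategy argument when creating a service.
strategy = od_spot_strategy("od-cp", "spot-cp", base=3, od_percent=50)
print(strategy)
```

How ECS rounds when the task count does not divide evenly by the weights is not covered here; the linked announcement and the ECS documentation are the authoritative source.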

coultn added the ECS Amazon Elastic Container Service label Jun 28, 2019

coultn closed this as completed Dec 3, 2019
