(aws-ecs): Can't delete a stack with ASG Capacity providers #14732

hnrc · 2021-05-17T14:05:20Z

It seems to be impossible to gracefully delete an ECS cluster that is associated with an ASG capacity provider.
CF hangs and never finishes unless you manually delete the ASG.

Reproduction Steps

Create an ECS cluster with:

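```typescript
import { AutoScalingGroup } from '@aws-cdk/aws-autoscaling';
import { InstanceType } from '@aws-cdk/aws-ec2';
import { AsgCapacityProvider, Cluster, EcsOptimizedImage } from '@aws-cdk/aws-ecs';

// Inside a CDK v1 Stack constructor, where `vpc` and `props.clusterName` are in scope.
const cluster = new Cluster(this, 'EcsCluster', {
  vpc,
  clusterName: props.clusterName,
});

const autoScalingGroup = new AutoScalingGroup(this, 'Asg', {
  vpc,
  machineImage: EcsOptimizedImage.amazonLinux2(),
  instanceType: new InstanceType('t3.micro'),
  minCapacity: 1,
  maxCapacity: 100,
});

// Managed scaling and managed termination protection are left at their defaults (enabled).
const capacityProvider = new AsgCapacityProvider(
  this,
  'AsgCapacityProvider',
  {
    autoScalingGroup,
    capacityProviderName: props.clusterName,
  },
);
cluster.addAsgCapacityProvider(capacityProvider);
```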

Delete the stack (I did it through the AWS Console)
Wait for it...
Go grab a cup of ☕
Realize that the stack deletion will never finish

What did you expect to happen?

The CF stack should be properly and gracefully removed.

What actually happened?

The CF stack got stuck in DELETE_IN_PROGRESS, with the following resource errors:

AWS::EC2::InternetGateway

The internetGateway 'igw-03ec296b77d21956f' has dependencies and cannot be deleted. (Service: AmazonEC2; Status Code: 400; Error Code: DependencyViolation; Request ID: f50851df-172c-4365-b761-e6b710f5b30b; Proxy: null)

AWS::ECS::Cluster

Resource handler returned message: "Error occurred during operation 'DeleteClusters SDK Error: The Cluster cannot be deleted while Container Instances are active or draining. (Service: AmazonECS; Status Code: 400; Error Code: ClusterContainsContainerInstancesException; Request ID: 005e0a22-5547-44da-a51e-5e6b45b39b84; Proxy: null)'." (RequestToken: 50d43055-7cc8-6306-0de1-48c93e63cf96, HandlerErrorCode: GeneralServiceException)

AWS::AutoScaling::LaunchConfiguration

Cannot delete launch configuration lulz-cluster-AsgLaunchConfig6D4F96BB-15LZGM814H5M4 because it is attached to AutoScalingGroup lulz-cluster-AsgASGD1D7B4E2-R02BX4676AJJ (Service: AmazonAutoScaling; Status Code: 400; Error Code: ResourceInUse; Request ID: 3583ab1b-7c1a-47de-929a-67cb705f684f; Proxy: null)

AWS::AutoScaling::AutoScalingGroup

Group did not stabilize. {current/minSize/maxSize} group size = {1/0/0}.

The stack finished deleting after I manually removed the ASG.

Environment

CDK CLI Version: 1.104.0
Framework Version: 1.104.0
Node.js Version: 14.16.1
OS: macOS 10.15.7
Language (Version): TypeScript 4.2.4

Other

This seems like a related discussion: aws/containers-roadmap#631 (comment)

This is 🐛 Bug Report


SoManyHs · 2021-06-04T22:31:44Z

I believe this happens because managed termination protection is enabled by default. This means that CloudFormation cannot delete the ASG associated with the ASG capacity provider, because its instances are protected from scale-in.
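To make the explanation concrete, here is a deliberately simplified TypeScript model; all names are hypothetical and this is not AWS's actual algorithm. It assumes that deleting the stack shrinks the group's min and max size to 0, which matches the {1/0/0} in the "Group did not stabilize" error above, and that scale-in never terminates an instance that is protected from scale-in.

```typescript
// Simplified, hypothetical model of the deletion hang; not AWS's real algorithm.
interface AsgInstance {
  id: string;
  protectedFromScaleIn: boolean; // set when managed termination protection is on
}

interface GroupState {
  minSize: number;
  maxSize: number;
  instances: AsgInstance[];
}

// Assumed deletion step: shrink the group to 0. Scale-in only terminates
// unprotected instances, so protected ones stay in service.
function scaleToZero(group: GroupState): GroupState {
  return {
    minSize: 0,
    maxSize: 0,
    instances: group.instances.filter((i) => i.protectedFromScaleIn),
  };
}

// The group counts as "stable" once its size fits inside [minSize, maxSize].
function isStable(group: GroupState): boolean {
  const current = group.instances.length;
  return current >= group.minSize && current <= group.maxSize;
}

// Formats the state the same way as the CloudFormation error message.
function sizeSummary(group: GroupState): string {
  return `{current/minSize/maxSize} group size = {${group.instances.length}/${group.minSize}/${group.maxSize}}`;
}

const protectedGroup = scaleToZero({
  minSize: 1,
  maxSize: 100,
  instances: [{ id: 'i-example', protectedFromScaleIn: true }],
});
console.log(isStable(protectedGroup), sizeSummary(protectedGroup));
// → false {current/minSize/maxSize} group size = {1/0/0}
```

With protectedFromScaleIn set to false the filtered list is empty and the group stabilizes at {0/0/0}, which is why disabling the protection (or removing the ASG by hand) unblocks the deletion.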

Two ways you can get around this:

1. Manually delete the ASG using the AWS EC2 console or the AWS CLI, then delete the CloudFormation stack.
2. Set enableManagedTerminationProtection on the ecs.AsgCapacityProvider to false. This will allow you to run cdk destroy as usual.
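A minimal sketch of workaround 2, assuming CDK v2 (aws-cdk-lib) rather than the 1.104.0 release from the report; in v1 the same classes live in @aws-cdk/aws-ecs, @aws-cdk/aws-autoscaling and @aws-cdk/aws-ec2. The stack name is made up. It mirrors the reproduction code but passes enableManagedTerminationProtection: false, then synthesizes the stack so the generated AWS::ECS::CapacityProvider can be inspected; its ManagedTerminationProtection property should come out as DISABLED.

```typescript
import { App, Stack } from 'aws-cdk-lib';
import { Template } from 'aws-cdk-lib/assertions';
import * as autoscaling from 'aws-cdk-lib/aws-autoscaling';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';

export function buildStack(): Stack {
  const app = new App();
  const stack = new Stack(app, 'EcsClusterStack'); // hypothetical stack name
  const vpc = new ec2.Vpc(stack, 'Vpc', { maxAzs: 2 });

  const cluster = new ecs.Cluster(stack, 'EcsCluster', { vpc });
  const autoScalingGroup = new autoscaling.AutoScalingGroup(stack, 'Asg', {
    vpc,
    machineImage: ecs.EcsOptimizedImage.amazonLinux2(),
    instanceType: new ec2.InstanceType('t3.micro'),
    minCapacity: 1,
    maxCapacity: 100,
  });

  const capacityProvider = new ecs.AsgCapacityProvider(stack, 'AsgCapacityProvider', {
    autoScalingGroup,
    // Managed scaling keeps its default (enabled); only the scale-in
    // protection that blocks stack deletion is turned off.
    enableManagedTerminationProtection: false,
  });
  cluster.addAsgCapacityProvider(capacityProvider);
  return stack;
}

// Usage: synthesize without deploying and look at the capacity provider resource.
const providers = Template.fromStack(buildStack()).findResources('AWS::ECS::CapacityProvider');
```

The trade-off is the one raised later in this thread: without the protection, EC2 Auto Scaling is no longer prevented from terminating instances that still run tasks.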

See: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ecs-capacityprovider-autoscalinggroupprovider.html#cfn-ecs-capacityprovider-autoscalinggroupprovider-managedterminationprotection

I am still investigating if there are other ways around this while still creating an ASG with managed termination protection, but see if either of the above works for you @hnrc !

hnrc · 2021-06-05T20:08:15Z

> I am still investigating if there are other ways around this while still creating an ASG with managed termination protection, but see if either of the above works for you @hnrc !

Thanks for looking into this.

Both workarounds work for me.
I think we can live with having managed termination protection disabled so this is no longer very critical (for me personally).

SoManyHs · 2021-06-07T17:49:06Z

Great, glad the workarounds helped! Closing this issue.

github-actions · 2021-06-07T17:49:29Z

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Deleting stacks using ECS clusters that have capacityProviders (i.e. dual-primary and primary-replica recipes) fails with: ``` The Cluster cannot be deleted while Container Instances are active or draining. ```

This issue also manifests itself via Terraform [1] or CDK [2].

Explicitly deleting the Autoscaling Groups _before_ the ECS cluster deletion fixes the problem, since it ensures that no instances are active or draining, as the error suggests. This is safe to do because, prior to deleting the Autoscaling Groups, every ECS service has already been destroyed, so no instance is actually running.

[1] hashicorp/terraform-provider-aws#4852
[2] aws/aws-cdk#14732

Bug: Issue 14698
Change-Id: I216307ef88bd7b7317706d2dc0a6a6e6fb367bd4
Change-Id: I27ece0f6971b157a474d91d7f3d9243dcff596e6
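The ordering this commit message describes (Auto Scaling groups first, cluster last) can be sketched in TypeScript as below. TeardownApi and teardownCluster are hypothetical names, not part of any AWS library. The sketch assumes deleteAutoScalingGroup resolves only once the group's instances are gone; the real EC2 Auto Scaling DeleteAutoScalingGroup call starts deletion asynchronously, so a real wrapper would need to poll before resolving.

```typescript
// Hypothetical client interface; a real one might wrap the AWS SDK's
// DeleteAutoScalingGroup (with ForceDelete) and ECS DeleteCluster calls.
interface TeardownApi {
  // Assumed to resolve only after the group and its instances are gone.
  deleteAutoScalingGroup(name: string): Promise<void>;
  deleteCluster(name: string): Promise<void>;
}

// Delete every Auto Scaling group first, so no container instance is left
// active or draining when the cluster itself is deleted.
export async function teardownCluster(
  api: TeardownApi,
  clusterName: string,
  asgNames: string[],
): Promise<void> {
  for (const asgName of asgNames) {
    await api.deleteAutoScalingGroup(asgName);
  }
  await api.deleteCluster(clusterName);
}

// Usage with a fake client that only records the call order.
const calls: string[] = [];
const fakeApi: TeardownApi = {
  deleteAutoScalingGroup: async (name) => { calls.push(`asg:${name}`); },
  deleteCluster: async (name) => { calls.push(`cluster:${name}`); },
};
const done = teardownCluster(fakeApi, 'my-cluster', ['asg-a', 'asg-b']);
```

Deleting the groups sequentially keeps the sketch simple; they could also be deleted in parallel with Promise.all, as long as the cluster deletion waits for all of them.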

metametadata · 2021-12-11T01:24:58Z

@SoManyHs

I'm experiencing the same issue with the defaults in addAsgCapacityProvider. It was surprising, since we didn't have this issue with the now-deprecated addCapacity, and we have no ECS tasks in the ASG when we delete the stack.

1. Feature request. Ideally, CloudFormation should not hang but fail as fast as possible, with an error message about the termination protection.
2. Documentation enhancement request. From https://docs.aws.amazon.com/cdk/api/latest/docs/aws-ecs-readme.html:

   > By default, an Auto Scaling Group Capacity Provider will manage the Auto Scaling Group's size for you. It will also enable managed termination protection, in order to prevent EC2 Auto Scaling from terminating EC2 instances that have tasks running on them. If you want to disable this behavior, set both enableManagedScaling to and enableManagedTerminationProtection to false.

   The description doesn't make it clear that the flag effectively blocks deletion of the ASG. I got the incorrect impression that it somehow detects that no ECS tasks are running and allows deletion in that case.
3. Question/documentation enhancement request. We'll likely have to set enableManagedTerminationProtection to false in our automated undeploy code. But what are the risks of turning this protection off? E.g. we don't want ECS tasks to shut down at random times.
4. Question/documentation enhancement request. Is it OK to set enableManagedTerminationProtection=false + enableManagedScaling=true? It seems to work, but it contradicts the documentation ("If you want to disable this behavior, set both enableManagedScaling to and enableManagedTerminationProtection to false.").
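The flag combinations in the last question can be summarized as a small check. This is a hypothetical helper (checkCapacityProviderFlags is not part of CDK) that encodes only two things: the ECS/CloudFormation documentation's statement that managed termination protection only works when managed scaling is also enabled, and the deletion hang reported in this issue. It does not answer the question about the runtime risks of disabling the protection.

```typescript
// Hypothetical pre-deploy check for the two AsgCapacityProvider flags.
interface CapacityProviderFlags {
  enableManagedScaling: boolean;
  enableManagedTerminationProtection: boolean;
}

interface FlagReport {
  errors: string[];
  warnings: string[];
}

export function checkCapacityProviderFlags(flags: CapacityProviderFlags): FlagReport {
  const report: FlagReport = { errors: [], warnings: [] };
  // Documented constraint: termination protection needs managed scaling to work.
  if (flags.enableManagedTerminationProtection && !flags.enableManagedScaling) {
    report.errors.push('managed termination protection requires managed scaling');
  }
  // Observed in this issue: protected instances keep the ASG, and so the stack, from being deleted.
  if (flags.enableManagedTerminationProtection) {
    report.warnings.push('stack deletion may hang until the ASG is removed manually');
  }
  return report;
}

// The combination asked about above: scaling on, protection off.
console.log(checkCapacityProviderFlags({ enableManagedScaling: true, enableManagedTerminationProtection: false }));
// → { errors: [], warnings: [] }
```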

edit: I later created a new issue for this: #18179.

gshpychka · 2022-03-16T13:09:58Z

@SoManyHs this is still an issue, since we have to use manual hacks to destroy the stack.

nathanpeck · 2024-01-02T16:45:48Z

Hey all, I've created a reference CloudFormation template that demonstrates how to avoid this issue. The end-to-end solution for the capacity provider with working teardown can be found here: https://containersonaws.com/pattern/ecs-ec2-capacity-provider-scaling

You can also refer directly to the sample code for the Lambda function here: https://github.com/aws-samples/container-patterns/blob/main/pattern/ecs-ec2-capacity-provider-scaling/files/cluster-capacity-provider.yml#L48-L123

In short, this solution implements a custom ASG destroyer resource, which is used to force kill the ASG so that it does not block the CloudFormation stack teardown.

A similar approach could be implemented in CDK. I've added a todo item for me to make a CDK specific example as well.
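A minimal TypeScript sketch of the "ASG destroyer" idea, assuming a custom resource whose properties include the ASG name. The AutoScalingGroupName property, the event and result shapes, and handleAsgDestroyer are hypothetical; this is not the Lambda from the linked template. Because such a custom resource would reference the ASG, CloudFormation deletes it before the ASG, which gives it a chance to force-delete the group. Posting the result to the event's ResponseURL is left out.

```typescript
// Hypothetical shapes; real CloudFormation custom resource events carry more
// fields (ResponseURL, StackId, RequestId, LogicalResourceId, ...).
type RequestType = 'Create' | 'Update' | 'Delete';

interface DestroyerEvent {
  RequestType: RequestType;
  PhysicalResourceId?: string;
  ResourceProperties: { AutoScalingGroupName: string };
}

interface DestroyerResult {
  Status: 'SUCCESS' | 'FAILED';
  PhysicalResourceId: string;
  Reason?: string;
}

// forceDelete is injected so the logic can run without AWS; in a Lambda it could
// call EC2 Auto Scaling DeleteAutoScalingGroup with ForceDelete: true, which deletes
// the group along with its instances without waiting for them to terminate.
export async function handleAsgDestroyer(
  event: DestroyerEvent,
  forceDelete: (asgName: string) => Promise<void>,
): Promise<DestroyerResult> {
  const asgName = event.ResourceProperties.AutoScalingGroupName;
  const physicalId = event.PhysicalResourceId ?? `asg-destroyer-${asgName}`;

  // Nothing to do on Create/Update: the resource only matters during teardown.
  if (event.RequestType !== 'Delete') {
    return { Status: 'SUCCESS', PhysicalResourceId: physicalId };
  }

  try {
    await forceDelete(asgName);
    return { Status: 'SUCCESS', PhysicalResourceId: physicalId };
  } catch (err) {
    return { Status: 'FAILED', PhysicalResourceId: physicalId, Reason: String(err) };
  }
}
```

Returning FAILED when the force delete fails surfaces the problem in the stack events instead of leaving the deletion waiting on the ASG.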

hnrc added the bug and needs-triage labels May 17, 2021

hnrc changed the title from "(aws-ecs): Can't delete a stack with ASG Capacity Provider" to "(aws-ecs): Can't delete a stack with ASG Capacity providers" May 17, 2021

peterwoodworth added the @aws-cdk/aws-ecs label May 17, 2021

peterwoodworth assigned SoManyHs, uttarasridhar and MrArnoldPalmer May 17, 2021

SoManyHs added the p2 label and removed the needs-triage label Jun 1, 2021

SoManyHs assigned SoManyHs and unassigned uttarasridhar, MrArnoldPalmer and SoManyHs Jun 1, 2021

SoManyHs added the investigating label Jun 4, 2021

SoManyHs closed this as completed Jun 7, 2021

SoManyHs removed the investigating label Jun 7, 2021

metametadata mentioned this issue Dec 25, 2021: (aws-ecs): hanging on deleting a stack with ASG capacity provider #18179 (Open)

chelma mentioned this issue Aug 30, 2023: destroy-cluster hangs on Capture ASG deletion arkime/aws-aio#92 (Closed)

jj22ee mentioned this issue Oct 15, 2024: Update Cloudformation Template, replace EC2 Launch Config with Launch Template aws-samples/eb-java-scorekeep#21 (Merged)
