
ECS Service always wants to be recreated due to capacity provider. #22823

Open
spatel96 opened this issue Jan 28, 2022 · 17 comments
Labels
regression Pertains to a degraded workflow resulting from an upstream patch or internal enhancement. service/ecs Issues and PRs that pertain to the ecs service.

Comments


spatel96 commented Jan 28, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

$ terraform -v
Terraform v0.13.6
+ provider.aws v3.73.0

Affected Resource(s)

  • aws_ecs_service

Terraform Configuration Files

Terraform Plan:

  # module.my_service.aws_ecs_service.ecs_service must be replaced
+/- resource "aws_ecs_service" "ecs_service" {
        cluster                            = "arn:aws:ecs:us-west-1:***:cluster/ecs-related-tapir"
        deployment_maximum_percent         = 200
        deployment_minimum_healthy_percent = 100
        desired_count                      = 2
        enable_ecs_managed_tags            = false
        enable_execute_command             = false
        health_check_grace_period_seconds  = 120
      ~ iam_role                           = "aws-service-role" -> (known after apply)
      ~ id                                 = "arn:aws:ecs:us-west-1:***:service/my-cluster/my-service-5e" -> (known after apply)
      ~ launch_type                        = "EC2" -> (known after apply)
        name                               = "my-service-service-5e"
      + platform_version                   = (known after apply)
      - propagate_tags                     = "NONE" -> null
        scheduling_strategy                = "REPLICA"
      - tags                               = {} -> null
      ~ tags_all                           = {} -> (known after apply)
      ~ task_definition                    = "arn:aws:ecs:us-west-1:***:task-definition/my-service-:23" -> "arn:aws:ecs:us-west-1:***:task-definition/my-service:1"
        wait_for_steady_state              = false

      + capacity_provider_strategy { # forces replacement
          + base              = 0
          + capacity_provider = "ecs-capacity-provider-related-tapir"
          + weight            = 100
        }

        deployment_controller {
            type = "CODE_DEPLOY"
        }

        load_balancer {
            container_name   = "my-service"
            container_port   = 7171
            target_group_arn = "arn:aws:elasticloadbalancing:us-west-1:***:targetgroup/abcdef/abcdef"
        }
    }

Plan: 1 to add, 0 to change, 1 to destroy.

Terraform Apply error:

Error: error creating ECS service (my-service): InvalidParameterException: Creation of service was not idempotent.

Expected Behavior

No infrastructure changes should be made

Actual Behavior

The ECS service resource is planned for recreation, but the apply fails with the error shown above.

Steps to Reproduce

  1. Provision an ECS service with a capacity provider
  2. terraform apply
@github-actions github-actions bot added needs-triage Waiting for first response or review from a maintainer. service/ecs Issues and PRs that pertain to the ecs service. labels Jan 28, 2022
@breathingdust breathingdust added the regression Pertains to a degraded workflow resulting from an upstream patch or internal enhancement. label Feb 3, 2022
@justinretzolk justinretzolk removed the needs-triage Waiting for first response or review from a maintainer. label Mar 15, 2022

gvwirth commented Apr 14, 2022

FYI we are still seeing this bug in the provider version 4.9.

@anGie44 anGie44 self-assigned this Apr 25, 2022

anGie44 commented Apr 25, 2022

Possibly related to existing issue: #2283 (destroy/create behavior)

Correction: since the update was not expected behavior, I'm guessing the capacity_provider_strategy is inherited from the aws_ecs_cluster where it is defined. Do you mind confirming, @spatel96?


a-nych commented May 5, 2022

This issue is very destructive.

When an ECS cluster has a default_capacity_provider_strategy defined, Terraform will mark every service for recreation unless it sets:

  lifecycle {
    ignore_changes = [
      capacity_provider_strategy
    ]
  }
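A minimal sketch of that workaround in context (all resource names here are hypothetical):

```hcl
resource "aws_ecs_service" "app" {
  name            = "my-app"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2

  # Without this, the strategy inherited from the cluster's
  # default_capacity_provider_strategy surfaces as a new
  # capacity_provider_strategy block and forces replacement.
  lifecycle {
    ignore_changes = [
      capacity_provider_strategy,
    ]
  }
}
```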

@nitrocode
Contributor

The only differences I can see when comparing capacity_provider_strategy and deployment_controller are MaxItems and DiffSuppressFunc. I wonder if that is what's causing this recreation... I would have thought that removing the ForceNew would have also stopped capacity_provider_strategy from forcing recreation...

"deployment_controller": {
    Type:     schema.TypeList,
    Optional: true,
    MaxItems: 1,
    // Ignore missing configuration block
    DiffSuppressFunc: func(k, old, new string, d *schema.ResourceData) bool {
        if old == "1" && new == "0" {
            return true
        }
        return false
    },
    Elem: &schema.Resource{

"capacity_provider_strategy": {
    Type:     schema.TypeSet,
    Optional: true,
    Elem: &schema.Resource{

anGie44 commented May 26, 2022

Hi @nitrocode, thanks for looking through the code! My initial thinking was that @spatel96 is using both the aws_ecs_capacity_provider and aws_ecs_service resources. While capacity_provider_strategy is not explicitly configured in the aws_ecs_service configuration, the value is inherited from the separate aws_ecs_capacity_provider resource after an initial terraform apply, so the next plan or apply shows that diff (though this is still just conjecture, as the original configuration is not yet known). That diff is then handled by this portion of the code:

func capacityProviderStrategyCustomizeDiff(_ context.Context, d *schema.ResourceDiff, meta interface{}) error {
    // to be backward compatible, should ForceNew almost always (previous behavior), unless:
    // force_new_deployment is true and
    // neither the old set nor new set is 0 length
    if v := d.Get("force_new_deployment").(bool); !v {
        return capacityProviderStrategyForceNew(d)
    }
    old, new := d.GetChange("capacity_provider_strategy")
    ol := old.(*schema.Set).Len()
    nl := new.(*schema.Set).Len()
    if (ol == 0 && nl > 0) || (ol > 0 && nl == 0) {
        return capacityProviderStrategyForceNew(d)
    }
    return nil
}
which forces the new resource. The logic needs to either account for cases where the provider strategy is inherited from outside the configuration, or simply mark capacity_provider_strategy as Computed so that the diff is ignored.
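To illustrate the conjecture above, here is a minimal configuration (all names hypothetical, using the provider 3.x-era cluster arguments) that can reproduce the diff:

```hcl
resource "aws_ecs_cluster" "main" {
  name               = "example"
  capacity_providers = [aws_ecs_capacity_provider.ec2.name]

  # Services without an explicit strategy inherit this default, so after
  # the first apply AWS reports it back and Terraform sees a new
  # capacity_provider_strategy block, which forces replacement.
  default_capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.ec2.name
    weight            = 100
  }
}

# No capacity_provider_strategy configured here.
resource "aws_ecs_service" "app" {
  name            = "example"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
}
```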


relsqui commented Aug 16, 2022

I was seeing this same issue and can confirm that adding a capacity_provider_strategy block in my aws_ecs_service, duplicating my default_capacity_provider_strategy, resolved it.
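A sketch of that fix, assuming the cluster's default is a single capacity provider (names hypothetical):

```hcl
resource "aws_ecs_service" "app" {
  name            = "example"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2

  # Explicitly mirror the cluster's default_capacity_provider_strategy so
  # the value AWS reports back matches the configuration and no diff appears.
  capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.ec2.name
    base              = 0
    weight            = 100
  }
}
```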

@ericdahl

This has been a big annoyance for us. We have many production ECS Services that are using LaunchType: EC2 and we'd like to convert them to using a newly defined default Capacity Provider strategy on the cluster.

If we simply set the capacity provider, it will force the re-create of the ECS Service leading to temporary disruption/downtime. This isn't necessary as AWS supports the graceful transition of LaunchType: EC2 to Capacity Provider (but not the other way around). It does a "force new deployment" of the ECS Tasks, but it uses the standard ECS rollout mechanism (e.g., minHealthy) so there's no disruption.

Our current workaround is to use ignore_changes as above, plus converting ECS services to the capacity provider via separate CLI-based automation.

(Also, tangentially related is #26533 - for transitioning existing ECS Services to use the Cluster's default capacity provider strategy)

@remil1000

If I may add, supporting an empty capacity_provider_strategy list could also be useful.
It seems this support was added to the AWS CLI and API (aws/containers-roadmap#838 (comment)), so that

$ aws ecs update-service --cluster cluster-name --service service-name --capacity-provider-strategy '[]' --force-new-deployment

removes the strategy from an ECS service (when inherited from the default defined at the ECS cluster level), which is useful if you're planning to remove the default capacity provider strategy from the ECS cluster.

Currently, if no capacity_provider_strategy is defined in the aws_ecs_service resource, the AWS API call will not set any value and the default strategy will be used.


vishwa-trulioo commented Feb 18, 2023

It's sad to see that it's been over a year and this still isn't fixed. :-( AWS has to do a better job than this if they want people to keep using ECS and keep it alive.

@bbratchiv

Any updates on this? I see the PR is still pending.

@rmccarthy-ellevation

Any update on this?


1oglop1 commented Oct 15, 2023

@breathingdust Hi, is this something you can look into? The AWS side has been fixed, and Terraform now incorrectly forces replacement.

@claudiosf

Issue still exists.


Luis-3M commented Nov 23, 2023

Issue still exists.

Yep, we're facing the same problem too.

akuzminsky added a commit to infrahouse/terraform-aws-ecs that referenced this issue Dec 5, 2023
It's a workaround for a bug:
hashicorp/terraform-provider-aws#22823

Terraform never converged and wanted to re-create the service.
```
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # module.test.aws_ecs_service.ecs must be replaced
-/+ resource "aws_ecs_service" "ecs" {
      - health_check_grace_period_seconds  = 0 -> null
      ~ iam_role                           = "/aws-service-role/ecs.amazonaws.com/AWSServiceRoleForECS" -> (known after apply)
      ~ id                                 = "arn:aws:ecs:us-east-2:303467602807:service/test-terraform-aws-ecs/test-terraform-aws-ecs" -> (known after apply)
      + launch_type                        = (known after apply)
        name                               = "test-terraform-aws-ecs"
      + platform_version                   = (known after apply)
      - propagate_tags                     = "NONE" -> null
      - tags                               = {} -> null
      ~ triggers                           = {} -> (known after apply)
        # (10 unchanged attributes hidden)

      - capacity_provider_strategy { # forces replacement
          - base              = 1 -> null
          - capacity_provider = "test-terraform-aws-ecs" -> null
          - weight            = 100 -> null
        }

      - deployment_circuit_breaker {
          - enable   = false -> null
          - rollback = false -> null
        }

      - deployment_controller {
          - type = "ECS" -> null
        }

        # (1 unchanged block hidden)
    }
```
@harbinder-kleene

When will the fix be released? It is affecting my team too.

@ZilvinasKucinskas

+1

This is a major issue. We are running many FARGATE instances and would like to increase capacity further by adding FARGATE_SPOT instances. However, it is not possible to do this without downtime (it destroys the whole ECS service and recreates it).


dejanzele commented Sep 24, 2024

Hi all,

I am interested in submitting a fix for this issue, as it is impacting our internal usage as well.

Is the community in agreement on the latest requirements for how the update should work? A couple of ideas are mentioned in the comments above.
