Provider produced inconsistent result after apply ECS Fargate task #20452

Closed
GuilleR-AR opened this issue Aug 5, 2021 · 18 comments
Labels
bug Addresses a defect in current functionality. service/ecs Issues and PRs that pertain to the ecs service.

Comments

@GuilleR-AR

GuilleR-AR commented Aug 5, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

  • Terraform 0.14.5
  • aws provider 3.38.0

Affected Resource(s)

  • aws_ecs_service

Terraform Configuration Files

Please include all Terraform configurations required to reproduce the bug. Bug reports without a functional reproduction may be closed without investigation.

# Copy-paste your Terraform configurations here - for large Terraform configs,
# please use a service like Dropbox and share a link to the ZIP file. For
# security, you can also encrypt the files using our GPG public key: https://keybase.io/hashicorp

Debug Output

Panic Output

Expected Behavior

Updated tags on ECS Fargate tasks images on 13 services

Actual Behavior

After 10 services had been updated, the next one failed with the error "Provider produced inconsistent result after apply". Oddly, this same code had been run on different clusters at the same time, in different pipelines, multiple times before without any errors.

Error: Provider produced inconsistent result after apply

When applying changes to module.linux-app-1.aws_ecs_service.app["app11"],
provider "registry.terraform.io/hashicorp/aws" produced an unexpected new
value: Root resource was present, but now absent.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.


##[error]PowerShell exited with code '1'.
##[error]PowerShell wrote one or more lines to the standard error stream.
##[error]
Error: Provider produced inconsistent result after apply

When applying changes to module.linux-app-1.aws_ecs_service.app["app11"],
provider "registry.terraform.io/hashicorp/aws" produced an unexpected new
value: Root resource was present, but now absent.

This led to a stranger error when the pipeline ran a second time: "creation: InvalidParameterException: Creation of service was not idempotent." We commented out that service and re-ran the pipeline, which applied, but the service still existed in AWS with status INACTIVE. We had to destroy it manually and uncomment it; after that, the apply went through and the service was created.

Error: error creating fargate-service service: error waiting for ECS service (fargate-app11-service) creation: InvalidParameterException: Creation of service was not idempotent.

  on .terraform/modules/linux-app-1/modules/fargate-service/main.tf line 161, in resource "aws_ecs_service" "app":
 161: resource "aws_ecs_service" "app" {



##[error]PowerShell exited with code '1'.
##[error]PowerShell wrote one or more lines to the standard error stream.
##[error]
Error: error creating fargate-app11-service service: error waiting for ECS service (fargate-app11-service) creation: InvalidParameterException: Creation of service was not idempotent.

  on .terraform/modules/linux-app-1/modules/fargate-service/main.tf line 161, in resource "aws_ecs_service" "app":
 161: resource "aws_ecs_service" "app" {

Steps to Reproduce

I don't have a set of steps; it was random and hasn't come up since.

Important Factoids

References

@github-actions github-actions bot added needs-triage Waiting for first response or review from a maintainer. bug Addresses a defect in current functionality. service/ecs Issues and PRs that pertain to the ecs service. labels Aug 5, 2021
@ewbankkit ewbankkit removed the needs-triage Waiting for first response or review from a maintainer. label Aug 5, 2021
@rwky
Contributor

rwky commented Sep 15, 2021

We're experiencing this with the latest AWS provider (3.58.0); it seems to happen randomly.

This is the service in question:

resource "aws_ecs_service" "site" {
  name                              = "site${var.branch}"
  cluster                           = data.aws_ecs_cluster.main.id
  task_definition                   = "${aws_ecs_task_definition.site.id}:${aws_ecs_task_definition.site.revision}"
  desired_count                     = 1
  health_check_grace_period_seconds = 60
  enable_execute_command            = true
  platform_version                  = "1.4.0"
  capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    weight            = 100
    base              = 1
  }

  network_configuration {
    security_groups  = [data.aws_security_group.essential.id]
    subnets          = data.aws_subnet.main_subnet.*.id
    assign_public_ip = true
  }
  load_balancer {
    target_group_arn = aws_alb_target_group.site.arn
    container_name   = "site${var.branch}"
    container_port   = 80
  }
}

@rwky
Contributor

rwky commented Sep 15, 2021

I dug through our CI logs and found the error below. It left the resource created in AWS but not registered in the state; deleting the service manually fixes it.

Error: Provider produced inconsistent result after apply
When applying changes to aws_ecs_service.site, provider
"provider[\"registry.terraform.io/hashicorp/aws\"]" produced an unexpected
new value: Root resource was present, but now absent.

This is a bug in the provider, which should be reported in the provider's
 own issue tracker.

@cdegroot

Same issue during an apply of a Terraform config that specifies only an ECS service, which the apply was creating. It should have been on provider version 3.63.0, but I'm not sure, as this was called from inside an ECS container (yeah, don't ask ;-)) which got re-rolled in the meantime.

@n00borama

We've just hit this on a non-fargate ECS service.
Provider 3.66.0.

Error: error creating prefix-daemon service: error waiting for ECS service (prefix-daemon) creation: InvalidParameterException: Creation of service was not idempotent.

Removing the service and re-running terraform fixed it. The change that was introduced was propagate_tags = "TASK_DEFINITION".

@rwky
Contributor

rwky commented Jan 5, 2022

Some more context on this: I enabled TF_LOG and found that the problem service is returning a status of INACTIVE, so Terraform sees this as an error and taints the resource, which causes the failure.

I'm guessing here but either 1. The service is failing to create properly or 2. Terraform isn't waiting for the service to be destroyed before creating it (in our case we taint the service before running apply).

Here's the excerpt of the log:

2022-01-02T22:46:36.429Z [INFO]  provider.terraform-provider-aws_v3.58.0_x5: 2022/01/02 22:46:36 [DEBUG] [aws-sdk-go] {"failures":[],"services":[{"capacityProviderStrategy":[{"base":1,"capacityProvider":"FARGATE_SPOT","weight":1}],"clusterArn":"arn:aws:ecs:eu-west-1:REDACTED:cluster/main","createdAt":1.640877706362E9,"createdBy":"arn:aws:iam::REDACTED:user/test-sites-creator","deploymentConfiguration":{"deploymentCircuitBreaker":{"enable":false,"rollback":false},"maximumPercent":200,"minimumHealthyPercent":100},"deployments":[{"capacityProviderStrategy":[{"base":1,"capacityProvider":"FARGATE_SPOT","weight":1}],"createdAt":1.641163596389E9,"desiredCount":4,"failedLaunchTaskCount":0,"failedTasks":0,"id":"ecs-svc/0233525759737243210","networkConfiguration":{"awsvpcConfiguration":{"assignPublicIp":"ENABLED","securityGroups":["sg-0d3b1b2492ec00529"],"subnets":["subnet-01ca334aeab467abb","subnet-0ff8fe4437ca65fe6","subnet-09be39a5343fa0e17"]}},"pendingCount":0,"platformFamily":"Linux","platformVersion":"1.4.0","replacedTaskCount":0,"rolloutState":"IN_PROGRESS","rolloutStateReason":"ECS deployment ecs-svc/0233525759737243210 in 
progress.","runningCount":0,"status":"PRIMARY","taskDefinition":"arn:aws:ecs:eu-west-1:REDACTED:task-definition/jobsissue-3340-test2:1","updatedAt":1.641163596389E9}],"desiredCount":0,"enableECSManagedTags":false,"enableExecuteCommand":true,"events":[],"loadBalancers":[],"networkConfiguration":{"awsvpcConfiguration":{"assignPublicIp":"ENABLED","securityGroups":["sg-0d3b1b2492ec00529"],"subnets":["subnet-01ca334aeab467abb","subnet-0ff8fe4437ca65fe6","subnet-09be39a5343fa0e17"]}},"pendingCount":0,"placementConstraints":[],"placementStrategy":[],"platformFamily":"Linux","platformVersion":"1.4.0","propagateTags":"NONE","roleArn":"arn:aws:iam::REDACTED:role/aws-service-role/ecs.amazonaws.com/AWSServiceRoleForECS","runStatus":"Inactive","runningCount":0,"schedulingStrategy":"REPLICA","serviceArn":"arn:aws:ecs:eu-west-1:REDACTED:service/main/jobsissue-3340-test2","serviceName":"jobsissue-3340-test2","serviceRegistries":[],"status":"INACTIVE","taskDefinition":"arn:aws:ecs:eu-west-1:REDACTED:task-definition/jobsissue-3340-test2:1","version":0}]}: timestamp=2022-01-02T22:46:36.429Z
2022-01-02T22:46:36.429Z [INFO]  provider.terraform-provider-aws_v3.58.0_x5: 2022/01/02 22:46:36 [WARN] Removing ECS service "arn:aws:ecs:eu-west-1:REDACTED:service/main/jobsissue-3340-test2" because it's INACTIVE: timestamp=2022-01-02T22:46:36.429Z
2022-01-02T22:46:36.429Z [TRACE] maybeTainted: aws_ecs_service.jobs encountered an error during creation, so it is now marked as tainted
2022-01-02T22:46:36.429Z [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to workingState for aws_ecs_service.jobs
2022-01-02T22:46:36.429Z [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: removing state object for aws_ecs_service.jobs
2022-01-02T22:46:36.429Z [TRACE] evalApplyProvisioners: aws_ecs_service.jobs is tainted, so skipping provisioning
2022-01-02T22:46:36.429Z [TRACE] maybeTainted: aws_ecs_service.jobs was already tainted, so nothing to do
2022-01-02T22:46:36.429Z [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to workingState for aws_ecs_service.jobs
2022-01-02T22:46:36.429Z [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: removing state object for aws_ecs_service.jobs
2022-01-02T22:46:36.430Z [TRACE] vertex "aws_ecs_service.jobs": visit complete 

@matt-brewster

We are seeing this error on a regular basis. We create approximately 150 ECS services each morning across a number of test environments and destroy them all in the evening.

Approximately once a week a random ECS service gets stuck in a state where TF has created it but we get the "Error: Provider produced inconsistent result after apply" error. We then retry the creation, but it fails with "InvalidParameterException: Creation of service was not idempotent." We have to manually delete the ECS service and then try again, which isn't ideal. It would be great to get a fix here.

@rwky
Contributor

rwky commented Mar 18, 2022

I've opened PR #23747, which contains a partial workaround/fix that applies when using wait_for_steady_state. We've been running it for a few weeks without a single failure. Without wait_for_steady_state it still fails. It would be better if someone more familiar with the provider dug deeper into this for a complete fix, but for now this is a workaround.

@vat-gatepost-BARQUE

We are also seeing this. We just upgraded to AWS provider 4.5 and never had the issue before on 3.6.x. It only affects an ECS Fargate service that we are trying to create. I have tried removing it from AWS directly and running the apply again, but I keep getting the same error.

I am able to remove the new ECS service and the errors go away.

We are also seeing the same problem described in this issue:

Implement d.IsNewResource() Checks In Resource Read Functions #16796

anGie44 pushed a commit that referenced this issue Apr 13, 2022
When using `wait_for_steady_state` retry up to 3 times.

This is due to when a service is replaced there is a possibility the
service will return a status of INACTIVE as the AWS API returns the
status of the old service instead of the new one which hasn't fully
registered yet.
@rwky
Contributor

rwky commented May 4, 2022

@anGie44 I'm sorry to say that #24223 didn't work. I just tried it for the first time and got this:

Error: error waiting for ECS service (arn:aws:ecs:eu-west-1:122984913885:service/main/admin-issue-3463-devoted-titmouse) to reach steady state after creation: ResourceNotReady: failed waiting for successful resource state
│ 
│   with aws_ecs_service.admin,
│   on admin.tf line 38, in resource "aws_ecs_service" "admin":
│   38: resource "aws_ecs_service" "admin" {

@anGie44
Contributor

anGie44 commented May 4, 2022

@rwky 😞 We can move the retry back within the resource logic then since you noted above that fix had proven to work for you for some time.

@rwky
Contributor

rwky commented May 4, 2022

If you can make another PR I can pull that branch, build it locally and try it out. Saves us having to wait for an official release.

@anGie44
Contributor

anGie44 commented May 4, 2022

thanks @rwky ! PR #24541 ready for testing

@rwky
Contributor

rwky commented May 4, 2022

Got it. It's building now; I'll let you know how it goes.

@rwky
Contributor

rwky commented May 4, 2022

OK, I've done 9 runs and all passed. I'd say that's promising. It's not a guarantee that it's fixed, since it could pass 9 times by chance, but it's a good sign. It's probably safe to merge this.

@anGie44
Contributor

anGie44 commented May 4, 2022

Awesome, thanks so much for confirming and testing on your end! I agree, the result sounds more promising than the error behavior you encountered previously. I'll open up the PR for review 👍

@rwky
Contributor

rwky commented Jun 13, 2022

@anGie44 we've been running the fixed version for over 2 weeks without issue, just wanted to say thanks and that I owe you a beer/coffee/your beverage of choice!

For anyone else encountering this issue, update to the latest version of the provider and enable wait_for_steady_state as a workaround.
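
As a rough sketch of that workaround, reusing the resource and variable names from the configuration posted earlier in this thread (those names are that commenter's, not anything the provider requires), the only change is the added `wait_for_steady_state` argument. The data sources, task definition, and other resources it references are assumed to be defined elsewhere in the same module, and the remaining blocks from the earlier configuration are omitted here for brevity.

```hcl
resource "aws_ecs_service" "site" {
  name            = "site${var.branch}"
  cluster         = data.aws_ecs_cluster.main.id
  task_definition = "${aws_ecs_task_definition.site.id}:${aws_ecs_task_definition.site.revision}"
  desired_count   = 1

  # Workaround discussed in this thread: make the provider wait until the
  # service reaches a steady state before treating create/update as done.
  wait_for_steady_state = true

  capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    weight            = 100
    base              = 1
  }

  network_configuration {
    security_groups  = [data.aws_security_group.essential.id]
    subnets          = data.aws_subnet.main_subnet.*.id
    assign_public_ip = true
  }
}
```

With this set, `terraform apply` blocks until the service's deployment stabilizes, so applies take longer and can fail with a steady-state wait error if the tasks never become healthy.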

@anGie44
Contributor

anGie44 commented Jun 17, 2022

Thanks for the feedback @rwky ! You played a big role in that fix so thanks again! I'm going to close this issue as it's been pretty quiet for a while, though if it resurfaces, reach out to the team and we can reopen as needed.

@anGie44 anGie44 closed this as completed Jun 17, 2022
@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 18, 2022

8 participants