aws_security_group: timeout while waiting for state to become 'success'. Subsequent terraform runs fail on that resource #3128
Comments
Perhaps an easy way to mitigate this problem could be to allow configurable timeouts on aws_security_group resources. As it is now, it is not configurable:
I am having a similar issue. Our TF deployments fail intermittently on
Just found there is already a However, it still keeps running into the 5 min timeout even though I bumped
Did some more investigation: I am getting these error messages from time to time:
However, when I look at the code of
So I am wondering: could this longer timeout be overshadowed by a generic 5 minute timeout, so that it does not even apply?
I dug even deeper and found out that the real timeout occurs not in the security_group but in the tag on the security group:
which has a hard-coded timeout of 5 minutes. Since tags can be on any resource, this timeout effectively acts as a hidden timeout on all AWS resources that have tags. Not cool.
@mildred this gets even worse, as timeouts can currently only be defined on resources, but tags are not modelled as resources. So a potential fix would have to "inherit" the timeout from the resource being tagged, or otherwise fall back to a default value. Hmm.
Opened two PRs that should fix both problems mentioned in this issue.
Does anyone have a hack/workaround to handle this problem (and destroy the resource somehow) while a fix is being developed? |
I didn't follow up on this issue, sorry. Our solution was to automatically retry a failed terraform execution once. It worked for most of the occurrences, but we still have a small number of cases where terraform fails, and on the second run it continues to fail because the first failure left a half-created resource. This specific case can be identified by the first terraform run showing an error like:
And the next terraform is failing with:
We could trace this to a hard-coded 5m timeout on the DescribeSecurityGroups API call too. The code here is obvious:
The solution is probably not to have configurable timeouts, but for terraform to honor throttling algorithms for AWS requests, probably in multiple places. We contacted AWS support and they told us:
We have now implemented saving debug output for our failed terraform calls. I'll attach the full debug output here soon.
I did not attach the full debug output because it contained credentials. However, some debug output I posted in #3586 (comment) supports the need for pull request #3911:
When AWS throttles requests, read requests are the first ones to be throttled, and reading security group rules can take longer than 5m to complete. Replace the hard-coded 5m timeout with a configurable timeout to avoid this problem. Fixes part of hashicorp#3128
@mildred terraform does implement an exponential backoff algorithm, for example in However, I think the algorithm is not used for every API call.
@terraform-providers/terraform-provider-aws
@domdom82 Here the problem is not that the exponential backoff algorithm is not in use; it's that the terraform resource doesn't have a configurable timeout, and in my case the default timeout is too low.
@mildred in your case have you seen
No, I'm getting those AWS errors (in the debug output):
Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 30 days it will automatically be closed. Maintainers can also remove the stale label. If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thank you!
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
Short story: we know that AWS is throttling our API requests. Sometimes we time out while creating a security group. The problem, however, is that subsequent terraform runs fail because the security group was created but is not completely recorded in tfstate: its security group rules are missing.
Terraform Version
terraform 0.11.2
aws provider version 1.7.1
Affected Resource(s)
aws_security_group
There might be a problem with how terraform handles resources that fail. Perhaps on failure this resource should be tainted, so that subsequent runs succeed.
Terraform Configuration Files
Debug Output
This is a transient error with terraform running in an automated environment. We do not have debug output for this run at the moment.
However, we run terraform multiple times, and the first time we run it, we get the following error:
Then all subsequent terraform apply executions fail with:
Full logs here: https://gist.github.com/mildred/9245356ec1ef599f91eb15f2bd9a6666
Expected Behavior
Terraform should taint the security group if it fails on it due to a timeout, so that the next run creates it anew. Or perhaps it should just taint the security group rules within it. Or it should record the security group rules properly in the tfstate.
Actual Behavior
Terraform times out, and then fails to create the resource because a rule it thought was not present has already been created.
Steps to Reproduce
Run terraform often enough to be throttled by AWS.
Important Factoids
The max_retries setting for the aws provider is set to 40.