tags should retry without time bounds on EC2 throttling #3586
Hi @bflad, any chance we can get this in for 1.11? Timeouts on tags are hitting us pretty hard these days.
Can you provide debug logs that show that you're hitting EC2 rate limiting and not masking some other error?
@bflad sure can. I also described in #3128 that we are hitting a 5 minute timeout on a security_group create but its timeout is at 10 minutes:
So when digging deeper we found the hard-coded timeout of 5 minutes on
So then we bumped the hard-coded timeout to 10 minutes, the same as the security_group itself, for testing. And it worked just fine repeatedly. This got me thinking that we could make this a bit smarter than just bumping a hard timeout and instead make it dependent on the resource that is being tagged. The main issue I see currently is that people run into timeouts on certain resources and bump their timeouts to fix it, but then wonder why their deployment still fails: there is another "hidden" timeout on the tag of their resource, which they cannot change at the moment.
It seems to me that this effectively doubles the timeout setting you're actually using for the resource. What do you think of taking the timeout from the schema, subtracting the time elapsed since the create/update function started, and using the remainder as the tag timeout?
@2rs2ts good points. How would you pass the start time? In the |
@domdom82 I don't know, I'm not really familiar with the code, I just thought of the idea. Sorry I'm not of much help 😅
edit: I was probably mistaken to post this debug output on this PR. I do in fact have a related problem, but it does not appear to be exactly the same. See #3128 (comment)
@bflad I cannot share the full debug logs with you (since they contain credentials), but I have the following terraform error:
The debug logs tell me terraform performs the following request:
And the next response in the logs for ec2/DescribeSecurityGroups I get is:
I would say this confirms that rate limiting is causing the error.
@mildred same here. I think it is not the tag creation itself, because tags are very small entities that don't take long to create. However, if you are throttled while creating multiple resources at a time (in my case many security groups along with rules and tags), you can run into an early timeout (in my case the hard-coded 5 minutes on tags) even though you might have set a longer timeout on the parent resource (e.g. 10 minutes on security groups).
bump, what's the status of this PR?
@2rs2ts I'd love to see it merged. Tag timeouts are one of the most annoying things in our CI pipeline right now. They happen especially often when large sets of security groups are deployed in one TF file.
@bflad bump for merge
Hi @domdom82 👋 Sorry for the delayed response here. In #6409, we introduce a helper function ( What do you think? |
@bflad I think this is a great idea. Ideally, I wouldn't have to configure timeouts on a per-resource basis but would only have a provider-level setting. As you said, tweaking those timeouts manually is guesswork for the operator, and there is never a right setting.
@bflad LGTM? I also renamed the PR to match the code change more accurately.
bumped the beast a final time 🤞
LGTM, thanks @domdom82! 🚀 (We could return early on !isResourceTimeoutError() to remove the additional nesting but that's more of a nitpick)
(Test failures unrelated)
Tests failed: 2, passed: 245
This has been released in version 1.44.0 of the AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!
This PR addresses the "security_group timeout due to tag timeout" part of issue #3128.
Since tags are not resources in the Terraform sense, they have no configurable timeouts per se. To avoid hard-coded timeouts on tags, this PR tries to use the `Update` timeout of the resource that is being tagged. If no timeout is defined for that resource, the regular default of `ResourceData` is used. This should at least provide some means of configuring timeouts on tags via the to-be-tagged resource.