-
Notifications
You must be signed in to change notification settings - Fork 9.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wrap s3 calls with retry to avoid race in s3 bucket creation calls #891
Conversation
tested locally with: TF_ACC=1 go test ./aws -run TestAccAWSS3MultiBucket_withTags -v |
hi @radeksimko, could you please give us an indication of when this pr will be reviewed? thanks! |
@xchapter7x thank you for raising the PR & issue. While I believe there are some eventual consistency issues that need addressing in S3 I'm in doubts whether this is the right place as I'm not sure the error in #877 is coming from that specific place. I did notice we get a bunch of test failures once in a while in our nightly acceptance tests, but the error usually came from a followup API call, e.g. setting policy, CORS, website, versioning, ... Would you mind providing debug logs we can go through to understand the nature of the problem better (i.e. find out which API request is actually returning the error)? TL;DR I believe that this is a real problem which needs fixing, but I'm not convinced this PR is actually fixing it. Maybe it is, but we need more data before moving forward. Thanks. |
@radeksimko Let me spin up an environment to replicate the issue and get you the requested logs. |
I'm afraid we'll have to add retries to each followup call as S3 is eventually consistent and the API doesn't do client pinning, AFAIK - which means that 1st success (200 OK) does not mean the change (new bucket) was propagated fully. I think we might still get In fact KMS was so badly affected by this that I had to implement "guessing" into the retry logic: btw. all this behaviour is pretty much undocumented, it's just what I remember from my past experience with the AWS APIs and it's hard to test. 😄 I hope that's not too discouraging - I'd love to get such intermittent issues fixed and I understand it's tricky to do so. Regarding the code:
|
i have a large debug output log, tfplan & tfstate all reproducing the failure. In regards to the feedback I'll take a look at implementing resource.Retry and others suggested instead of direct sdk calls. I decided to use the template instead of Sprintf, b/c it was a lot larger than the other use cases in the tests that i saw, and made the implementation a bit more readable/manageable. Happy to switch it to use sprintf, if that is a more consistent implementation. |
@xchapter7x feel free to send me the debug log to radek@hashicorp.com You can also GPG encrypt it using my pubkey published at https://keybase.io/radeksimko Re templating - Admittedly the nice thing about templates is that we get a crash straight away if we can't compile the template (e.g. forgotten placeholder/variable), whereas we get invalid tf config with |
I sent everything over in keybase. Let me know if you're able to find anything. Im going to switch this PR back to Thanks for your continued support in working through this. |
Thanks for providing the debug log, that's very useful. 👌 On first sight there seems to be quite a lot of 404s there, even Based on the log it seems worse (404 appearing more often) than anything I've seen so far. On the positive side the worse the situation is the easier it should be to fix it. 😄 |
hi @radeksimko sorry for the long silence on this. I got pulled away. I found some time to revisit this today and I have a question around the If I'm understanding the use case properly, this However, the client calls currently being wrapped provide a Im curious why one would choose to use Specific to this PR, would it be an acceptable implementation to leverage Your thoughts and feedback are greatly appreciated. Thanks. |
hi @radeksimko , sorry for the long delay. This should be ready to be reviewed. Let me know your thoughts and feedback. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the patience.
Thanks @radeksimko very much. Do you know which release this fix will be included on please? |
wrap s3 calls with retry to avoid race in s3 bucket creation calls
Have these changes been released? I'm seeing this issue with 0.11.1 |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks! |
issue: [https://github.com//issues/877]