Add retries for Kinesis throttling exceptions #1085

Merged
1 commit merged into hashicorp:master on Jul 27, 2017

Conversation

@cbroglie (Contributor) commented Jul 7, 2017

Ideally the AWS SDK would automatically retry these transient throttling exceptions, but it currently cannot because the service incorrectly uses LimitExceededException for throttling.

See aws/aws-sdk-go#1376 for additional details.

See hashicorp/terraform#15482 for the original PR.
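For context, the change boils down to pushing a retry handler onto the Kinesis connection so that these throttling errors are marked retryable and the SDK's normal retry/backoff loop takes care of them. A minimal sketch of that idea follows; the helper name and the exact error-message check are illustrative, not the literal diff:

```go
package aws

import (
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/aws/request"
	"github.com/aws/aws-sdk-go/service/kinesis"
)

// addKinesisThrottlingRetries (hypothetical name) marks Kinesis throttling
// errors, which the service reports as LimitExceededException, as retryable
// so the SDK's built-in retry/backoff logic handles them transparently.
func addKinesisThrottlingRetries(conn *kinesis.Kinesis) {
	conn.Handlers.Retry.PushBack(func(r *request.Request) {
		if awsErr, ok := r.Error.(awserr.Error); ok &&
			awsErr.Code() == kinesis.ErrCodeLimitExceededException &&
			strings.Contains(awsErr.Message(), "Rate exceeded") {
			r.Retryable = aws.Bool(true)
		}
	})
}
```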

@catsby (Contributor) commented Jul 7, 2017

Hey @cbroglie, thanks for the patch. I'm curious if you're open to a different idea, one that does involve more code, but that I think becomes more obvious to contributors down the road.

I'm curious if, instead of doing this in aws/config, we could add this as a separate method to be called in the resource methods themselves, when needed. So, something like this pseudocode:

func resourceAwsKinesisSomethingCreate(d *schema.ResourceData, meta interface{}) error {
  originalConn := meta.(*AWSClient).kinesisconn
  conn, err := kinesisConnWithLimitRetry(originalConn)
  if err != nil {
    return err
  }

  <the rest of the method>
}

// Build a copy of the Kinesis connection with a throttling retry handler attached.
func kinesisConnWithLimitRetry(k *kinesis.Kinesis) (*kinesis.Kinesis, error) {
  sess, err := session.NewSession(&k.Config)
  if err != nil {
    return nil, err
  }
  conn := kinesis.New(sess)
  conn.Handlers.Retry.PushBack(...)
  return conn, nil
}

My reasoning is to avoid surprising future developers with automatic retries they are not expecting, because these rate limits would not be explicitly caught and retried through the traditional waitForState or resource.Retry methods that you see frequently in the rest of the provider (a sketch of that explicit pattern appears right after this comment). I imagine the majority of contributors do not think to check aws/config for any special retry handlers that have been added to this specific connection object.

I feel like this approach makes it much more explicit that the SDK's retry handlers are being leveraged here. The drawback is that it involves more code and each method that uses a Kinesis connection would have to opt in to it, but I feel the element of least surprise is a winning feature. What do you think?
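For readers unfamiliar with the convention being referenced, the explicit pattern used elsewhere in the provider looks roughly like the following. The helper, input, timeout, and error check here are hypothetical and shown only to contrast the opt-in style with a connection-level handler:

```go
package aws

import (
	"time"

	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/service/kinesis"
	"github.com/hashicorp/terraform/helper/resource"
)

// createStreamWithRetry (hypothetical) retries the API call itself when
// Kinesis reports throttling, instead of relying on a handler attached to
// the connection object.
func createStreamWithRetry(conn *kinesis.Kinesis, input *kinesis.CreateStreamInput) error {
	return resource.Retry(5*time.Minute, func() *resource.RetryError {
		_, err := conn.CreateStream(input)
		if err != nil {
			if awsErr, ok := err.(awserr.Error); ok &&
				awsErr.Code() == kinesis.ErrCodeLimitExceededException {
				return resource.RetryableError(err) // throttled; try again
			}
			return resource.NonRetryableError(err)
		}
		return nil
	})
}
```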

@catsby catsby added the waiting-response Maintainers are waiting on response from community or contributor. label Jul 7, 2017
@cbroglie (Contributor, Author) commented

@catsby I agree with the sentiment that it's best to do the least surprising thing, even if it involves more code.

The reason I still prefer my original approach is that I feel these are retries the SDK should be performing automatically, and based on the discussion in aws/aws-sdk-go#1376, I'm hopeful future versions will. Therefore I believe it's best to implement the retries now using the exact same mechanism the SDK will later utilize, so there is no behavior change when that happens.
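To make the "same mechanism" point concrete, here is a rough sketch reusing the hypothetical addKinesisThrottlingRetries helper from the earlier example (names and configuration are assumptions): the handler only influences what counts as retryable, while attempt limits and backoff stay with the SDK's own retryer, so it can later be dropped without changing behavior.

```go
package aws

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/kinesis"
)

// newKinesisConn (hypothetical) shows that the retry policy itself stays with
// the SDK; cfg carries MaxRetries and the usual settings.
func newKinesisConn(cfg *aws.Config) (*kinesis.Kinesis, error) {
	sess, err := session.NewSession(cfg)
	if err != nil {
		return nil, err
	}
	conn := kinesis.New(sess)
	// Workaround from the earlier sketch: only marks throttling errors
	// retryable. Once the SDK recognizes them itself, this call becomes
	// redundant and can be removed with no change in retry behavior.
	addKinesisThrottlingRetries(conn)
	return conn, nil
}
```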

@cbroglie (Contributor, Author) commented

@catsby @apparentlymart Any updates on this? Our environment uses 30+ Kinesis streams and we cannot successfully run terraform without this patch.

@catsby (Contributor) commented Jul 27, 2017

I'll go ahead and merge this change, thank you!

Test:

TF_ACC=1 go test ./aws -v -run=TestAccAWSKinesisStream_ -timeout 120m
=== RUN   TestAccAWSKinesisStream_basic
--- PASS: TestAccAWSKinesisStream_basic (84.69s)
=== RUN   TestAccAWSKinesisStream_importBasic
--- PASS: TestAccAWSKinesisStream_importBasic (85.84s)
=== RUN   TestAccAWSKinesisStream_shardCount
--- PASS: TestAccAWSKinesisStream_shardCount (186.23s)
=== RUN   TestAccAWSKinesisStream_retentionPeriod
--- PASS: TestAccAWSKinesisStream_retentionPeriod (152.90s)
=== RUN   TestAccAWSKinesisStream_shardLevelMetrics
--- PASS: TestAccAWSKinesisStream_shardLevelMetrics (215.47s)
PASS
ok      github.com/terraform-providers/terraform-provider-aws/aws       725.152s



TF_ACC=1 go test ./aws -v -run=TestAccAWSKinesisFirehoseDeliveryStream_ -timeout 120m
=== RUN   TestAccAWSKinesisFirehoseDeliveryStream_s3basic
--- PASS: TestAccAWSKinesisFirehoseDeliveryStream_s3basic (262.07s)
=== RUN   TestAccAWSKinesisFirehoseDeliveryStream_s3WithCloudwatchLogging
--- PASS: TestAccAWSKinesisFirehoseDeliveryStream_s3WithCloudwatchLogging (235.12s)
=== RUN   TestAccAWSKinesisFirehoseDeliveryStream_s3ConfigUpdates
--- PASS: TestAccAWSKinesisFirehoseDeliveryStream_s3ConfigUpdates (338.19s)
=== RUN   TestAccAWSKinesisFirehoseDeliveryStream_RedshiftConfigUpdates
--- PASS: TestAccAWSKinesisFirehoseDeliveryStream_RedshiftConfigUpdates (1780.38s)
=== RUN   TestAccAWSKinesisFirehoseDeliveryStream_ElasticsearchConfigUpdates
--- PASS: TestAccAWSKinesisFirehoseDeliveryStream_ElasticsearchConfigUpdates (1231.84s)
PASS
ok      github.com/terraform-providers/terraform-provider-aws/aws       3847.624s

@catsby catsby merged commit 5544ff1 into hashicorp:master Jul 27, 2017
@cbroglie (Contributor, Author) commented

Thanks!

bflad added a commit that referenced this pull request Aug 30, 2019
The AWS Go SDK now automatically retries on this error.

References:

- aws/aws-sdk-go#1376
- #1085
- aws/aws-sdk-go#2751
- #9770

Output from acceptance testing:

```
--- PASS: TestAccAWSKinesisStream_createMultipleConcurrentStreams (205.83s)
```
@ghost commented Apr 11, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Apr 11, 2020
@breathingdust breathingdust removed the waiting-response Maintainers are waiting on response from community or contributor. label Sep 17, 2021