Add a way to handle the API rate limit on AWS #1051

If you have a big infrastructure, you will hit the AWS API rate limit when trying to plan it. Find a way to work around this limitation in the provider itself, in some global way.

Comments
We currently implement parallelism with a semaphore, which helps throttle connections. The semaphore's size could potentially be made configurable.
@pearkes I think we should introduce a global, provider-wide semaphore in the provider as well, to artificially rate limit those resources and avoid this. The global parallelism semaphore helps, but isn't meant to solve this problem.
@mitchellh Yeah, that's a good point. Provider A may not mind, but provider B may want a maximum of 30 req/minute.
This should also be configurable per provider config because, e.g., in AWS each account may have different limits which aren't actually public anywhere (not even via the API), but you may ask AWS to increase them. We should allow customers with higher limits to bootstrap their infrastructure faster if the API allows it.
@radeksimko Sounds fair!
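As an illustration of the provider-wide semaphore discussed above, here is a minimal Go sketch. The type, the names (`apiSemaphore`, `withThrottle`), and the limit of 10 concurrent calls are all hypothetical, not Terraform's actual implementation.

```go
package main

import "fmt"

// apiSemaphore bounds the number of concurrent API calls for one provider.
// A buffered channel is the idiomatic Go semaphore: sends acquire a slot,
// receives release it.
type apiSemaphore chan struct{}

func newAPISemaphore(size int) apiSemaphore {
	return make(apiSemaphore, size)
}

// withThrottle runs fn while holding a semaphore slot, so at most
// cap(s) calls are in flight at any time.
func (s apiSemaphore) withThrottle(fn func() error) error {
	s <- struct{}{}        // acquire
	defer func() { <-s }() // release
	return fn()
}

func main() {
	sem := newAPISemaphore(10) // hypothetical per-provider limit
	err := sem.withThrottle(func() error {
		fmt.Println("calling the AWS API...")
		return nil
	})
	_ = err
}
```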
I'm running into these API throttling limits right now, deploying a reasonably sized configuration using master. Some kind of rate limiting of API calls is definitely required, and a rate limit that could be set a priori as part of the configuration would definitely help. However, Terraform also needs to handle the corner case of hitting this limit by automatically retrying and backing off its requests.

Terraform cannot assume it is the only consumer of the provider's API request budget: for AWS, the request rate limit is account-wide, and other applications may be depleting the budget independently. I would be happy if it retried failed requests and issued warnings prompting me to adjust a rate limit in the config. Extra bonus points if it could automatically modulate the request rate up and down when limits are hit. As an example, I needed to run an apply twice to get it to complete.
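A configurable, a-priori rate limit like the one requested above can be modeled as a token bucket. Below is a minimal sketch using golang.org/x/time/rate; the 30 requests/minute setting and the `limitedCall` wrapper are illustrative assumptions, not anything Terraform actually ships.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// Hypothetical per-provider setting: one request every two seconds
	// (30 req/minute), with a small burst allowance of 5.
	limiter := rate.NewLimiter(rate.Every(2*time.Second), 5)

	limitedCall := func(ctx context.Context, call func() error) error {
		// Wait blocks until a token is available (or ctx is done),
		// smoothing requests down to the configured rate.
		if err := limiter.Wait(ctx); err != nil {
			return err
		}
		return call()
	}

	err := limitedCall(context.Background(), func() error {
		fmt.Println("DescribeInstances...")
		return nil
	})
	_ = err
}
```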
True, but AWS is making us blind here... they don't provide any useful stats, unlike many other APIs from GitHub, Twitter, Google, etc. If we had such things, we could make Terraform actually clever. :)
Right... it's quite annoying, and their only answer to questions on the matter is that you need to implement exponential back-off when you receive the RequestLimitExceeded error.

I've reached a point where I cannot deploy my configuration anymore with 0.3.7 or master because I hit this API throttling error constantly. It seems to depend on where the apply run errors out with the limit-exceeded error; more often than not it now leaves the partially deployed configuration in an unrecoverable state that requires me to manually destroy the VPC from the AWS console and start over. Other times I can run apply 3-4 times and it eventually succeeds, deploying more resources incrementally on each run.

This is a show-stopper for my automated deployments, so my next stop will be AWS support, to see if I can get the API throttling limit bumped up for my account until Terraform can gracefully back off its requests.
The request limits definitely differ per API call. If you have any way to set some sane attempt limits and retry thresholds on a per-API-call basis, this would go a long way. It might be difficult with the move to aws-sdk-go, unless they're doing something already.

Example: RunInstances tends to baulk around 20-30/sec, and it will throttle you quite hard for 5-10 seconds, whereas DescribeInstances will allow a lot more. AWS does define 3 categories in their documentation; maybe the medium/high complexity calls could be listed and treated differently from the default? In my experience AWS doesn't want to reveal the thresholds, to avoid people abusing them.

Nathan Sullivan
I talked with our AWS support people today and they basically said that the EC2 API rate limits are already the highest of all services. They will not raise them for an account even if you have a very expensive support agreement. An application not implementing back-off was definitely not regarded as anything even close to a sufficient reason for them to consider raising limits.

Because my configuration would basically never deploy without hitting the API limits on my account and crapping out... sometimes leaving everything in a bad state... I did some hackery in the EC2 request code in aws-sdk-go to add an exponential back-off for requests that hit the rate limit, and now it works reliably without ever failing on API rate limit errors. I had never written a line of Go in my life until today, so it was a gross hack and only deals with the particular case that was blocking me.

A proper implementation would put a more general retry mechanism in place around the request code, one that is aware of the different kinds of request error responses that should be retried in EC2. The AWS docs have a table that lists the different errors that need to be retried (5xx server errors, RequestLimitExceeded, ConcurrentTagAccess, some DependencyViolation cases due to eventual consistency, etc.): http://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html The retry logic differs across services, so you need to deal with each one individually.
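A general retry mechanism like the one described would first need to classify which errors are retryable. A minimal sketch, assuming the error codes from the EC2 errors table linked above; the function and the string matching are illustrative only (real code would inspect a typed SDK error such as awserr.Error):

```go
package retry

import "strings"

// retryableCodes lists EC2 error codes that the AWS documentation
// says clients should retry, typically with back-off.
var retryableCodes = map[string]bool{
	"RequestLimitExceeded": true, // throttling: back off before retrying
	"ConcurrentTagAccess":  true, // tag operations racing each other
	"DependencyViolation":  true, // can be eventual consistency; retry a few times
	"InternalError":        true, // 5xx-class server-side failures
	"Unavailable":          true,
}

// shouldRetry reports whether an EC2 API error looks retryable.
func shouldRetry(err error) bool {
	if err == nil {
		return false
	}
	for code := range retryableCodes {
		if strings.Contains(err.Error(), code) {
			return true
		}
	}
	return false
}
```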
@willmcg can you share more details on how you implemented the back-off? |
It was some simple, horrible hackery in aws-sdk-go/aws/ec2.go, in the EC2Client Do() method, that put a retry loop around the request logic to keep retrying the request with a brain-dead back-off delay whenever the EC2 API call returned the "RequestLimitExceeded" error.
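The original snippet was not preserved in this thread. As a rough, hedged reconstruction matching the description — a retry loop with a simple linear delay — it may have looked something like the following, where the function name, attempt count, and base delay are all invented:

```go
package retry

import (
	"strings"
	"time"
)

// doWithRetry wraps a single request attempt (doOnce) in a retry loop.
// maxAttempts and the 2-second base delay are made-up values.
func doWithRetry(doOnce func() error) error {
	const maxAttempts = 10
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = doOnce()
		if err == nil || !strings.Contains(err.Error(), "RequestLimitExceeded") {
			return err // success, or an error we don't retry
		}
		// Linear back-off: wait attempt * 2s before the next try
		// (a proper fix would use exponential back-off with jitter).
		time.Sleep(time.Duration(attempt) * 2 * time.Second)
	}
	return err
}
```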
Actually, it is such a hack that it's really only a linear back-off, not even exponential at present :-) I see that in the source JSON for the APIs there is a _retry.json that actually details all the retry conditions, but it doesn't look like the Go request code uses this information for the retry policy on failed requests. That would be the right way to handle retries, rather than my hackery.
It looks like master of aws-sdk-go implements retries with an exponential back-off on Throttling errors: https://github.com/awslabs/aws-sdk-go/blob/master/aws/service.go#L124-L142. The fork that Terraform is using doesn't include it, but perhaps this will help once it catches up with upstream. If the version of aws-sdk-go used were more up to date, would that solve the problem, or would it still make sense to have something in Terraform controlling the number of requests being generated? Either way, I'm very interested in helping move this forward. I'm new to Go, but very interested in learning.
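For context, exponential back-off of the kind the SDK implements grows the delay with each retry and usually adds random jitter so that concurrent clients don't retry in lockstep. A generic sketch; the base delay and cap here are arbitrary examples, not the SDK's actual constants:

```go
package retry

import (
	"math/rand"
	"time"
)

// backoffDelay returns how long to sleep before retry number retryCount
// (0-based). The delay doubles with each retry (base * 2^retryCount) and
// gets full random jitter, capped so long retry chains don't sleep forever.
func backoffDelay(retryCount int) time.Duration {
	const (
		base     = 100 * time.Millisecond // arbitrary example base
		maxDelay = 30 * time.Second       // arbitrary example ceiling
	)
	d := base << uint(retryCount) // base * 2^retryCount
	if d <= 0 || d > maxDelay {   // d <= 0 guards against shift overflow
		d = maxDelay
	}
	return time.Duration(rand.Int63n(int64(d))) // jitter: uniform in [0, d)
}
```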
Happy to help, how can we move this forward? |
I think most if not all of the work has been done to support moving to the official aws-sdk-go, which implements back-off/retry. This should be in the next release.
+1 as this is really annoying and impactful. |
I'm really hoping https://github.com/awslabs/aws-sdk-go/blob/a79c7d95c012010822e27aaa5551927f5e8a6ab6/aws/service.go#L134 helps, but I'm concerned that the default max retries is too low at 3. In my case it's the AutoScaling API that is throwing rate-limit-exceeded errors, and I've seen the command retry up to 9 times before it succeeds. Granted, I've been using an older version with some custom retry logic added in while I waited for the aws-sdk-go library to catch up with upstream, but I copy/pasted the logic from the upstream aws-sdk-go repo, so the behavior should be similar.
@fromonesrc - When is the next release? |
Looks like it will be in 0.5.0 (https://github.com/hashicorp/terraform/blob/master/CHANGELOG.md#050-unreleased) but I don't know when that will ship. |
Any word on when 0.5.0 will be released? And is it possible to get the rate limiting by building what's currently in the repo, or is rate limiting a feature that hasn't been developed yet but is on the roadmap?
So I'm running into this issue. Route53 seems to be VERY aggressive with throttling; I can't get an apply to complete. Anybody have a work-around? Otherwise I might have to downgrade temporarily.
I just started hitting this as well 😒 Much needed 👍 |
We are hitting this as well. |
I was able to hack around it by delaying all outbound traffic on my workstation by 1 second. There may be a more elegant solution.
When running Terraform from master with the retry logic enabled, we were still hit by the API rate limits. After increasing MaxRetries to 11, we no longer experienced the issue. It looks like the default of 3 retries is not enough. In #1787, the number of retries is made configurable, with a default of 11 (i.e., a delay of 61 seconds before the last retry).
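For reference, with the official aws-sdk-go the retry count is a one-line configuration change. A minimal sketch using the SDK's later v1 API (which postdates this thread); the value 11 mirrors the default proposed in #1787, and whether Terraform exposes this knob depends on #1787 itself:

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	// MaxRetries controls how many times the SDK retries throttled or
	// 5xx responses, with exponential back-off between attempts.
	sess, err := session.NewSession(&aws.Config{
		Region:     aws.String("us-east-1"),
		MaxRetries: aws.Int(11),
	})
	if err != nil {
		panic(err)
	}

	svc := ec2.New(sess)
	out, err := svc.DescribeInstances(&ec2.DescribeInstancesInput{})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("reservations:", len(out.Reservations))
}
```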
@koendc, are you running on AWS or some other provider? I can change the number of retries for OpenStack, but can't for AWS.
I should have made myself a bit more clear.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |