feat: retry API calls with exponential backoff #287
Conversation
Thank you for your patience; it took a bit more time than I expected to review the PR.
@tritone FYI
response.status_code = 200
data = b"brent-spiner"
response._content = data
http.request.return_value = response
this test is missing the comment the other tests have. There may be a way to refactor this also, but unsure if it is warranted.
Which comment is missing?
@crwilcox @frankyn PTAL. Changes include:
Responded to all other comments as well.
Generally LGTM, clean design with minimal impact on the code base!
I have a few documentation comment requests.
exceptions.BadGateway,  # 502
exceptions.ServiceUnavailable,  # 503
exceptions.GatewayTimeout,  # 504
requests.ConnectionError,
@tritone this is a change from what we discussed around retrying ConnectionErrors in general. I discussed this with @andrewsg w.r.t. retrying unable-to-connect errors, and for now the implementation is moving forward with this solution until it's baked into the api_core libraries. Rationale: there's a default timeout enabled that would time out the retry, unlike Golang, which has no such default.
Got it, this seems fine to me re: the fact that there is a default timeout in python. So the idea is that down the road, we can add an error in api_core which catches "connection reset by peer" more specifically?
That's correct. Right now requests.ConnectionError is tied to a specific implementation (requests), so it should be wrapped by api_core / cloud_core for it to be used generally outside of just the GCS library.
I'm not totally convinced that retrying for a little while by default after an interruption due to a network outage isn't the right thing to do anyway, but yes, the idea is that instead of tapping into requests here, we would have a custom api_core exception like we do with other errors.
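For context, here is a rough sketch of how a retryable-exception list like the one quoted above typically feeds into a google-api-core Retry object with exponential backoff. The names and exact composition are illustrative, not necessarily what this PR introduces.

# Illustrative sketch only; assumes google-api-core's retry helpers.
import requests
from google.api_core import exceptions, retry

_RETRYABLE_TYPES = (
    exceptions.TooManyRequests,      # 429
    exceptions.InternalServerError,  # 500
    exceptions.BadGateway,           # 502
    exceptions.ServiceUnavailable,   # 503
    exceptions.GatewayTimeout,       # 504
    requests.ConnectionError,        # e.g. connection reset by peer
)

def _should_retry(exc):
    """Return True if the raised exception is considered transient."""
    return isinstance(exc, _RETRYABLE_TYPES)

# A Retry object with exponential backoff, driven by the predicate above.
DEFAULT_RETRY = retry.Retry(predicate=_should_retry)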
@frankyn Added documentation as requested, PTAL
Couple small comments, generally looks really good!
To modify the default retry behavior, call a ``with_XXX`` method
on ``DEFAULT_RETRY``. For example, to change the deadline to 30 seconds,
pass ``retry=DEFAULT_RETRY.with_deadline(30)``. See google-api-core reference
(https://googleapis.dev/python/google-api-core/latest/retry.html) for details.
Would it be worth having a short sample snippet here rather than explaining and inlining? Might make it easier to understand what's necessary.
I decided against this since a useful snippet here would have to show how to actually inject the modified retry via user code, and that feature is out of scope for this PR (will require changing the signature of every method in the library).
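For reference, a minimal sketch of what such a snippet might look like, assuming the DEFAULT_RETRY name from the documentation above, a google.cloud.storage.retry module path, and google-api-core's Retry.with_deadline; how the modified retry would be injected into an individual method call is exactly the part that's out of scope here.

# Minimal sketch; the module path is an assumption, not confirmed API.
from google.cloud.storage.retry import DEFAULT_RETRY

# Derive a copy of the default retry with a 30-second overall deadline.
modified_retry = DEFAULT_RETRY.with_deadline(30)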
Retries errors for idempotent API calls by default. Some API calls are conditionally idempotent (only idempotent if etag, generation, if_generation_match, or if_metageneration_match is specified); in those cases, retries are also conditional on the inclusion of that data in the call.
Fixes #108
Related: https://github.com/googleapis/google-cloud-python/issues/9298
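For illustration, a hedged sketch of the conditional-idempotency idea described in the summary above: retry only when the call carries a precondition that makes repeating it safe. The helper name and the query-parameter key (the JSON API's ifGenerationMatch) are illustrative, not necessarily what this PR uses.

# Illustrative only; names are assumptions.
def is_generation_specified(query_params):
    """Return True if the request pins an object generation, making it safe to repeat."""
    return query_params.get("ifGenerationMatch") is not None

# A conditionally idempotent call would consult a predicate like this before
# attaching the default retry; without the precondition, no retry is applied.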