
feat: retry API calls with exponential backoff #287

Merged: 15 commits into master from retry, Oct 16, 2020

Conversation

andrewsg (Contributor) commented Oct 2, 2020

@andrewsg andrewsg requested a review from frankyn October 2, 2020 17:35
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Oct 2, 2020
@frankyn frankyn added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Oct 2, 2020
@frankyn frankyn requested a review from crwilcox October 2, 2020 18:33
frankyn (Member) left a comment:


Thank you for your patience; it took a bit more time than I expected to review the PR.

Review threads: google/cloud/storage/retry.py, google/cloud/storage/hmac_key.py, google/cloud/storage/bucket.py, tests/unit/test_hmac_key.py, google/cloud/storage/_http.py, tests/unit/test__http.py
andrewsg (Contributor, Author) commented Oct 8, 2020:

@tritone FYI

Review thread: google/cloud/storage/retry.py (outdated, resolved)

On this test snippet:
response.status_code = 200
data = b"brent-spiner"
response._content = data
http.request.return_value = response
A reviewer (Contributor) commented:


This test is missing the comment the other tests have. There may be a way to refactor this as well, but I'm unsure whether it's warranted.

andrewsg (Contributor, Author) replied:


Which comment is missing?

andrewsg (Contributor, Author) commented:
@crwilcox @frankyn PTAL. Changes include:

  • Retry predicate simplified; my testing suggests google-api-core's nuanced inspection of error messages is not needed for GCS.
  • Retry predicate now includes all HTTP error codes listed in the retry design except 408; 408 is mysteriously missing from google-api-core exceptions, and I have no way to test what happens if we actually get one (the testbench was no help here).
  • Retry predicate now includes ConnectionError, which encompasses failure to find the host and broken connections of all types. LMK if we need to narrow it down; we have fairly rich and detailed info on the specific ConnectionError subtype, which we can use here.

Responded to all other comments as well.
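
For readers following along, a minimal sketch of the predicate shape described above. The names and the exact exception list are assumptions based on this comment and the visible diff context (429 and 500 do not appear in the snippet below in the diff, so their inclusion here is inferred from the bullet about the retry design):

import requests.exceptions
from google.api_core import exceptions
from google.api_core import retry

# Transient failures worth retrying, per the list described above.
_RETRYABLE_TYPES = (
    exceptions.TooManyRequests,  # 429
    exceptions.InternalServerError,  # 500
    exceptions.BadGateway,  # 502
    exceptions.ServiceUnavailable,  # 503
    exceptions.GatewayTimeout,  # 504
    requests.exceptions.ConnectionError,  # unreachable host, broken connections
)

def _should_retry(exc):
    """Retry predicate: True only for the transient types above."""
    return isinstance(exc, _RETRYABLE_TYPES)

# Exponential backoff with api_core's default initial delay, multiplier,
# and overall deadline.
DEFAULT_RETRY = retry.Retry(predicate=_should_retry)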

frankyn (Member) left a comment:


Generally LGTM; clean design with minimal impact on the codebase! I have some documentation comment requests.

On this snippet in google/cloud/storage/retry.py:

exceptions.BadGateway, # 502
exceptions.ServiceUnavailable, # 503
exceptions.GatewayTimeout, # 504
requests.ConnectionError,
frankyn (Member):

@tritone this is a change from what we discussed around retrying ConnectionErrors in general. I discussed this with @andrewsg w.r.t. retrying unable-to-connect errors, and for now the implementation is moving forward with this solution until it's baked into the api_core libraries. Rationale: there's a default timeout enabled that would time out the retry, unlike in Golang, which has no such default.

tritone (Contributor):


Got it, this seems fine to me re: the fact that there is a default timeout in Python. So the idea is that down the road, we can add an error in api_core which catches "connection reset by peer" more specifically?

frankyn (Member):


That's correct. Right now requests.ConnectionError is tied to a specific transport implementation (requests), so it should be wrapped by api_core / cloud_core before it can be used generally outside of just the GCS library.

andrewsg (Contributor, Author):


I'm not totally convinced that retrying for a little while by default when there's an interruption due to a network outage isn't the right thing to do anyway. But yes, the idea is that instead of tapping into requests here, we would have a custom api_core exception, as we do for other errors.
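
Purely for illustration, the wrapping discussed here might look something like the following; the class and helper names are hypothetical and no such API existed in api_core at the time of this PR:

import requests.exceptions
from google.api_core import exceptions

class TransportConnectionError(exceptions.GoogleAPICallError):
    """Hypothetical transport-agnostic connection error."""
    code = None

def translate_transport_error(exc):
    # Re-raise the requests-specific error as the generic type, so a retry
    # predicate would not need to depend on the requests library directly.
    if isinstance(exc, requests.exceptions.ConnectionError):
        raise TransportConnectionError(str(exc)) from exc
    raise exc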

Review threads: google/cloud/storage/retry.py (three resolved)
@andrewsg andrewsg removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Oct 15, 2020
andrewsg (Contributor, Author) commented:
@frankyn Added documentation as requested; PTAL.

tritone (Contributor) left a comment:


A couple of small comments; generally this looks really good!

On this docstring in the new documentation:

To modify the default retry behavior, call a ``with_XXX`` method
on ``DEFAULT_RETRY``. For example, to change the deadline to 30 seconds,
pass ``retry=DEFAULT_RETRY.with_deadline(30)``. See google-api-core reference
(https://googleapis.dev/python/google-api-core/latest/retry.html) for details.
tritone (Contributor):

Would it be worth having a short sample snippet here rather than explaining and inlining? Might make it easier to understand what's necessary.

andrewsg (Contributor, Author):

I decided against this, since a useful snippet here would have to show how to actually inject the modified retry via user code, and that feature is out of scope for this PR (it will require changing the signature of every method in the library).
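
For reference while reading this thread, the pattern the docstring describes is roughly the following (a sketch only; as noted above, at the time of this PR there was no supported way to pass the modified retry into individual library methods):

from google.cloud.storage.retry import DEFAULT_RETRY

# with_deadline() returns a copy with a 30-second overall deadline;
# DEFAULT_RETRY itself is unchanged.
modified_retry = DEFAULT_RETRY.with_deadline(30)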

@andrewsg andrewsg added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 15, 2020
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 15, 2020
@andrewsg andrewsg merged commit fbe5d9c into master Oct 16, 2020
@andrewsg andrewsg deleted the retry branch October 16, 2020 19:21
NoahDragon commented Nov 9, 2020:

Just wondering whether blob.update() also gets the retry logic with this feature. Is there any way to specify the number of retries?

Update: I followed the API core docs; using DEFAULT_RETRY as a decorator seems to work. But how can I confirm whether a retry actually happened?

Update: there is a fix in PR #340.
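
A sketch of the decorator approach mentioned above, assuming the DEFAULT_RETRY object added by this PR. The on_error hook (an optional argument when a google-api-core Retry wraps a function) is one way to confirm that retries actually fired:

from google.cloud.storage.retry import DEFAULT_RETRY

def _log_retry(exc):
    # Called once per retried exception, which confirms a retry happened.
    print(f"retrying after: {exc!r}")

def update_blob(blob):
    blob.update()

# A Retry object wraps a function; on_error attaches the logging hook.
update_blob_with_retry = DEFAULT_RETRY(update_blob, on_error=_log_retry)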

cojenco pushed a commit to cojenco/python-storage that referenced this pull request Oct 13, 2021
Retries errors for idempotent API calls by default. Some API calls are conditionally idempotent (only idempotent if etag, generation, if_generation_match, or if_metageneration_match is specified); in those cases, retries are also conditional on the inclusion of that data in the call.
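
A hypothetical illustration of that conditional-idempotency rule (the helper is invented for clarity; the library's actual mechanism may differ):

from google.cloud.storage.retry import DEFAULT_RETRY

def choose_retry(if_metageneration_match=None):
    # Only retry when a precondition makes the mutation idempotent;
    # an unconditioned update could otherwise be applied twice.
    if if_metageneration_match is not None:
        return DEFAULT_RETRY
    return None

assert choose_retry(if_metageneration_match=5) is DEFAULT_RETRY
assert choose_retry() is None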
Labels: cla: yes (This human has signed the Contributor License Agreement.)
Linked issue: Add retry (exponential backoff) for storage api functions
6 participants