[Prototype] Retries for Storage #197

Closed
wants to merge 3 commits into from
28 changes: 20 additions & 8 deletions google/cloud/storage/_helpers.py
@@ -21,10 +21,12 @@
from hashlib import md5
from datetime import datetime
import os
import functools

from six.moves.urllib.parse import urlsplit
from google.cloud.storage.constants import _DEFAULT_TIMEOUT

# This needs to be updated when retry is reviewed more.
from google.cloud.storage.retry import _DEFAULT_RETRY

STORAGE_EMULATOR_ENV_VAR = "STORAGE_EMULATOR_HOST"
"""Environment variable defining host for Storage emulator."""
@@ -134,6 +136,14 @@ def _query_params(self):
            params["userProject"] = self.user_project
        return params


    def _call_api(self, client, retry, **kwargs):
Contributor

Should retry be a kwarg? Do callers explicitly have to pass in None if they don't want a retry? Would be good to clarify here.

        call = functools.partial(client._connection.api_request, **kwargs)
Contributor

Is this because the interface for the retry function is not expected to pass in any arguments?

Member Author

Mainly copied it over.
At the moment, no. I'm not making it configurable in the first pass; I'm only introducing it to reduce customer friction when using the library.

        if retry:
            call = retry(call)
        return call()
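
The two review questions on this helper (whether retry should be a keyword argument, and why the request is wrapped in functools.partial) can be illustrated with a minimal, self-contained sketch. This is not the code in the PR: `call_api`, `simple_retry`, and `flaky_request` are hypothetical names, and `simple_retry` is a bare stand-in for `google.api_core.retry.Retry` with no backoff. The sketch assumes the retry object decorates a callable, so binding the request arguments up front leaves an argument-free call to retry; `retry=None` is shown as one possible way to let callers opt out.

```python
import functools


def simple_retry(predicate, attempts=3):
    """Hypothetical stand-in for google.api_core.retry.Retry (no backoff)."""

    def decorator(func):
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Give up on the last attempt or on a non-retryable error.
                    if attempt == attempts or not predicate(exc):
                        raise

        return wrapper

    return decorator


def call_api(api_request, retry=None, **kwargs):
    """Bind the request arguments, then optionally wrap the call in a retry."""
    call = functools.partial(api_request, **kwargs)
    if retry is not None:
        call = retry(call)
    return call()


# Usage: a fake request that fails twice before succeeding.
calls = {"count": 0}


def flaky_request(method, path):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return {"method": method, "path": path}


retry_on_connection_error = simple_retry(
    lambda exc: isinstance(exc, ConnectionError)
)
result = call_api(
    flaky_request,
    retry=retry_on_connection_error,
    method="GET",
    path="/b/example",
)
```

With a keyword default, callers that want no retry simply omit the argument instead of passing None positionally.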


    def reload(
        self,
        client=None,
@@ -198,13 +208,15 @@ def reload(
             if_metageneration_match=if_metageneration_match,
             if_metageneration_not_match=if_metageneration_not_match,
         )
-        api_response = client._connection.api_request(
-            method="GET",
-            path=self.path,
-            query_params=query_params,
-            headers=self._encryption_headers(),
-            _target_object=self,
-            timeout=timeout,
+        api_response = self._call_api(
+            client,
+            _DEFAULT_RETRY,
+            method="GET",
+            path=self.path,
+            query_params=query_params,
+            headers=self._encryption_headers(),
+            _target_object=self,
+            timeout=timeout,
         )
         self._set_properties(api_response)

57 changes: 57 additions & 0 deletions google/cloud/storage/retry.py
@@ -0,0 +1,57 @@
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Retry Strategy logic used across google.cloud.storage requests."""

# Should not be in here and only for prototyping
import six
import socket
import requests
import urllib3

from google.api_core import exceptions
from google.api_core import retry
Contributor

Neat, so this is what we already told customers to use as a workaround, right?

Contributor

Also, where are configs around number of attempts or timeouts? Are these values hard-coded into google.api_core.retry?
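
On the question of attempts and timeouts: `google.api_core.retry.Retry` is configured by delay and deadline rather than by an attempt count. The sketch below is a simplified, self-contained model of that kind of exponential backoff schedule, not the library's code. The function name `backoff_delays` is hypothetical, and the defaults shown (1 s initial delay, 60 s maximum delay, multiplier 2, 120 s deadline) are an assumption about google.api_core at the time of this PR and should be checked against the installed version. The real library also adds jitter and counts the time spent in the request itself, which this sketch ignores.

```python
def backoff_delays(initial=1.0, maximum=60.0, multiplier=2.0, deadline=120.0):
    """Yield un-jittered sleep intervals until the deadline would be exceeded.

    Simplified model: only sleep time counts toward the deadline.
    """
    delay = initial
    elapsed = 0.0
    while elapsed + delay <= deadline:
        yield delay
        elapsed += delay
        # Grow the delay geometrically, capped at the maximum.
        delay = min(delay * multiplier, maximum)


default_schedule = list(backoff_delays())
short_schedule = list(backoff_delays(initial=0.5, deadline=5.0))
```

Under these assumptions the number of attempts is implied by the deadline, so a shorter deadline or larger initial delay means fewer retries.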



_RETRYABLE_REASONS = frozenset(
    ["rateLimitExceeded", "backendError", "internalError", "badGateway", "serviceUnavailable"]
)


_UNSTRUCTURED_RETRYABLE_TYPES = (
    exceptions.TooManyRequests,
    exceptions.InternalServerError,
    exceptions.BadGateway,
    exceptions.ServiceUnavailable,
)


def _should_retry(exc):
    """Predicate for determining when to retry."""

    if hasattr(exc, "errors"):
        if len(exc.errors) == 0:
            # Check for unstructured error returns, e.g. from GFE
            return isinstance(exc, _UNSTRUCTURED_RETRYABLE_TYPES)
        reason = exc.errors[0]["reason"]

        return reason in _RETRYABLE_REASONS
Contributor

Do you not have access to response codes here? Seems kind of a strange workaround to be doing string matching for a 503 for example.

    else:
Contributor

No need for else here, right? Also, the nested if statements and uses of isinstance look pretty ugly; I'm sure there's a nicer way to do this.

        # Connection Reset
        if isinstance(exc, requests.exceptions.ConnectionError):
            if isinstance(exc.args[0], urllib3.exceptions.ProtocolError):
                if isinstance(exc.args[0].args[1], ConnectionResetError):
                    return True
        return False
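
One possible shape for the review suggestions on this predicate (match on status codes instead of reason strings, drop the else, flatten the nested isinstance checks) is sketched below. It is an illustration, not a proposed patch. `FakeApiError`, `RETRYABLE_STATUS_CODES`, `contains_connection_reset`, and `should_retry` are hypothetical names, and the sketch assumes the API exception exposes the HTTP status as a `code` attribute. The connection-reset check is deliberately looser than the PR's: it searches nested exception args for a ConnectionResetError instead of requiring the exact requests and urllib3 wrapper types.

```python
RETRYABLE_STATUS_CODES = frozenset([429, 500, 502, 503])


class FakeApiError(Exception):
    """Hypothetical API error carrying an HTTP status code in `code`."""

    def __init__(self, message, code):
        super().__init__(message)
        self.code = code


def contains_connection_reset(exc, depth=3):
    """Return True if exc, or an exception nested in its args, is a reset."""
    if isinstance(exc, ConnectionResetError):
        return True
    if depth == 0:
        return False
    return any(
        contains_connection_reset(arg, depth - 1)
        for arg in exc.args
        if isinstance(arg, BaseException)
    )


def should_retry(exc):
    """Retry on selected HTTP status codes or on a wrapped connection reset."""
    code = getattr(exc, "code", None)
    if code is not None:
        return code in RETRYABLE_STATUS_CODES
    return contains_connection_reset(exc)


# Usage: a reset wrapped two levels deep, similar to how an HTTP client
# might wrap the low-level socket error.
wrapped_reset = Exception(
    Exception("Connection aborted.", ConnectionResetError(104, "reset by peer"))
)
```

The depth limit keeps the search bounded; whether a looser match is acceptable depends on which wrapped errors are actually safe to retry.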

_DEFAULT_RETRY = retry.Retry(predicate=_should_retry)
Contributor

I would be curious to look at other examples of where people have written these predicates.
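
As one example of how such predicates are commonly written: `google.api_core.retry` ships an `if_exception_type` helper that builds a predicate from a tuple of exception classes. The sketch below reimplements that idea in plain Python so it runs without the library; it is an approximation for illustration, not the library's source. `any_of` is a hypothetical combinator added here to show how small predicates could be composed.

```python
def if_exception_type(*exception_types):
    """Build a predicate that is true for the given exception types."""

    def predicate(exc):
        return isinstance(exc, exception_types)

    return predicate


def any_of(*predicates):
    """Hypothetical helper: true if any of the given predicates is true."""

    def predicate(exc):
        return any(check(exc) for check in predicates)

    return predicate


# Usage: retry on timeouts or connection resets, nothing else.
is_transient = any_of(
    if_exception_type(TimeoutError),
    if_exception_type(ConnectionResetError),
)
```

Composing small predicates this way keeps each rule testable on its own, at the cost of one more layer of indirection.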