Skip to content

Commit

Permalink
fix: Major refactoring of Polling, Retry and Timeout logic in v1 bran…
Browse files Browse the repository at this point in the history
…ch (#474)

* fix: Major refactoring of Polling, Retry and Timeout logic (#462)

* fix: Major refactoring and fix for Polling, Retry and Timeout logic

This is in response to https://freeman.vc/notes/aws-vs-gcp-reliability-is-wildly-different, which triggered an investigation of the whole Polling/Retry/Timeout behavior in Python GAPIC clients and revealed many fundamental flaws in its implementaiton.

To properly describe the refactoring this PR does we need to stick to a rigorous terminology, as vague definitions of retries, timeouts, polling and related concepts seems to be the main source of the present bugs and overal confusion among both groups: users of the library and creators of the library. Please check the documentation of the `google.api_core.retry.Retry` class the `google.api_core.future.polling.Polling.result()` method for the proper definitions and context.

Note, the overall semantics around Polling, Retry and Timeout remains quite confusing even after refactoring (although it is now more or less rigorously defined), but it was clean as I could make it while still maintaining backward compatibility of the whole library.

The quick summary of the changes in this PR:

1) Properly define and fix the application of Deadline and Timeout concepts. Please check the updated documentation for the `google.api_core.retry.Retry` class for the actual definitions. Originally the `deadline` has been used to represent timeouts conflating the two concepts. As result this PR replaces `deadline` arguments with `timeout` ones in as backward-compatible manner as possible (i.e. backward compatible in all practical applications).

2) Properly define RPC Timeout, Retry Timeout and Pollint Timeout and how a generic Timeout concept (aka Logical Timeout) is mapped to one of those depending on the context. Please check `google.api_core.retry.Retry` class documentation for details.

3) Properly define and fix the application of Retry and Polling concepts. Please check the updated documentation for `google.api_core.future.polling.PollingFuture.result()` for details.

4) Separate `retry` and `polling` configurations for Polling future, as these are two different concepts (although both operating on `Retry` class). Originally both retry and polling configurations were controlled by a single `retry` parameter, merging configuration regarding how "rpc error responses" and how "operation not completed" responses are supposed to be handled.

5) For the following config properties - `Retry (including `Retry Timeout`), `Polling` (including `Polling Timeout`) and `RPC Timeout` - fix and properly define how each of the above properties gets configured and which config gets precedence in case of a conflict (check `PollingFuture.result()` method documentation for details). Each of those properties can be specified as follows: directly provided by the user for each call, specified during gapic generation time from config values in `grpc_service_config.json` file (for Retry and RPC Timeout) and `gapic.yaml` file (for Polling), or be provided as a hard-coded basic default values in python-api-core library itself.

6) Fix the per-call polling config propagation logic (the polling/retry configs supplied to `PollingFuture.result()` used to be ignored for actual call).

7) Deprecate the usage of `deadline` terminology in the whole library and backward-compatibly replace it with timeout. This is essential as what has been called "deadline" in this library was actually "timeout" as it is defined in `google.api_core.retry.Retry` class documentation.

8) Deprecate `ExponentialTimeout`, `ConstantTimeout` and related logic as those are outdated concepts and are not consistent with the other GAPIC Languages. Replace it with `TimeToDeadlineTimeout` to be consistent with how the rest of the languages do it.

9) Deprecate `google.api_core.operations_v1.config` as it is an outdated concept and self-inconsistent (as all gapic clients provide configuraiton in code). The configs are directly provided in code instead.

10) Switch randomized delay calculation from `delay` being treated as expected value for randomized_delay to `delay` being treated as maximum value for `randomized_delay` (i.e. the new expected valud for `randomized_delay` is `delay / 2`). See the `exponential_sleep_generator()` method implementation for details. This is needed to make Python implementation of retries and polling exponential backoff consistent with the rest of GAPIC languages. Also fix the uncontrollable growth of `delay` value (since it is a subject of exponential growth, the `delay` value was quickly reaching "infinity" value, and the whole thing was not failing simply due to python being a very forgiving language which forgives multiplying "infinity" by a number (`inf * number = inf`) binstead of simply overflowing to a (most likely) negative number).

11) Fix url construction in `OperationsRestTransport`. Without this fix the polling logic for REST transport was completely broken (is not affecting Compute client, as that one has custom LRO).

12) Las but not least: change the default values for Polling logic to be the following: `initial=1.0` (same as before), `maximum=20.0` (was `60`), `multiplier=1.5` (was `2.0`), `timeout=900` (was `120`, but due to timeout resolution logic was actually None (i.e. infinity)). This, in conjunction with changed calculation of randomized delay (i.e. its expected value now being  `delay / 2`) overall makes polling logic much less aggressive in terms of increasing delays between each polling iteration, making LRO return much earlier for users on average, but still keeping a healthy balance between strain put on both client and server by polling and responsiveness of LROs for user.

*The design doc summarising all the changes and reasons for them is in progress.

* fix ci failures (mainly sphinx errors)

* remove unused code

* fix typo

* Pin pytest version to <7.2.0

* reformat code

* address pr feedback

* address PR feedback

* address pr feedback

* Update google/api_core/future/polling.py

Co-authored-by: Victor Chudnovsky <vchudnov@google.com>

* Apply documentation suggestions from code review

Co-authored-by: Victor Chudnovsky <vchudnov@google.com>

* Address PR feedback

Co-authored-by: Victor Chudnovsky <vchudnov@google.com>

* feat: Allow representing enums with their unqualified symbolic names in headers (#465)

* feat: Allow non-fully-qualified enums in routing headers

* Rename s/fully_qualified_enums/qualified_enums/g for correctness

* chore: minor tweaks

* chore: Temporary workaround for pytest in noxfile.

* Fix import order

* bring coverage to 100%

* lint

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* remove replacement in owlbot.py causing lint failure

Co-authored-by: Anthonios Partheniou <partheniou@google.com>
Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>

* chore(python): update release script dependencies (#472)

* chore(python): drop flake8-import-order in samples noxfile

Source-Link: googleapis/synthtool@6ed3a83
Post-Processor: gcr.io/cloud-devrel-public-resources/owlbot-python:latest@sha256:3abfa0f1886adaf0b83f07cb117b24a639ea1cb9cffe56d43280b977033563eb

* drop flake8-import-order

* lint

* use python 3.9 for docs

* resolve mypy error

* update python version for lint

* fix lint

* fix lint

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Co-authored-by: Anthonios Partheniou <partheniou@google.com>

Co-authored-by: Vadym Matsishevskyi <25311427+vam-google@users.noreply.github.com>
Co-authored-by: Victor Chudnovsky <vchudnov@google.com>
Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Co-authored-by: gcf-owl-bot[bot] <78513119+gcf-owl-bot[bot]@users.noreply.github.com>
  • Loading branch information
5 people authored Dec 1, 2022
1 parent 4e40562 commit 7d51dbf
Show file tree
Hide file tree
Showing 42 changed files with 883 additions and 463 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/docs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ jobs:
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: "3.10"
python-version: "3.9"
- name: Install nox
run: |
python -m pip install --upgrade setuptools pip wheel
Expand All @@ -28,7 +28,7 @@ jobs:
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: "3.10"
python-version: "3.9"
- name: Install nox
run: |
python -m pip install --upgrade setuptools pip wheel
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/lint.yml
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ jobs:
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: "3.7"
python-version: "3.10"
- name: Install nox
run: |
python -m pip install --upgrade setuptools pip wheel
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/mypy.yml
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ jobs:
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: "3.7"
python-version: "3.10"
- name: Install nox
run: |
python -m pip install --upgrade setuptools pip wheel
Expand Down
12 changes: 6 additions & 6 deletions .kokoro/docker/docs/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -60,16 +60,16 @@ RUN apt-get update \
&& rm -rf /var/lib/apt/lists/* \
&& rm -f /var/cache/apt/archives/*.deb

###################### Install python 3.8.11
###################### Install python 3.9.13

# Download python 3.8.11
RUN wget https://www.python.org/ftp/python/3.8.11/Python-3.8.11.tgz
# Download python 3.9.13
RUN wget https://www.python.org/ftp/python/3.9.13/Python-3.9.13.tgz

# Extract files
RUN tar -xvf Python-3.8.11.tgz
RUN tar -xvf Python-3.9.13.tgz

# Install python 3.8.11
RUN ./Python-3.8.11/configure --enable-optimizations
# Install python 3.9.13
RUN ./Python-3.9.13/configure --enable-optimizations
RUN make altinstall

###################### Install pip
Expand Down
4 changes: 3 additions & 1 deletion .kokoro/requirements.in
Original file line number Diff line number Diff line change
Expand Up @@ -5,4 +5,6 @@ typing-extensions
twine
wheel
setuptools
nox
nox
charset-normalizer<3
click<8.1.0
60 changes: 32 additions & 28 deletions .kokoro/requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -93,11 +93,14 @@ cffi==1.15.1 \
charset-normalizer==2.1.1 \
--hash=sha256:5a3d016c7c547f69d6f81fb0db9449ce888b418b5b9952cc5e6e66843e9dd845 \
--hash=sha256:83e9a75d1911279afd89352c68b45348559d1fc0506b054b346651b5e7fee29f
# via requests
# via
# -r requirements.in
# requests
click==8.0.4 \
--hash=sha256:6a7a62563bbfabfda3a38f3023a1db4a35978c0abd76f6c9605ecd6554d6d9b1 \
--hash=sha256:8458d7b1287c5fb128c90e23381cf99dcde74beaf6c7ff6384ce84d6fe090adb
# via
# -r requirements.in
# gcp-docuploader
# gcp-releasetool
colorlog==6.7.0 \
Expand Down Expand Up @@ -152,19 +155,19 @@ gcp-docuploader==0.6.3 \
--hash=sha256:ba8c9d76b3bbac54b0311c503a373b00edc2dc02d6d54ea9507045adb8e870f7 \
--hash=sha256:c0f5aaa82ce1854a386197e4e359b120ad6d4e57ae2c812fce42219a3288026b
# via -r requirements.in
gcp-releasetool==1.8.7 \
--hash=sha256:3d2a67c9db39322194afb3b427e9cb0476ce8f2a04033695f0aeb63979fc2b37 \
--hash=sha256:5e4d28f66e90780d77f3ecf1e9155852b0c3b13cbccb08ab07e66b2357c8da8d
gcp-releasetool==1.10.0 \
--hash=sha256:72a38ca91b59c24f7e699e9227c90cbe4dd71b789383cb0164b088abae294c83 \
--hash=sha256:8c7c99320208383d4bb2b808c6880eb7a81424afe7cdba3c8d84b25f4f0e097d
# via -r requirements.in
google-api-core==2.8.2 \
--hash=sha256:06f7244c640322b508b125903bb5701bebabce8832f85aba9335ec00b3d02edc \
--hash=sha256:93c6a91ccac79079ac6bbf8b74ee75db970cc899278b97d53bc012f35908cf50
# via
# google-cloud-core
# google-cloud-storage
google-auth==2.11.0 \
--hash=sha256:be62acaae38d0049c21ca90f27a23847245c9f161ff54ede13af2cb6afecbac9 \
--hash=sha256:ed65ecf9f681832298e29328e1ef0a3676e3732b2e56f41532d45f70a22de0fb
google-auth==2.14.1 \
--hash=sha256:ccaa901f31ad5cbb562615eb8b664b3dd0bf5404a67618e642307f00613eda4d \
--hash=sha256:f5d8701633bebc12e0deea4df8abd8aff31c28b355360597f7f2ee60f2e4d016
# via
# gcp-releasetool
# google-api-core
Expand All @@ -174,9 +177,9 @@ google-cloud-core==2.3.2 \
--hash=sha256:8417acf6466be2fa85123441696c4badda48db314c607cf1e5d543fa8bdc22fe \
--hash=sha256:b9529ee7047fd8d4bf4a2182de619154240df17fbe60ead399078c1ae152af9a
# via google-cloud-storage
google-cloud-storage==2.5.0 \
--hash=sha256:19a26c66c317ce542cea0830b7e787e8dac2588b6bfa4d3fd3b871ba16305ab0 \
--hash=sha256:382f34b91de2212e3c2e7b40ec079d27ee2e3dbbae99b75b1bcd8c63063ce235
google-cloud-storage==2.6.0 \
--hash=sha256:104ca28ae61243b637f2f01455cc8a05e8f15a2a18ced96cb587241cdd3820f5 \
--hash=sha256:4ad0415ff61abdd8bb2ae81c1f8f7ec7d91a1011613f2db87c614c550f97bfe9
# via gcp-docuploader
google-crc32c==1.3.0 \
--hash=sha256:04e7c220798a72fd0f08242bc8d7a05986b2a08a0573396187fd32c1dcdd58b3 \
Expand Down Expand Up @@ -227,9 +230,9 @@ google-resumable-media==2.3.3 \
--hash=sha256:27c52620bd364d1c8116eaac4ea2afcbfb81ae9139fb3199652fcac1724bfb6c \
--hash=sha256:5b52774ea7a829a8cdaa8bd2d4c3d4bc660c91b30857ab2668d0eb830f4ea8c5
# via google-cloud-storage
googleapis-common-protos==1.56.4 \
--hash=sha256:8eb2cbc91b69feaf23e32452a7ae60e791e09967d81d4fcc7fc388182d1bd394 \
--hash=sha256:c25873c47279387cfdcbdafa36149887901d36202cb645a0e4f29686bf6e4417
googleapis-common-protos==1.57.0 \
--hash=sha256:27a849d6205838fb6cc3c1c21cb9800707a661bb21c6ce7fb13e99eb1f8a0c46 \
--hash=sha256:a9f4a1d7f6d9809657b7f1316a1aa527f6664891531bcfcc13b6696e685f443c
# via google-api-core
idna==3.3 \
--hash=sha256:84d9dd047ffa80596e0f246e2eab0b391788b0503584e8945f2368256d2735ff \
Expand All @@ -240,6 +243,7 @@ importlib-metadata==4.12.0 \
--hash=sha256:7401a975809ea1fdc658c3aa4f78cc2195a0e019c5cbc4c06122884e9ae80c23
# via
# -r requirements.in
# keyring
# twine
jaraco-classes==3.2.2 \
--hash=sha256:6745f113b0b588239ceb49532aa09c3ebb947433ce311ef2f8e3ad64ebb74594 \
Expand All @@ -255,9 +259,9 @@ jinja2==3.1.2 \
--hash=sha256:31351a702a408a9e7595a8fc6150fc3f43bb6bf7e319770cbc0db9df9437e852 \
--hash=sha256:6088930bfe239f0e6710546ab9c19c9ef35e29792895fed6e6e31a023a182a61
# via gcp-releasetool
keyring==23.9.0 \
--hash=sha256:4c32a31174faaee48f43a7e2c7e9c3216ec5e95acf22a2bebfb4a1d05056ee44 \
--hash=sha256:98f060ec95ada2ab910c195a2d4317be6ef87936a766b239c46aa3c7aac4f0db
keyring==23.11.0 \
--hash=sha256:3dd30011d555f1345dec2c262f0153f2f0ca6bca041fb1dc4588349bb4c0ac1e \
--hash=sha256:ad192263e2cdd5f12875dedc2da13534359a7e760e77f8d04b50968a821c2361
# via
# gcp-releasetool
# twine
Expand Down Expand Up @@ -321,9 +325,9 @@ pkginfo==1.8.3 \
--hash=sha256:848865108ec99d4901b2f7e84058b6e7660aae8ae10164e015a6dcf5b242a594 \
--hash=sha256:a84da4318dd86f870a9447a8c98340aa06216bfc6f2b7bdc4b8766984ae1867c
# via twine
platformdirs==2.5.2 \
--hash=sha256:027d8e83a2d7de06bbac4e5ef7e023c02b863d7ea5d079477e722bb41ab25788 \
--hash=sha256:58c8abb07dcb441e6ee4b11d8df0ac856038f944ab98b7be6b27b2a3c7feef19
platformdirs==2.5.4 \
--hash=sha256:1006647646d80f16130f052404c6b901e80ee4ed6bef6792e1f238a8969106f7 \
--hash=sha256:af0276409f9a02373d540bf8480021a048711d572745aef4b7842dad245eba10
# via virtualenv
protobuf==3.20.1 \
--hash=sha256:06059eb6953ff01e56a25cd02cca1a9649a75a7e65397b5b9b4e929ed71d10cf \
Expand Down Expand Up @@ -448,25 +452,25 @@ urllib3==1.26.12 \
# via
# requests
# twine
virtualenv==20.16.4 \
--hash=sha256:014f766e4134d0008dcaa1f95bafa0fb0f575795d07cae50b1bee514185d6782 \
--hash=sha256:035ed57acce4ac35c82c9d8802202b0e71adac011a511ff650cbcf9635006a22
virtualenv==20.16.7 \
--hash=sha256:8691e3ff9387f743e00f6bb20f70121f5e4f596cae754531f2b3b3a1b1ac696e \
--hash=sha256:efd66b00386fdb7dbe4822d172303f40cd05e50e01740b19ea42425cbe653e29
# via nox
webencodings==0.5.1 \
--hash=sha256:a0af1213f3c2226497a97e2b3aa01a7e4bee4f403f95be16fc9acd2947514a78 \
--hash=sha256:b36a1c245f2d304965eb4e0a82848379241dc04b865afcc4aab16748587e1923
# via bleach
wheel==0.37.1 \
--hash=sha256:4bdcd7d840138086126cd09254dc6195fb4fc6f01c050a1d7236f2630db1d22a \
--hash=sha256:e9a504e793efbca1b8e0e9cb979a249cf4a0a7b5b8c9e8b65a5e39d49529c1c4
wheel==0.38.4 \
--hash=sha256:965f5259b566725405b05e7cf774052044b1ed30119b5d586b2703aafe8719ac \
--hash=sha256:b60533f3f5d530e971d6737ca6d58681ee434818fab630c83a734bb10c083ce8
# via -r requirements.in
zipp==3.8.1 \
--hash=sha256:05b45f1ee8f807d0cc928485ca40a07cb491cf092ff587c0df9cb1fd154848d2 \
--hash=sha256:47c40d7fe183a6f21403a199b3e4192cca5774656965b0a4988ad2f8feb5f009
# via importlib-metadata

# The following packages are considered to be unsafe in a requirements file:
setuptools==65.2.0 \
--hash=sha256:7f4bc85450898a09f76ebf28b72fa25bc7111f6c7d665d514a60bba9c75ef2a9 \
--hash=sha256:a3ca5857c89f82f5c9410e8508cb32f4872a3bafd4aa7ae122a24ca33bccc750
setuptools==65.5.1 \
--hash=sha256:d0b9a8433464d5800cbe05094acf5c6d52a91bfac9b52bcfc4d41382be5d5d31 \
--hash=sha256:e197a19aa8ec9722928f2206f8de752def0e4c9fc6953527360d1c36d94ddb2f
# via -r requirements.in
28 changes: 19 additions & 9 deletions google/api_core/extended_operation.py
Original file line number Diff line number Diff line change
Expand Up @@ -50,10 +50,13 @@ class ExtendedOperation(polling.PollingFuture):
refresh (Callable[[], type(extended_operation)]): A callable that returns
the latest state of the operation.
cancel (Callable[[], None]): A callable that tries to cancel the operation.
retry: Optional(google.api_core.retry.Retry): The retry configuration used
when polling. This can be used to control how often :meth:`done`
is polled. Regardless of the retry's ``deadline``, it will be
overridden by the ``timeout`` argument to :meth:`result`.
polling Optional(google.api_core.retry.Retry): The configuration used
for polling. This can be used to control how often :meth:`done`
is polled. If the ``timeout`` argument to :meth:`result` is
specified it will override the ``polling.timeout`` property.
retry Optional(google.api_core.retry.Retry): DEPRECATED use ``polling``
instead. If specified it will override ``polling`` parameter to
maintain backward compatibility.
Note: Most long-running API methods use google.api_core.operation.Operation
This class is a wrapper for a subset of methods that use alternative
Expand All @@ -68,9 +71,14 @@ class ExtendedOperation(polling.PollingFuture):
"""

def __init__(
self, extended_operation, refresh, cancel, retry=polling.DEFAULT_RETRY
self,
extended_operation,
refresh,
cancel,
polling=polling.DEFAULT_POLLING,
**kwargs,
):
super().__init__(retry=retry)
super().__init__(polling=polling, **kwargs)
self._extended_operation = extended_operation
self._refresh = refresh
self._cancel = cancel
Expand Down Expand Up @@ -114,7 +122,7 @@ def error_message(self):
def __getattr__(self, name):
return getattr(self._extended_operation, name)

def done(self, retry=polling.DEFAULT_RETRY):
def done(self, retry=None):
self._refresh_and_update(retry)
return self._extended_operation.done

Expand All @@ -137,9 +145,11 @@ def cancelled(self):
self._refresh_and_update()
return self._extended_operation.done

def _refresh_and_update(self, retry=polling.DEFAULT_RETRY):
def _refresh_and_update(self, retry=None):
if not self._extended_operation.done:
self._extended_operation = self._refresh(retry=retry)
self._extended_operation = (
self._refresh(retry=retry) if retry else self._refresh()
)
self._handle_refreshed_operation()

def _handle_refreshed_operation(self):
Expand Down
2 changes: 1 addition & 1 deletion google/api_core/future/async_future.py
Original file line number Diff line number Diff line change
Expand Up @@ -95,7 +95,7 @@ async def _blocking_poll(self, timeout=None):
if self._future.done():
return

retry_ = self._retry.with_deadline(timeout)
retry_ = self._retry.with_timeout(timeout)

try:
await retry_(self._done_or_raise)()
Expand Down
Loading

0 comments on commit 7d51dbf

Please sign in to comment.