fix: Major refactoring of Polling, Retry and Timeout logic #462
This is in response to https://freeman.vc/notes/aws-vs-gcp-reliability-is-wildly-different, which triggered an investigation of the whole Polling/Retry/Timeout behavior in Python GAPIC clients and revealed many fundamental flaws in its implementation.

To properly describe the refactoring this PR does, we need to stick to rigorous terminology, as vague definitions of retries, timeouts, polling and related concepts seem to be the main source of the present bugs and of the overall confusion among both users and creators of the library. Please check the documentation of the `google.api_core.retry.Retry` class and the `google.api_core.future.polling.PollingFuture.result()` method for the proper definitions and context.

Note, the overall semantics around Polling, Retry and Timeout remain quite confusing even after the refactoring (although they are now more or less rigorously defined), but it was as clean as I could make it while still maintaining backward compatibility of the whole library.

The quick summary of the changes in this PR:

1) Properly define and fix the application of the Deadline and Timeout concepts. Please check the updated documentation for the `google.api_core.retry.Retry` class for the actual definitions. Originally `deadline` was used to represent timeouts, conflating the two concepts. As a result, this PR replaces `deadline` arguments with `timeout` ones in as backward-compatible a manner as possible (i.e. backward compatible in all practical applications).
2) Properly define RPC Timeout, Retry Timeout and Polling Timeout, and how a generic Timeout concept (aka Logical Timeout) is mapped to one of those depending on the context. Please check the `google.api_core.retry.Retry` class documentation for details.
3) Properly define and fix the application of the Retry and Polling concepts. Please check the updated documentation for `google.api_core.future.polling.PollingFuture.result()` for details.
4) Separate the `retry` and `polling` configurations for the Polling future, as these are two different concepts (although both operate on the `Retry` class). Originally both retry and polling configurations were controlled by a single `retry` parameter, merging the configuration of how "rpc error responses" and how "operation not completed" responses are supposed to be handled.
5) For the following config properties - `Retry` (including `Retry Timeout`), `Polling` (including `Polling Timeout`) and `RPC Timeout` - fix and properly define how each of them gets configured and which config takes precedence in case of a conflict (check the `PollingFuture.result()` method documentation for details). Each of those properties can be specified as follows: directly provided by the user for each call, specified at gapic generation time from config values in the `grpc_service_config.json` file (for Retry and RPC Timeout) and the `gapic.yaml` file (for Polling), or provided as hard-coded basic default values in the python-api-core library itself.
6) Fix the per-call polling config propagation logic (the polling/retry configs supplied to `PollingFuture.result()` used to be ignored for the actual call).
7) Deprecate the usage of the `deadline` terminology in the whole library and backward-compatibly replace it with timeout. This is essential, as what has been called "deadline" in this library was actually "timeout" as it is defined in the `google.api_core.retry.Retry` class documentation.
8) Deprecate `ExponentialTimeout`, `ConstantTimeout` and related logic, as those are outdated concepts and are not consistent with the other GAPIC languages. Replace them with `TimeToDeadlineTimeout` to be consistent with how the rest of the languages do it.
9) Deprecate `google.api_core.operations_v1.config`, as it is an outdated concept and self-inconsistent (all gapic clients provide configuration in code). The configs are directly provided in code instead.
10) Switch the randomized delay calculation from `delay` being treated as the expected value of `randomized_delay` to `delay` being treated as the maximum value of `randomized_delay` (i.e. the new expected value of `randomized_delay` is `delay / 2`). See the `exponential_sleep_generator()` method implementation for details. This is needed to make the Python implementation of exponential backoff for retries and polling consistent with the rest of the GAPIC languages. Also fix the uncontrollable growth of the `delay` value: since it is subject to exponential growth, `delay` was quickly reaching "infinity", and the whole thing was not failing simply because Python is a very forgiving language that allows multiplying "infinity" by a number (`inf * number = inf`) instead of overflowing to a (most likely) negative number.
11) Fix url construction in `OperationsRestTransport`. Without this fix the polling logic for the REST transport was completely broken (this does not affect the Compute client, as that one has a custom LRO).
12) Last but not least: change the default values for the Polling logic to the following: `initial=1.0` (same as before), `maximum=20.0` (was `60`), `multiplier=1.5` (was `2.0`), `timeout=900` (was `120`, but due to the timeout resolution logic was actually None, i.e. infinity). This, in conjunction with the changed calculation of the randomized delay (i.e. its expected value now being `delay / 2`), makes the polling logic much less aggressive in increasing the delays between polling iterations, so that LROs return much earlier for users on average, while still keeping a healthy balance between the strain that polling puts on both client and server and the responsiveness of LROs for the user.

*The design doc summarising all the changes and reasons for them is in progress.
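The randomized delay described in items 10 and 12 can be sketched roughly as follows. This is a minimal illustration of the idea (the delay acts as an upper bound that grows exponentially and is capped), not a copy of the library's `exponential_sleep_generator()`; the name `sleep_delays` is hypothetical.

```python
import itertools
import random
from typing import Iterator


def sleep_delays(initial: float, maximum: float, multiplier: float) -> Iterator[float]:
    """Yield random delays whose upper bound grows exponentially up to `maximum`.

    Hypothetical sketch; not the library's actual implementation.
    """
    # Clamp the bound on every step so it can never drift towards infinity.
    bound = min(initial, maximum)
    while True:
        # `bound` is the maximum of the random delay, so its expected value is bound / 2.
        yield random.uniform(0.0, bound)
        bound = min(bound * multiplier, maximum)


# Polling defaults quoted in the description: initial=1.0, maximum=20.0, multiplier=1.5.
first_delays = list(itertools.islice(sleep_delays(1.0, 20.0, 1.5), 12))
```

Because the bound is clamped with `min()` on every iteration, it stops growing once it reaches `maximum`, which avoids the runaway growth mentioned in item 10.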
Also fix LRO for REST transport. This PR makes generated gapics respect timeout values from grpc_service_config.json instead of overriding them with None (which means no timeout). It is basically a direct fix for googleapis#1477. This PR depends on googleapis/python-api-core#462, and expects `setup.py.j2` templates to be updated after googleapis/python-api-core#462 is pushed and released in a new version.
@atulep Addressed your comments PTAL
@vam-google notification test
@vam-google test
I have not looked carefully at the tests yet, but so far this looks great. I will be running some manual tests to complement what Tony and Aza have been doing.
Most of the comments are documentation tweaks, so hopefully you can just accept them on GitHub and pull them into your local repo before making any other changes.
google/api_core/future/polling.py
Outdated
    _OperationNotComplete,
    exceptions.TooManyRequests,
    exceptions.InternalServerError,
    exceptions.BadGateway,
    exceptions.ServiceUnavailable,
)
DEFAULT_RETRY = retry.Retry(predicate=RETRY_PREDICATE)

# DEPRECATED, use DEFAULT_POLLING to configure LRO polling logic. Construct
Could you be explicit about how to use DEFAULT_RETRY? It's not immediately clear how that relates to this sentence.
Also, maybe being more explicit about "baseline".
Maybe something like "Construct the Retry object using DEFAULT_RETRY as a baseline and then modifying specific parameters as needed"
The baseline part was there before. Just keeping it as is, as the current recommendation is simply to not use that at all for anything.
As for being explicit on usage of DEFAULT_RETRY, it is deprecated, so it should not be used.
Co-authored-by: Victor Chudnovsky <vchudnov@google.com>
* fix: Major refactoring and fix for Polling, Retry and Timeout logic
* fix ci failures (mainly sphinx errors)
* remove unused code
* fix typo
* Pin pytest version to <7.2.0
* reformat code
* address pr feedback
* address PR feedback
* address pr feedback
* Update google/api_core/future/polling.py
  Co-authored-by: Victor Chudnovsky <vchudnov@google.com>
* Apply documentation suggestions from code review
  Co-authored-by: Victor Chudnovsky <vchudnov@google.com>
* Address PR feedback
  Co-authored-by: Victor Chudnovsky <vchudnov@google.com>
…ch (#474) * fix: Major refactoring of Polling, Retry and Timeout logic (#462) * fix: Major refactoring and fix for Polling, Retry and Timeout logic
* fix ci failures (mainly sphinx errors) * remove unused code * fix typo * Pin pytest version to <7.2.0 * reformat code * address pr feedback * address PR feedback * address pr feedback * Update google/api_core/future/polling.py Co-authored-by: Victor Chudnovsky <vchudnov@google.com> * Apply documentation suggestions from code review Co-authored-by: Victor Chudnovsky <vchudnov@google.com> * Address PR feedback Co-authored-by: Victor Chudnovsky <vchudnov@google.com> * feat: Allow representing enums with their unqualified symbolic names in headers (#465) * feat: Allow non-fully-qualified enums in routing headers * Rename s/fully_qualified_enums/qualified_enums/g for correctness * chore: minor tweaks * chore: Temporary workaround for pytest in noxfile. * Fix import order * bring coverage to 100% * lint * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md * remove replacement in owlbot.py causing lint failure Co-authored-by: Anthonios Partheniou <partheniou@google.com> Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com> * chore(python): update release script dependencies (#472) * chore(python): drop flake8-import-order in samples noxfile Source-Link: googleapis/synthtool@6ed3a83 Post-Processor: gcr.io/cloud-devrel-public-resources/owlbot-python:latest@sha256:3abfa0f1886adaf0b83f07cb117b24a639ea1cb9cffe56d43280b977033563eb * drop flake8-import-order * lint * use python 3.9 for docs * resolve mypy error * update python version for lint * fix lint * fix lint Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com> Co-authored-by: Anthonios Partheniou <partheniou@google.com> Co-authored-by: Vadym Matsishevskyi <25311427+vam-google@users.noreply.github.com> Co-authored-by: Victor Chudnovsky <vchudnov@google.com> Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com> Co-authored-by: gcf-owl-bot[bot] <78513119+gcf-owl-bot[bot]@users.noreply.github.com>
* fix: Fix timeout default values
  Also fix LRO for REST transport. This PR makes generated gapics respect timeout values from grpc_service_config.json instead of overriding them with None (which means no timeout). It is basically a direct fix for #1477. This PR depends on googleapis/python-api-core#462, and expects `setup.py.j2` templates to be updated after googleapis/python-api-core#462 is pushed and released in a new version.
* rename uri_prefix to path_prefix to match corresponding python-api-core change
* fix unnecessary `gapic_v1.method.DEFAULT` in rest stubs
* fix(deps): require google-api-core >=1.34.0
* fix(deps): require google-api-core >=2.11.0
* revert changes to WORKSPACE
* fix typo
* fix mypy error
* revert local change for debugging
  Co-authored-by: Anthonios Partheniou <partheniou@google.com>
try:
    kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
FYI: this was a breaking change. #477
Also for python-aiplatform: googleapis/python-aiplatform#1870
That retry logic line never worked (the retry had been i. What broke you is most likely the new default timeout value (instead of None).
fix(components): Limit google-api-core version to avoid timeout introduced in googleapis/python-api-core#462 PiperOrigin-RevId: 507613610
fix(components): Limit google-api-core version to avoid timeout introduced in googleapis/python-api-core#462 PiperOrigin-RevId: 508463716
It seems like this merge broke using freezegun, particularly with google-cloud-datastore in unit testing. Here's the stack trace:
Would you expect that to happen? Any workarounds?
If you are a user of python-api-core and experience polling timing out after ~15 minutes (900s) in your Python code after this change, please make sure that instead of calling `PollingFuture.result()` you call it with an additional `timeout` argument, like this: `PollingFuture.result(timeout=<desired timeout in seconds>)` or `PollingFuture.result(timeout=None)` (for an infinite timeout, but infinite timeouts are strongly discouraged), so that your code stops timing out at 900s.

The core libraries cannot and should not have infinite polling as the default behavior. The fact that Python had it like that for years (unlike other GCP-supported languages) was a bug, as it contradicts the original cross-language design of LRO methods in GAPIC libraries, and in general users are not supposed to be put into an infinite-loop scenario implicitly. If an LRO can actually run for hours or even days and that is OK, then users are expected to acknowledge this fact by providing the huge timeouts explicitly.
This is in response to https://freeman.vc/notes/aws-vs-gcp-reliability-is-wildly-different, which triggered an investigation of the whole Polling/Retry/Timeout behavior in Python GAPIC clients and revealed many fundamental flaws in its implementation.

To properly describe the refactoring in this PR, we need to stick to rigorous terminology, as vague definitions of retries, timeouts, polling and related concepts seem to be the main source of the present bugs and of the overall confusion among both users and creators of the library. Please check the updated (in this PR) documentation of the `google.api_core.retry.Retry` class and the `google.api_core.future.polling.PollingFuture.result()` method for the proper definitions and context.

Note, the overall semantics around Polling, Retry and Timeout remain quite confusing even after the refactoring (although they are now more or less rigorously defined), but it was as clean as I could make it while still maintaining backward compatibility of the whole library.

The quick summary of the changes in this PR:

1) Properly define and fix the application of the Deadline and Timeout concepts. Please check the updated documentation for the `google.api_core.retry.Retry` class for the actual definitions. Originally `deadline` was used to represent timeouts, conflating the two concepts. As a result, this PR replaces `deadline` arguments with `timeout` ones in as backward-compatible a manner as possible (i.e. backward compatible in all practical applications).
2) Properly define RPC Timeout, Retry Timeout and Polling Timeout, and how a generic Timeout concept (aka Logical Timeout) is mapped to one of those depending on the context. Please check the `google.api_core.retry.Retry` class documentation for details.
3) Properly define and fix the application of the Retry and Polling concepts. Please check the updated documentation for `google.api_core.future.polling.PollingFuture.result()` for details.
4) Separate the `retry` and `polling` configurations for the Polling future, as these are two different concepts (although both operate on the `Retry` class). Originally both retry and polling configurations were controlled by a single `retry` parameter, merging the configuration of how "rpc error responses" and how "operation not completed" responses are supposed to be handled.
5) For the following config properties - `Retry` (including `Retry Timeout`), `Polling` (including `Polling Timeout`) and `RPC Timeout` - fix and properly define how each of them gets configured and which config takes precedence in case of a conflict (check the `PollingFuture.result()` method documentation for details). Each of those properties can be specified as follows: directly provided by the user for each call, specified at gapic generation time from config values in the `grpc_service_config.json` file (for Retry and RPC Timeout) and the `gapic.yaml` file (for Polling Timeout), or provided as hard-coded basic default values in the python-api-core library itself. This also includes fixing the per-call polling config propagation logic (the polling/retry configs supplied to `PollingFuture.result()` used to be ignored for the actual call).
6) Deprecate `ExponentialTimeout`, `ConstantTimeout` and related logic, as those are outdated concepts and are not consistent with the other GAPIC languages. Replace them with `TimeToDeadlineTimeout` to be consistent with how the rest of the languages do it.
7) Deprecate `google.api_core.operations_v1.config`, as it is an outdated concept and self-inconsistent (all gapic clients provide configuration in code). The configs are directly provided in code instead.
8) Switch the randomized delay calculation from `delay` being treated as the expected value of `randomized_delay` to `delay` being treated as the maximum value of `randomized_delay` (i.e. the new expected value of `randomized_delay` is `delay / 2`). See the `exponential_sleep_generator()` method implementation for details. This is needed to make the Python implementation of exponential backoff for retries and polling consistent with the rest of the GAPIC languages. Also fix the uncontrollable growth of the `delay` value: since it is subject to exponential growth, `delay` was quickly reaching "infinity", and the whole thing was not failing simply because Python is a very forgiving language that allows multiplying "infinity" by a number (`inf * number = inf`) instead of overflowing to a (most likely) negative number. Also essentially roll back the 52f12af change, since it is inconsistent with the other languages and damages the uniform distribution of retry delays, artificially shifting their concentration towards the end of the timeout.
9) Fix url construction in `OperationsRestTransport`. Without this fix the polling logic for the REST transport was completely broken (this does not affect the Compute client, as that one has a custom LRO).
10) Last but not least: change the default values for the Polling logic to the following: `initial=1.0` (same as before), `maximum=20.0` (was `60`), `multiplier=1.5` (was `2.0`), `timeout=900` (was `120`, but due to the timeout resolution logic was actually None, i.e. infinity). This, in conjunction with the changed calculation of the randomized delay (i.e. its expected value now being `delay / 2`), makes the polling logic much less aggressive in increasing the delays between polling iterations, so that LROs return much earlier for users on average, while still keeping a healthy balance between the strain that polling puts on both client and server and the responsiveness of LROs for the user.

*The design doc summarising all the changes and reasons for them is in progress.
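The "time to deadline" idea behind the `TimeToDeadlineTimeout` replacement can be sketched as below: every attempt receives only the time still left of one overall budget, measured from the first attempt. This is a hypothetical illustration, not the library's class; the name `time_to_deadline`, the injected `clock`, and the clamping of the remaining time to zero are all assumptions made for the sketch.

```python
import functools
import time
from typing import Any, Callable, List, Optional


def time_to_deadline(timeout: Optional[float],
                     clock: Callable[[], float] = time.monotonic):
    """Decorator factory: pass each call the time remaining of one overall budget."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        first_attempt: Optional[float] = None

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            nonlocal first_attempt
            if timeout is not None:
                now = clock()
                if first_attempt is None:
                    first_attempt = now  # the budget starts at the first attempt
                # Assumption of this sketch: never hand out a negative timeout.
                kwargs["timeout"] = max(0.0, timeout - (now - first_attempt))
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Usage with a fake clock: attempts happen at t=100, t=104 and t=130.
ticks = iter([100.0, 104.0, 130.0])
seen: List[Optional[float]] = []


@time_to_deadline(timeout=10.0, clock=lambda: next(ticks))
def fake_rpc(timeout: Optional[float] = None) -> None:
    seen.append(timeout)  # a real RPC would use this as its per-attempt timeout


for _ in range(3):
    fake_rpc()
```

With a 10-second budget the three attempts are given 10.0, 6.0 and 0.0 seconds respectively, so later attempts cannot extend the overall wait beyond the original budget.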
In addition to the timeout/retry fixes, this PR has some other unrelated technical fixes:
`Python 3.10` as the default Python version used in CI.