-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: first attempt should use the min of RPC timeout and total timeout #2641
Conversation
gax-java/gax/src/main/java/com/google/api/gax/retrying/ExponentialRetryAlgorithm.java
Outdated
Show resolved
Hide resolved
drive-by comment - I think this change intuitively makes sense, but we should make sure we document it properly as RetrySettings have caused a lot of confusion in the past. @lqiu96 has been working on a guide here: googleapis/google-cloud-java#10598 |
Thanks @mutianf. Can you create an issue to describe the issue that you and BigTable are facing so that we can have a bit more context? |
totalTimeout == 0 | ||
? globalSettings.getInitialRpcTimeout() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this check is needed because it's possible that totalTimeout is not set (or set to 0) and it's being limited by number of attempts? If that's correct, could you add a small comment above to detail this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Also created a issue #2643
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qq, do you know if there is a valid use case where the total timeout is non-zero (explicitly set by the user), but the the total timeout is set to be less than the initialRPCTimeout? The only thing that I can think of right now is that a user has both time and attempt bounds and they care more about the attempt bounds (maybe?). Even if that is the case, the initialRPCTimeout shouldn't be larger than the totalTimeout.
After some more thought, I'm slightly leaning towards having a check that ensures that the initialRPCTimeout is less than or equal to the total timeout if totalTimeout is set. It should eliminate the need for a check here. Something here:
sdk-platform-java/gax-java/gax/src/main/java/com/google/api/gax/retrying/RetrySettings.java
Lines 429 to 452 in e542af8
if (params.getTotalTimeout().toMillis() < 0) { | |
throw new IllegalStateException("total timeout must not be negative"); | |
} | |
if (params.getInitialRetryDelay().toMillis() < 0) { | |
throw new IllegalStateException("initial retry delay must not be negative"); | |
} | |
if (params.getRetryDelayMultiplier() < 1.0) { | |
throw new IllegalStateException("retry delay multiplier must be at least 1"); | |
} | |
if (params.getMaxRetryDelay().compareTo(params.getInitialRetryDelay()) < 0) { | |
throw new IllegalStateException("max retry delay must not be shorter than initial delay"); | |
} | |
if (params.getMaxAttempts() < 0) { | |
throw new IllegalStateException("max attempts must be non-negative"); | |
} | |
if (params.getInitialRpcTimeout().toMillis() < 0) { | |
throw new IllegalStateException("initial rpc timeout must not be negative"); | |
} | |
if (params.getMaxRpcTimeout().compareTo(params.getInitialRpcTimeout()) < 0) { | |
throw new IllegalStateException("max rpc timeout must not be shorter than initial timeout"); | |
} | |
if (params.getRpcTimeoutMultiplier() < 1.0) { | |
throw new IllegalStateException("rpc timeout multiplier must be at least 1"); | |
} |
CC: @blakeli0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we have a customer that only sets total timeout to be 400ms. The default initialRpcTimeout in bigtable client is 20 seconds. I think adding the check here would cause the breakage for the customer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, I'm guessing they also either override the initialRpcTimeout or they're fine that call only runs a max of 400ms. I don't know the full context and there may be a use case where intialRpcTimeout > totalTimeout. Either way, I think you're right and we can't make that change. I think this solution is fine with me then.
/gcbrun |
1 similar comment
/gcbrun |
gax-java/gax/src/main/java/com/google/api/gax/retrying/ExponentialRetryAlgorithm.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fix makes sense but is also technically a change of behavior. If a customer has initialRpcTimeout
set larger than totalTimeout
, and has been incorrectly relying on this behavior, then they would have to change their code.
That being said, I think we should fix it but we should add some Javadoc to RetrySetting in relevant sections for totalTimeout
and initialRpcTimeout
. In case someone complains(hopefully not), we can at least point them to the relevant docs.
gax-java/gax/src/main/java/com/google/api/gax/retrying/ExponentialRetryAlgorithm.java
Outdated
Show resolved
Hide resolved
I think having |
/gcbrun |
/gcbrun |
Quality Gate failed for 'gapic-generator-java-root'Failed conditions |
Quality Gate passed for 'java_showcase_integration_tests'Issues Measures |
🤖 I have created a release *beep* *boop* --- <details><summary>2.39.0</summary> ## [2.39.0](v2.38.1...v2.39.0) (2024-04-18) ### Features * add `libraries_bom_version` to generation configuration ([#2639](#2639)) ([56c7ca5](56c7ca5)) * Add ChannelPoolSettings Getter for gRPC's ChannelProvider ([#2612](#2612)) ([d0c5191](d0c5191)) * add config change ([#2604](#2604)) ([8312706](8312706)) * add entry point ([#2616](#2616)) ([b19fa33](b19fa33)) * add generation config comparator ([#2587](#2587)) ([a94c2f0](a94c2f0)) * Add JavadocJar Task to build.gradle for self service libraries ([#2593](#2593)) ([993f5ac](993f5ac)) * Client/StubSettings' getEndpoint() returns the resolved endpoint ([#2440](#2440)) ([4942bc1](4942bc1)) * generate selected libraries ([#2598](#2598)) ([739ddbb](739ddbb)) * Validate the Universe Domain inside Java-Core ([#2592](#2592)) ([35d789f](35d789f)) ### Bug Fixes * add main to `generate_repo.py` ([#2607](#2607)) ([fedeb32](fedeb32)) * correct deep-remove and deep-preserve regexes ([#2572](#2572)) ([4c7fd88](4c7fd88)) * first attempt should use the min of RPC timeout and total timeout ([#2641](#2641)) ([0349232](0349232)) * remove duplicated calls to AutoValue builders ([#2636](#2636)) ([53a3727](53a3727)) * remove unnecessary slf4j and AbstractGoogleClientRequest native image configs ([0cb7d0e](0cb7d0e)) * remove unnecessary slf4j and AbstractGoogleClientRequest native image configs ([#2628](#2628)) ([0cb7d0e](0cb7d0e)) ### Dependencies * update arrow.version to v15.0.2 ([#2589](#2589)) ([777acf3](777acf3)) * update dependency com.google.cloud.opentelemetry:detector-resources-support to v0.28.0 ([#2649](#2649)) ([e4ed176](e4ed176)) * update dependency gitpython to v3.1.41 [security] ([#2625](#2625)) ([e41bd8f](e41bd8f)) * update dependency net.bytebuddy:byte-buddy to v1.14.13 ([#2646](#2646)) ([73ac5a4](73ac5a4)) * update dependency org.threeten:threeten-extra to v1.8.0 ([#2650](#2650)) ([226325a](226325a)) * update dependency org.threeten:threetenbp to v1.6.9 ([#2602](#2602)) ([371753e](371753e)) * update dependency org.threeten:threetenbp to v1.6.9 ([#2665](#2665)) ([8935bc8](8935bc8)) * update google api dependencies ([#2584](#2584)) ([cd20604](cd20604)) * update googleapis/java-cloud-bom digest to 7071341 ([#2608](#2608)) ([8d74140](8d74140)) * update netty dependencies to v4.1.109.final ([#2597](#2597)) ([8990693](8990693)) * update opentelemetry-java monorepo to v1.37.0 ([#2652](#2652)) ([f8fa2e9](f8fa2e9)) * update protobuf dependencies to v3.25.3 ([#2491](#2491)) ([b0e5041](b0e5041)) * update slf4j monorepo to v2.0.13 ([#2647](#2647)) ([f030e29](f030e29)) </details> --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>
🤖 I have created a release *beep* *boop* --- <details><summary>2.39.0</summary> ## [2.39.0](v2.38.1...v2.39.0) (2024-04-18) ### Features * add `libraries_bom_version` to generation configuration ([#2639](#2639)) ([56c7ca5](56c7ca5)) * Add ChannelPoolSettings Getter for gRPC's ChannelProvider ([#2612](#2612)) ([d0c5191](d0c5191)) * add config change ([#2604](#2604)) ([8312706](8312706)) * add entry point ([#2616](#2616)) ([b19fa33](b19fa33)) * add generation config comparator ([#2587](#2587)) ([a94c2f0](a94c2f0)) * Add JavadocJar Task to build.gradle for self service libraries ([#2593](#2593)) ([993f5ac](993f5ac)) * Client/StubSettings' getEndpoint() returns the resolved endpoint ([#2440](#2440)) ([4942bc1](4942bc1)) * generate selected libraries ([#2598](#2598)) ([739ddbb](739ddbb)) * Validate the Universe Domain inside Java-Core ([#2592](#2592)) ([35d789f](35d789f)) ### Bug Fixes * add main to `generate_repo.py` ([#2607](#2607)) ([fedeb32](fedeb32)) * correct deep-remove and deep-preserve regexes ([#2572](#2572)) ([4c7fd88](4c7fd88)) * first attempt should use the min of RPC timeout and total timeout ([#2641](#2641)) ([0349232](0349232)) * remove duplicated calls to AutoValue builders ([#2636](#2636)) ([53a3727](53a3727)) * remove unnecessary slf4j and AbstractGoogleClientRequest native image configs ([0cb7d0e](0cb7d0e)) * remove unnecessary slf4j and AbstractGoogleClientRequest native image configs ([#2628](#2628)) ([0cb7d0e](0cb7d0e)) ### Dependencies * update arrow.version to v15.0.2 ([#2589](#2589)) ([777acf3](777acf3)) * update dependency com.google.cloud.opentelemetry:detector-resources-support to v0.28.0 ([#2649](#2649)) ([e4ed176](e4ed176)) * update dependency gitpython to v3.1.41 [security] ([#2625](#2625)) ([e41bd8f](e41bd8f)) * update dependency net.bytebuddy:byte-buddy to v1.14.13 ([#2646](#2646)) ([73ac5a4](73ac5a4)) * update dependency org.threeten:threeten-extra to v1.8.0 ([#2650](#2650)) ([226325a](226325a)) * update dependency org.threeten:threetenbp to v1.6.9 ([#2602](#2602)) ([371753e](371753e)) * update dependency org.threeten:threetenbp to v1.6.9 ([#2665](#2665)) ([8935bc8](8935bc8)) * update google api dependencies ([#2584](#2584)) ([cd20604](cd20604)) * update googleapis/java-cloud-bom digest to 7071341 ([#2608](#2608)) ([8d74140](8d74140)) * update netty dependencies to v4.1.109.final ([#2597](#2597)) ([8990693](8990693)) * update opentelemetry-java monorepo to v1.37.0 ([#2652](#2652)) ([f8fa2e9](f8fa2e9)) * update protobuf dependencies to v3.25.3 ([#2491](#2491)) ([b0e5041](b0e5041)) * update slf4j monorepo to v2.0.13 ([#2647](#2647)) ([f030e29](f030e29)) </details> --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>
Thank you for opening a Pull Request! For general contributing guidelines, please refer to contributing guide
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #2643 ☕️