Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: first attempt should use the min of RPC timeout and total timeout #2641

Merged
merged 1 commit into from
Apr 18, 2024

Conversation

mutianf
Copy link
Contributor

@mutianf mutianf commented Apr 15, 2024

Thank you for opening a Pull Request! For general contributing guidelines, please refer to contributing guide

Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #2643 ☕️

@mutianf mutianf requested a review from a team as a code owner April 15, 2024 20:07
@product-auto-label product-auto-label bot added the size: s Pull request size is small. label Apr 15, 2024
@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: s Pull request size is small. labels Apr 16, 2024
@alicejli
Copy link
Contributor

drive-by comment - I think this change intuitively makes sense, but we should make sure we document it properly as RetrySettings have caused a lot of confusion in the past. @lqiu96 has been working on a guide here: googleapis/google-cloud-java#10598

@lqiu96
Copy link
Contributor

lqiu96 commented Apr 16, 2024

Thanks @mutianf. Can you create an issue to describe the issue that you and BigTable are facing so that we can have a bit more context?

Comment on lines 72 to 73
totalTimeout == 0
? globalSettings.getInitialRpcTimeout()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this check is needed because it's possible that totalTimeout is not set (or set to 0) and it's being limited by number of attempts? If that's correct, could you add a small comment above to detail this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Also created a issue #2643

Copy link
Contributor

@lqiu96 lqiu96 Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq, do you know if there is a valid use case where the total timeout is non-zero (explicitly set by the user), but the the total timeout is set to be less than the initialRPCTimeout? The only thing that I can think of right now is that a user has both time and attempt bounds and they care more about the attempt bounds (maybe?). Even if that is the case, the initialRPCTimeout shouldn't be larger than the totalTimeout.

After some more thought, I'm slightly leaning towards having a check that ensures that the initialRPCTimeout is less than or equal to the total timeout if totalTimeout is set. It should eliminate the need for a check here. Something here:

if (params.getTotalTimeout().toMillis() < 0) {
throw new IllegalStateException("total timeout must not be negative");
}
if (params.getInitialRetryDelay().toMillis() < 0) {
throw new IllegalStateException("initial retry delay must not be negative");
}
if (params.getRetryDelayMultiplier() < 1.0) {
throw new IllegalStateException("retry delay multiplier must be at least 1");
}
if (params.getMaxRetryDelay().compareTo(params.getInitialRetryDelay()) < 0) {
throw new IllegalStateException("max retry delay must not be shorter than initial delay");
}
if (params.getMaxAttempts() < 0) {
throw new IllegalStateException("max attempts must be non-negative");
}
if (params.getInitialRpcTimeout().toMillis() < 0) {
throw new IllegalStateException("initial rpc timeout must not be negative");
}
if (params.getMaxRpcTimeout().compareTo(params.getInitialRpcTimeout()) < 0) {
throw new IllegalStateException("max rpc timeout must not be shorter than initial timeout");
}
if (params.getRpcTimeoutMultiplier() < 1.0) {
throw new IllegalStateException("rpc timeout multiplier must be at least 1");
}
, but that may be a behavior breaking change.

CC: @blakeli0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we have a customer that only sets total timeout to be 400ms. The default initialRpcTimeout in bigtable client is 20 seconds. I think adding the check here would cause the breakage for the customer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, I'm guessing they also either override the initialRpcTimeout or they're fine that call only runs a max of 400ms. I don't know the full context and there may be a use case where intialRpcTimeout > totalTimeout. Either way, I think you're right and we can't make that change. I think this solution is fine with me then.

@mutianf mutianf changed the title fix: first attempt should always use the min of RPC timeout and total… fix: first attempt should use the min of RPC timeout and total timeout Apr 16, 2024
@lqiu96 lqiu96 added the owlbot:run Add this label to trigger the Owlbot post processor. label Apr 16, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Apr 16, 2024
@lqiu96
Copy link
Contributor

lqiu96 commented Apr 16, 2024

/gcbrun

1 similar comment
@lqiu96
Copy link
Contributor

lqiu96 commented Apr 16, 2024

/gcbrun

@lqiu96 lqiu96 added the owlbot:run Add this label to trigger the Owlbot post processor. label Apr 16, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Apr 16, 2024
Copy link
Contributor

@lqiu96 lqiu96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mutianf Changes LGTM and I think make sense. Please let @blakeli0 take a quick look and approve before merging. Thanks!

Copy link
Collaborator

@blakeli0 blakeli0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix makes sense but is also technically a change of behavior. If a customer has initialRpcTimeout set larger than totalTimeout, and has been incorrectly relying on this behavior, then they would have to change their code.
That being said, I think we should fix it but we should add some Javadoc to RetrySetting in relevant sections for totalTimeout and initialRpcTimeout. In case someone complains(hopefully not), we can at least point them to the relevant docs.

@mutianf
Copy link
Contributor Author

mutianf commented Apr 18, 2024

This fix makes sense but is also technically a change of behavior. If a customer has initialRpcTimeout set larger than totalTimeout, and has been incorrectly relying on this behavior, then they would have to change their code. That being said, I think we should fix it but we should add some Javadoc to RetrySetting in relevant sections for totalTimeout and initialRpcTimeout. In case someone complains(hopefully not), we can at least point them to the relevant docs.

I think having initialRpcTimeout > totalTimeout is not really semantically meaningful. Since totalTimeout caps how long the entire operation will take. I'll add javadoc to RettrySettings.

@lqiu96
Copy link
Contributor

lqiu96 commented Apr 18, 2024

/gcbrun

@lqiu96 lqiu96 added the owlbot:run Add this label to trigger the Owlbot post processor. label Apr 18, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Apr 18, 2024
@lqiu96 lqiu96 added the owlbot:run Add this label to trigger the Owlbot post processor. label Apr 18, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Apr 18, 2024
@lqiu96
Copy link
Contributor

lqiu96 commented Apr 18, 2024

/gcbrun

Copy link

Quality Gate Failed Quality Gate failed for 'gapic-generator-java-root'

Failed conditions
0.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarCloud

Copy link

Quality Gate Passed Quality Gate passed for 'java_showcase_integration_tests'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
88.9% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

@lqiu96 lqiu96 merged commit 0349232 into googleapis:main Apr 18, 2024
35 of 36 checks passed
@mutianf mutianf deleted the timeout branch April 18, 2024 18:52
lqiu96 pushed a commit that referenced this pull request Apr 18, 2024
🤖 I have created a release *beep* *boop*
---


<details><summary>2.39.0</summary>

##
[2.39.0](v2.38.1...v2.39.0)
(2024-04-18)


### Features

* add `libraries_bom_version` to generation configuration
([#2639](#2639))
([56c7ca5](56c7ca5))
* Add ChannelPoolSettings Getter for gRPC's ChannelProvider
([#2612](#2612))
([d0c5191](d0c5191))
* add config change
([#2604](#2604))
([8312706](8312706))
* add entry point
([#2616](#2616))
([b19fa33](b19fa33))
* add generation config comparator
([#2587](#2587))
([a94c2f0](a94c2f0))
* Add JavadocJar Task to build.gradle for self service libraries
([#2593](#2593))
([993f5ac](993f5ac))
* Client/StubSettings' getEndpoint() returns the resolved endpoint
([#2440](#2440))
([4942bc1](4942bc1))
* generate selected libraries
([#2598](#2598))
([739ddbb](739ddbb))
* Validate the Universe Domain inside Java-Core
([#2592](#2592))
([35d789f](35d789f))


### Bug Fixes

* add main to `generate_repo.py`
([#2607](#2607))
([fedeb32](fedeb32))
* correct deep-remove and deep-preserve regexes
([#2572](#2572))
([4c7fd88](4c7fd88))
* first attempt should use the min of RPC timeout and total timeout
([#2641](#2641))
([0349232](0349232))
* remove duplicated calls to AutoValue builders
([#2636](#2636))
([53a3727](53a3727))
* remove unnecessary slf4j and AbstractGoogleClientRequest native image
configs
([0cb7d0e](0cb7d0e))
* remove unnecessary slf4j and AbstractGoogleClientRequest native image
configs
([#2628](#2628))
([0cb7d0e](0cb7d0e))


### Dependencies

* update arrow.version to v15.0.2
([#2589](#2589))
([777acf3](777acf3))
* update dependency
com.google.cloud.opentelemetry:detector-resources-support to v0.28.0
([#2649](#2649))
([e4ed176](e4ed176))
* update dependency gitpython to v3.1.41 [security]
([#2625](#2625))
([e41bd8f](e41bd8f))
* update dependency net.bytebuddy:byte-buddy to v1.14.13
([#2646](#2646))
([73ac5a4](73ac5a4))
* update dependency org.threeten:threeten-extra to v1.8.0
([#2650](#2650))
([226325a](226325a))
* update dependency org.threeten:threetenbp to v1.6.9
([#2602](#2602))
([371753e](371753e))
* update dependency org.threeten:threetenbp to v1.6.9
([#2665](#2665))
([8935bc8](8935bc8))
* update google api dependencies
([#2584](#2584))
([cd20604](cd20604))
* update googleapis/java-cloud-bom digest to 7071341
([#2608](#2608))
([8d74140](8d74140))
* update netty dependencies to v4.1.109.final
([#2597](#2597))
([8990693](8990693))
* update opentelemetry-java monorepo to v1.37.0
([#2652](#2652))
([f8fa2e9](f8fa2e9))
* update protobuf dependencies to v3.25.3
([#2491](#2491))
([b0e5041](b0e5041))
* update slf4j monorepo to v2.0.13
([#2647](#2647))
([f030e29](f030e29))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>
lqiu96 pushed a commit that referenced this pull request May 22, 2024
🤖 I have created a release *beep* *boop*
---


<details><summary>2.39.0</summary>

##
[2.39.0](v2.38.1...v2.39.0)
(2024-04-18)


### Features

* add `libraries_bom_version` to generation configuration
([#2639](#2639))
([56c7ca5](56c7ca5))
* Add ChannelPoolSettings Getter for gRPC's ChannelProvider
([#2612](#2612))
([d0c5191](d0c5191))
* add config change
([#2604](#2604))
([8312706](8312706))
* add entry point
([#2616](#2616))
([b19fa33](b19fa33))
* add generation config comparator
([#2587](#2587))
([a94c2f0](a94c2f0))
* Add JavadocJar Task to build.gradle for self service libraries
([#2593](#2593))
([993f5ac](993f5ac))
* Client/StubSettings' getEndpoint() returns the resolved endpoint
([#2440](#2440))
([4942bc1](4942bc1))
* generate selected libraries
([#2598](#2598))
([739ddbb](739ddbb))
* Validate the Universe Domain inside Java-Core
([#2592](#2592))
([35d789f](35d789f))


### Bug Fixes

* add main to `generate_repo.py`
([#2607](#2607))
([fedeb32](fedeb32))
* correct deep-remove and deep-preserve regexes
([#2572](#2572))
([4c7fd88](4c7fd88))
* first attempt should use the min of RPC timeout and total timeout
([#2641](#2641))
([0349232](0349232))
* remove duplicated calls to AutoValue builders
([#2636](#2636))
([53a3727](53a3727))
* remove unnecessary slf4j and AbstractGoogleClientRequest native image
configs
([0cb7d0e](0cb7d0e))
* remove unnecessary slf4j and AbstractGoogleClientRequest native image
configs
([#2628](#2628))
([0cb7d0e](0cb7d0e))


### Dependencies

* update arrow.version to v15.0.2
([#2589](#2589))
([777acf3](777acf3))
* update dependency
com.google.cloud.opentelemetry:detector-resources-support to v0.28.0
([#2649](#2649))
([e4ed176](e4ed176))
* update dependency gitpython to v3.1.41 [security]
([#2625](#2625))
([e41bd8f](e41bd8f))
* update dependency net.bytebuddy:byte-buddy to v1.14.13
([#2646](#2646))
([73ac5a4](73ac5a4))
* update dependency org.threeten:threeten-extra to v1.8.0
([#2650](#2650))
([226325a](226325a))
* update dependency org.threeten:threetenbp to v1.6.9
([#2602](#2602))
([371753e](371753e))
* update dependency org.threeten:threetenbp to v1.6.9
([#2665](#2665))
([8935bc8](8935bc8))
* update google api dependencies
([#2584](#2584))
([cd20604](cd20604))
* update googleapis/java-cloud-bom digest to 7071341
([#2608](#2608))
([8d74140](8d74140))
* update netty dependencies to v4.1.109.final
([#2597](#2597))
([8990693](8990693))
* update opentelemetry-java monorepo to v1.37.0
([#2652](#2652))
([f8fa2e9](f8fa2e9))
* update protobuf dependencies to v3.25.3
([#2491](#2491))
([b0e5041](b0e5041))
* update slf4j monorepo to v2.0.13
([#2647](#2647))
([f030e29](f030e29))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size: m Pull request size is medium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ExponentialRetryAlgorithm should use the min of initialRpc and total timeout on the first attempt
5 participants