Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Cancel the Timeout Task for HttpJson #2360

Merged
merged 31 commits into from
Feb 13, 2024
Merged

fix: Cancel the Timeout Task for HttpJson #2360

merged 31 commits into from
Feb 13, 2024

Conversation

lqiu96
Copy link
Contributor

@lqiu96 lqiu96 commented Jan 12, 2024

Fixes: googleapis/google-cloud-java#10220

Currently, the executorService will wait for any previously submitted task to finish execution before being able to terminate. This PR attempts to fix the issue by cancelling the outstanding timeout task. If a response has been received prior to the timeout, the timeout task will be cancelled and the client should be able to terminate immediately afterwards.

This fixes an issue for Clients that have RPCs with a large timeout value (i.e. 10 min). The client would wait 10 minutes before being able to terminate completely.

Local Testing

Running this with SNAPSHOT Gax version (2.41.1-SNAPSHOT) and latest Compute from Libraries-Bom v26.31.0
Logs:

}

	at com.google.api.client.http.HttpResponseException$Builder.build(HttpResponseException.java:293)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1118)
	at com.google.api.gax.httpjson.HttpRequestRunnable.run(HttpRequestRunnable.java:115)
	... 6 more

Process finished with exit code 1

Running this with latest Gax version (fix not included) and latest Compute from Libraries-Bom v26.31.0:

  }
}

	at com.google.api.client.http.HttpResponseException$Builder.build(HttpResponseException.java:293)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1118)
	at com.google.api.gax.httpjson.HttpRequestRunnable.run(HttpRequestRunnable.java:115)
	... 6 more

Missing the termination and exit code. It shows up after 10 minutes.

@suztomo
Copy link
Member

suztomo commented Jan 12, 2024

Would you try to express the problem googleapis/google-cloud-java#10220 in a test? The test should fail before your fix.

If it's difficult to write the test, would you describe the challenge in the pull request description?

@lqiu96
Copy link
Contributor Author

lqiu96 commented Jan 16, 2024

Would you try to express the problem googleapis/google-cloud-java#10220 in a test? The test should fail before your fix.

If it's difficult to write the test, would you describe the challenge in the pull request description?

Will do, thanks! I think there should be a way to test this behavior.

@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. and removed size: s Pull request size is small. labels Jan 17, 2024
@lqiu96 lqiu96 force-pushed the cancel-timeout-task branch from 7c99d52 to 7f04a7f Compare January 17, 2024 16:08
@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: xl Pull request size is extra large. labels Jan 17, 2024
@lqiu96
Copy link
Contributor Author

lqiu96 commented Jan 17, 2024

Added ITClientShutdown showcase tests to test that the client is terminating properly. I expect that the showcase tests will fail if it's using the HttpJsonClientCallsImpl in the main branch.

Showcase (Java 11): https://github.com/googleapis/sdk-platform-java/actions/runs/7560014731/job/20585113208?pr=2360
Showcase (Java 17): https://github.com/googleapis/sdk-platform-java/actions/runs/7560014731/job/20585114645?pr=2360

@lqiu96
Copy link
Contributor Author

lqiu96 commented Jan 17, 2024

Error:

Error:  Errors: 
Error:    ITClientShutdown.testGrpc_rpcInvokedWithLargeTimeout_closeClientOnceResponseReceived:110 » TestTimedOut test timed out after 15000 milliseconds
Error:    ITClientShutdown.testHttpJson_rpcInvokedWithLargeTimeout_closeClientOnceResponseReceived:157 » TestTimedOut test timed out after 15000 milliseconds
[INFO] 
Error:  Tests run: 76, Failures: 0, Errors: 2, Skipped: 1

The testGrpc_rpcInvokedWithLargeTimeout_closeClientOnceResponseReceived test should not be failing. It looks like it's because this test is created with HttpJson client. Will update and re-run.

@lqiu96
Copy link
Contributor Author

lqiu96 commented Jan 17, 2024

Showcase (Java 11): https://github.com/googleapis/sdk-platform-java/actions/runs/7560111797/job/20585419373?pr=2360
Showcase (Java 17): https://github.com/googleapis/sdk-platform-java/actions/runs/7560111797/job/20585419851?pr=2360

I expect both these tests to fail and that testHttpJson_rpcInvokedWithLargeTimeout_closeClientOnceResponseReceived is the only test case that fails.

@lqiu96
Copy link
Contributor Author

lqiu96 commented Jan 29, 2024

Took a look at grpc-java's implementation and they keep track of the ScheduledFuture as well.

Interestingly, grpc-java does not interrupt if the timeout future if it is already running: https://github.com/grpc/grpc-java/blob/7f4c16e068eeb6c7049b7ab2f0420a72d2cc1fb2/core/src/main/java/io/grpc/internal/ClientCallImpl.java#L383.

From the showcase tests, grpc's client isn't blocked and is able to terminate. It seems to be since that client's HttpJson's stub manages that deadlineexecutorservice while that the same deadlineexecutorservice is not managed by client's gRPC's stub (rather managed in grpc-java).

@lqiu96 lqiu96 added the owlbot:run Add this label to trigger the Owlbot post processor. label Jan 29, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Jan 29, 2024
@lqiu96 lqiu96 marked this pull request as ready for review January 29, 2024 22:01
@lqiu96 lqiu96 requested a review from a team as a code owner January 29, 2024 22:01
@lqiu96 lqiu96 requested review from suztomo and blakeli0 January 29, 2024 22:01
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: xl Pull request size is extra large. labels Feb 7, 2024
@lqiu96 lqiu96 added the owlbot:run Add this label to trigger the Owlbot post processor. label Feb 7, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Feb 7, 2024
@suztomo
Copy link
Member

suztomo commented Feb 8, 2024

It seems that you were able to reproduce the problem in googleapis/google-cloud-java#10220. In the pull request description, would you add observation where you confirmed the problem of java-compute is fixed?

@lqiu96 lqiu96 requested a review from suztomo February 9, 2024 17:13
@lqiu96
Copy link
Contributor Author

lqiu96 commented Feb 9, 2024

@suztomo and @burkedavison Could I get a re-review and an approval from both of you if everything looks good?

Copy link
Member

@suztomo suztomo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

Quality Gate Failed Quality Gate failed for 'gapic-generator-java-root'

Failed conditions
B Reliability Rating on New Code (required ≥ A)

See analysis details on SonarCloud

idea Catch issues before they fail your Quality Gate with our IDE extension SonarLint SonarLint

Copy link

Quality Gate Failed Quality Gate failed for 'java_showcase_integration_tests'

Failed conditions
B Reliability Rating on New Code (required ≥ A)

See analysis details on SonarCloud

idea Catch issues before they fail your Quality Gate with our IDE extension SonarLint SonarLint

@lqiu96 lqiu96 merged commit b177d9e into main Feb 13, 2024
37 of 39 checks passed
@lqiu96 lqiu96 deleted the cancel-timeout-task branch February 13, 2024 20:51
alicejli pushed a commit that referenced this pull request Feb 13, 2024
🤖 I have created a release *beep* *boop*
---


<details><summary>2.35.0</summary>

##
[2.35.0](v2.34.0...v2.35.0)
(2024-02-13)


### Features

* add `generate_repo.py`
([#2431](#2431))
([47b632a](47b632a))
* move synthtool templates to `library_generation/owlbot`
([#2443](#2443))
([5c95472](5c95472))


### Bug Fixes

* Apiary Host returns user set host if set
([#2455](#2455))
([5f17e62](5f17e62))
* Cancel the Timeout Task for HttpJson
([#2360](#2360))
([b177d9e](b177d9e))


### Dependencies

* update dependency commons-codec:commons-codec to v1.16.1
([#2473](#2473))
([8c6e91d](8c6e91d))
* update google api dependencies
([#2469](#2469))
([ad4d4e6](ad4d4e6))
* update google auth library dependencies to v1.23.0
([#2466](#2466))
([349a5d3](349a5d3))
* update google auth library dependencies to v1.23.0
([#2476](#2476))
([6c9127c](6c9127c))
* update google http client dependencies to v1.44.1
([#2467](#2467))
([87d1435](87d1435))
* update googleapis/java-cloud-bom digest to ac9893c
([#2472](#2472))
([7fff34a](7fff34a))
* update grpc dependencies to v1.61.1
([#2463](#2463))
([9ec575b](9ec575b))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>
ddixit14 pushed a commit that referenced this pull request Feb 15, 2024
Fixes: googleapis/google-cloud-java#10220

Currently, the executorService will wait for any previously submitted
task to finish execution before being able to terminate. This PR
attempts to fix the issue by cancelling the outstanding timeout task. If
a response has been received prior to the timeout, the timeout task will
be cancelled and the client should be able to terminate immediately
afterwards.

This fixes an issue for Clients that have RPCs with a large timeout
value (i.e. 10 min). The client would wait 10 minutes before being able
to terminate completely.

## Local Testing
Running this with SNAPSHOT Gax version (2.41.1-SNAPSHOT) and latest
Compute from Libraries-Bom v26.31.0
Logs:
```
}

	at com.google.api.client.http.HttpResponseException$Builder.build(HttpResponseException.java:293)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1118)
	at com.google.api.gax.httpjson.HttpRequestRunnable.run(HttpRequestRunnable.java:115)
	... 6 more

Process finished with exit code 1
```

Running this with latest Gax version (fix not included) and latest
Compute from Libraries-Bom v26.31.0:
```
  }
}

	at com.google.api.client.http.HttpResponseException$Builder.build(HttpResponseException.java:293)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1118)
	at com.google.api.gax.httpjson.HttpRequestRunnable.run(HttpRequestRunnable.java:115)
	... 6 more
```

Missing the termination and exit code. It shows up after 10 minutes.
ddixit14 pushed a commit that referenced this pull request Feb 15, 2024
🤖 I have created a release *beep* *boop*
---


<details><summary>2.35.0</summary>

##
[2.35.0](v2.34.0...v2.35.0)
(2024-02-13)


### Features

* add `generate_repo.py`
([#2431](#2431))
([47b632a](47b632a))
* move synthtool templates to `library_generation/owlbot`
([#2443](#2443))
([5c95472](5c95472))


### Bug Fixes

* Apiary Host returns user set host if set
([#2455](#2455))
([5f17e62](5f17e62))
* Cancel the Timeout Task for HttpJson
([#2360](#2360))
([b177d9e](b177d9e))


### Dependencies

* update dependency commons-codec:commons-codec to v1.16.1
([#2473](#2473))
([8c6e91d](8c6e91d))
* update google api dependencies
([#2469](#2469))
([ad4d4e6](ad4d4e6))
* update google auth library dependencies to v1.23.0
([#2466](#2466))
([349a5d3](349a5d3))
* update google auth library dependencies to v1.23.0
([#2476](#2476))
([6c9127c](6c9127c))
* update google http client dependencies to v1.44.1
([#2467](#2467))
([87d1435](87d1435))
* update googleapis/java-cloud-bom digest to ac9893c
([#2472](#2472))
([7fff34a](7fff34a))
* update grpc dependencies to v1.61.1
([#2463](#2463))
([9ec575b](9ec575b))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>
lqiu96 pushed a commit that referenced this pull request Feb 28, 2024
🤖 I have created a release *beep* *boop*
---


<details><summary>2.35.0</summary>

##
[2.35.0](v2.34.0...v2.35.0)
(2024-02-13)


### Features

* add `generate_repo.py`
([#2431](#2431))
([47b632a](47b632a))
* move synthtool templates to `library_generation/owlbot`
([#2443](#2443))
([5c95472](5c95472))


### Bug Fixes

* Apiary Host returns user set host if set
([#2455](#2455))
([5f17e62](5f17e62))
* Cancel the Timeout Task for HttpJson
([#2360](#2360))
([b177d9e](b177d9e))


### Dependencies

* update dependency commons-codec:commons-codec to v1.16.1
([#2473](#2473))
([8c6e91d](8c6e91d))
* update google api dependencies
([#2469](#2469))
([ad4d4e6](ad4d4e6))
* update google auth library dependencies to v1.23.0
([#2466](#2466))
([349a5d3](349a5d3))
* update google auth library dependencies to v1.23.0
([#2476](#2476))
([6c9127c](6c9127c))
* update google http client dependencies to v1.44.1
([#2467](#2467))
([87d1435](87d1435))
* update googleapis/java-cloud-bom digest to ac9893c
([#2472](#2472))
([7fff34a](7fff34a))
* update grpc dependencies to v1.61.1
([#2463](#2463))
([9ec575b](9ec575b))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size: l Pull request size is large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

google-cloud-compute: ImagesClient does not terminate, and hangs instead
4 participants