What version of gRPC-Java are you using?
1.61.0 with grpc-netty-shaded
What is your environment?
JDK 17, Linux
What did you expect to see?
The call to blockingUnaryCall should not block forever.
What did you see instead?
The thread hangs even though a deadline is set.
jstack output showing the problem:
"main" #1 [8707] prio=5 os_prio=31 cpu=207.44ms elapsed=16.32s tid=0x0000000132808200 nid=8707 waiting on condition [0x000000016fcda000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@21.0.1/Native Method)
- parking to wait for <0x000000061e400010> (a io.grpc.stub.ClientCalls$ThreadlessExecutor)
at java.util.concurrent.locks.LockSupport.park(java.base@21.0.1/LockSupport.java:221)
at io.grpc.stub.ClientCalls$ThreadlessExecutor.waitAndDrain(ClientCalls.java:717)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:159)
at com.example.ExampleServiceGrpc$ExampleServiceBlockingStub.unaryCall(ExampleServiceGrpc.java:160)
at com.example.TestClient.call(TestClient.java:70)
at com.example.TestClient.callWithError(TestClient.java:53)
at com.example.Main.main(Main.java:52)
Steps to reproduce the bug
I'm using a custom service config with retries enabled, and the blocking gRPC call is made with a deadline.
If the deadline expires while the call is waiting between retry attempts, a hang occurs:
Duration deadline = Duration.ofSeconds(1);

Map<String, ?> serviceConfig = Map.of("methodConfig",
    List.of(
        Map.of(
            "name", List.of(Map.of()),
            "retryPolicy", Map.of(
                "maxAttempts", 4D,
                "initialBackoff", "10s",
                "maxBackoff", "10s",
                "backoffMultiplier", 1D,
                "retryableStatusCodes", List.of("UNAVAILABLE", "UNKNOWN")
            )
        )
    )
);

var channel = ManagedChannelBuilder.forAddress("localhost", port)
    .usePlaintext()
    .enableRetry()
    .disableServiceConfigLookUp()
    .defaultServiceConfig(serviceConfig)
    .build();

var blockingStub = ExampleServiceGrpc.newBlockingStub(channel);

// If the first call returns UNKNOWN or the server is unavailable, a hang will occur.
blockingStub
    .withDeadlineAfter(deadline.toMillis(), TimeUnit.MILLISECONDS)
    .unaryCall(req);
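For completeness, the server side of the repro only needs to fail with one of the retryable status codes. A hypothetical sketch (the ExampleServiceImplBase, Request, and Response names are assumed from the generated code in the example repo, and the port is arbitrary):

import io.grpc.Server;
import io.grpc.ServerBuilder;
import io.grpc.Status;
import io.grpc.stub.StreamObserver;

// Hypothetical server for the repro: every attempt fails with UNAVAILABLE,
// one of the retryableStatusCodes above, so the client schedules a 10s
// retry and the 1s deadline then expires during that backoff.
// Request/Response are the example proto's messages (assumed names).
class FailingExampleService extends ExampleServiceGrpc.ExampleServiceImplBase {
  @Override
  public void unaryCall(Request request, StreamObserver<Response> responseObserver) {
    responseObserver.onError(Status.UNAVAILABLE.asRuntimeException());
  }

  public static void main(String[] args) throws Exception {
    Server server = ServerBuilder.forPort(50051) // arbitrary port for the sketch
        .addService(new FailingExampleService())
        .build()
        .start();
    server.awaitTermination();
  }
}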
Because the bug is difficult to reproduce, I've prepared an example in a separate repository where I managed to reproduce it reliably.
Example to reproduce here - https://github.com/turchenkoalex/grpc-hang
I tested this behavior with several versions of the grpc library and found that the issue appeared starting from version 1.52.0.
As far as I understand, the problem is related to the substream counter inFlightSubStreams in io.grpc.internal.RetriableStream.
When the DeadlineTimer calls the cancel method on RetriableStream, the retryFuture is also canceled, but the inFlightSubStreams counter is not decremented. For this reason, the safeCloseMasterListener method never closes the master listener, and the blocking caller is never released.
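To illustrate, here is a toy model of that counter logic as I understand it (an illustration of my reading of the code, not the actual RetriableStream implementation):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the suspected bug: a scheduled retry is counted as an
// in-flight substream, so cancelling the scheduled future without
// decrementing the counter leaves it stuck above zero forever.
class RetryAccountingModel {
  final AtomicInteger inFlightSubStreams = new AtomicInteger();
  final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
  ScheduledFuture<?> retryFuture;

  void scheduleRetry() {
    inFlightSubStreams.incrementAndGet(); // the retry counts as in-flight
    retryFuture = scheduler.schedule(() -> { /* next attempt would run here */ },
        10, TimeUnit.SECONDS);
  }

  void cancel(boolean withFix) {
    if (retryFuture != null && retryFuture.cancel(false) && withFix) {
      inFlightSubStreams.decrementAndGet(); // the decrement the patch below adds
    }
    // safeCloseMasterListener analogue: may only complete at zero in-flight.
    if (inFlightSubStreams.get() == 0) {
      System.out.println("closed -> the parked blocking caller would wake up");
    } else {
      System.out.println("still in flight -> the blocking caller stays parked");
    }
    scheduler.shutdownNow();
  }

  void run(boolean withFix) {
    scheduleRetry();
    cancel(withFix);
  }

  public static void main(String[] args) {
    new RetryAccountingModel().run(false); // models the hang
    new RetryAccountingModel().run(true);  // models the patched behavior
  }
}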
I tried adding an explicit decrement when retryFuture.cancel succeeds and tested the new behavior. The hang is gone.
Here's the diff:
diff --git a/core/src/main/java/io/grpc/internal/RetriableStream.java b/core/src/main/java/io/grpc/internal/RetriableStream.java
index f301eee1f..07fe9e764 100644
--- a/core/src/main/java/io/grpc/internal/RetriableStream.java
+++ b/core/src/main/java/io/grpc/internal/RetriableStream.java
@@ -195,7 +195,10 @@ abstract class RetriableStream<ReqT> implements ClientStream {
       }
     }
     if (retryFuture != null) {
-      retryFuture.cancel(false);
+      boolean cancelled = retryFuture.cancel(false);
+      if (cancelled) {
+        inFlightSubStreams.decrementAndGet();
+      }
     }
     if (hedgingFuture != null) {
       hedgingFuture.cancel(false);
But maybe I'm missing something.
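Until there is a proper fix, a possible client-side mitigation (my suggestion, not anything from the gRPC docs) is to avoid parking the caller indefinitely by going through the generated future stub and bounding the wait explicitly:

import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Possible mitigation (a suggestion, not an official fix): the future stub
// lets the application bound its own wait, so even if RetriableStream never
// closes the listener, the calling thread is not parked forever.
var futureStub = ExampleServiceGrpc.newFutureStub(channel)
    .withDeadlineAfter(deadline.toMillis(), TimeUnit.MILLISECONDS);
try {
  var response = futureStub.unaryCall(req)
      .get(deadline.toMillis() + 500, TimeUnit.MILLISECONDS); // small grace period
} catch (TimeoutException e) {
  // The deadline got lost inside the retry machinery; surface it ourselves.
} catch (InterruptedException | ExecutionException e) {
  // Normal failure path, e.g. a StatusRuntimeException wrapped in ExecutionException.
}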