There is no response after grpc runs for a period of time #6858
Comments
Could this be a dup of #6783? My company hasn't been using gRPC 1.59.0 because of that bug. https://github.com/grpc/grpc-go/releases doesn't list 1.59.0, although the tag https://github.com/grpc/grpc-go/releases/tag/v1.59.0 still exists. Perhaps it's an attempt to "unrelease" it? Is it because of this problem? 1.60.0 solves this issue, but it looks like it introduces some other problems (#6854). We've been running 1.58 and it's working great.
That is weird. We haven't made any attempt so far to unrelease 1.59.
We did have a deadlock that could happen in 1.59 if the channel received an update from the resolver around the same time that it was trying to go idle. That has since been fixed, and we pushed a fix for #6854 as well. We haven't done a patch release so far.
Disabling idleness would be a workaround for using 1.59.0 and avoiding that bug.
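For reference, a minimal sketch of that workaround, assuming a hypothetical target and insecure credentials for brevity: grpc-go exposes the WithIdleTimeout dial option, and a value of 0 disables channel idleness entirely, sidestepping the resolver-update-vs-idle race described above.

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	conn, err := grpc.Dial(
		"dns:///svc.example.com:50051", // hypothetical target
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		// A value of 0 disables channel idleness (the default in 1.59
		// is a 30-minute idle timeout).
		grpc.WithIdleTimeout(0),
	)
	if err != nil {
		log.Fatalf("dial failed: %v", err)
	}
	defer conn.Close()
}
```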
Well that's strange. For me it shows up at the top of the list, even higher on the page than 1.60, which I noticed earlier and thought was also strange.
FWIW, this issue can only happen if you are using […]
@itgcl: As mentioned in the previous comments, we expect the issue to be resolved by upgrading to v1.60.0. Please let us know if that helps.
This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.
@easwars @atollena: I tried to upgrade to v1.60.0, but this bug still exists. For now I have downgraded to v1.58, and it runs normally.
+1, we upgraded to 1.60.1 and the bug persists. Rolling back to 1.58 as well.
In our case we use xDS. After about a day, xDS reports that name resolution yields nothing, probably because the xDS client loses its connection to the xDS server. It is pretty reproducible in our test environment, but it's a real application; we don't have a minimal reproduction case yet. It takes a few hours before the issue kicks in, though. One of my teams does not experience the issue (yet), so we suspect the issue may depend on usage volume as well.
IIUC, you are seeing the issue in the channel between the xDS client inside gRPC and the xDS management server, and not between your application's client and server? We do have detailed logs for the xDS client that you can turn on by setting the following env vars: […]
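(The variable names were cut off above. Presumably they are the standard gRPC-Go logging controls documented in the repo README: GRPC_GO_LOG_VERBOSITY_LEVEL=99 and GRPC_GO_LOG_SEVERITY_LEVEL=info, set in the process environment before startup.)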
@itgcl: Would it be possible for you to share your repro? Thanks.
This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed. |
We have a quite similar problem in Dragonfly2. The resolver fetches the latest nodes periodically; when the nodes change, it calls […] (see the sketch below). But when the last gRPC connection is gone, we create a new gRPC connection and […]
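For context, here is a minimal sketch of the polling-resolver pattern described above. This is not Dragonfly2's actual code: the package name, fetchNodes helper, addresses, and polling interval are all hypothetical, and the resolver.Builder registration is omitted.

```go
package nodes

import (
	"time"

	"google.golang.org/grpc/resolver"
)

// fetchNodes is a hypothetical stand-in for querying the control plane
// for the current node list.
func fetchNodes() []resolver.Address {
	return []resolver.Address{{Addr: "10.0.0.1:8002"}}
}

type pollingResolver struct {
	cc   resolver.ClientConn
	done chan struct{}
}

// watch polls periodically and pushes the latest node list to the channel
// via UpdateState. In v1.59, an UpdateState racing with the channel going
// idle could hit the deadlock discussed earlier in this issue.
func (r *pollingResolver) watch() {
	t := time.NewTicker(30 * time.Second)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			r.cc.UpdateState(resolver.State{Addresses: fetchNodes()})
		case <-r.done:
			return
		}
	}
}

func (r *pollingResolver) ResolveNow(resolver.ResolveNowOptions) {}

func (r *pollingResolver) Close() { close(r.done) }
```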
What version of gRPC are you using?
v1.59.0
What version of Go are you using (`go version`)?
v1.19
What operating system (Linux, Windows, …) and version?
Linux and macOS
What did you do?
Initialize and connect the gRPC client; method calls respond normally at first. After the service has been running for about two hours, calling a method again gets no response until the timeout.
Through debugging, I found that the request waits in the method below until the timeout fires:
cc.firstResolveEvent.HasFired() returns false.
func (cc *ClientConn) waitForResolvedAddrs(ctx context.Context) error {
// This is on the RPC path, so we use a fast path to avoid the
// more-expensive "select" below after the resolver has returned once.
if cc.firstResolveEvent.HasFired() {
return nil
}
select {
case <-cc.firstResolveEvent.Done():
return nil
case <-ctx.Done():
return status.FromContextError(ctx.Err()).Err()
case <-cc.ctx.Done():
return ErrClientConnClosing
}
}
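To make the reported symptom concrete, here is a minimal sketch of the calling pattern, assuming a hypothetical target; the built-in health-check stub is used only as a convenient generated client. Once the channel is stuck (firstResolveEvent never fires), every RPC blocks in waitForResolvedAddrs for its full deadline and surfaces as DeadlineExceeded.

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/credentials/insecure"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
	"google.golang.org/grpc/status"
)

func main() {
	// Hypothetical target; any channel whose resolver never reports a
	// result will show the same behavior.
	conn, err := grpc.Dial("dns:///svc.example.com:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dial failed: %v", err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Blocks in waitForResolvedAddrs for the full 5 seconds when the
	// channel is in the stuck state described in this issue.
	_, err = healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{})
	if status.Code(err) == codes.DeadlineExceeded {
		log.Printf("RPC timed out waiting for name resolution: %v", err)
	}
}
```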