Not to send any request at GrpcSubscriptionImpl destructor #4178
@@ -19,6 +19,7 @@ class GrpcSubscriptionImpl : public Config::Subscription<ResourceType> {
const Protobuf::MethodDescriptor& service_method, SubscriptionStats stats)
: grpc_mux_(node, std::move(async_client), dispatcher, service_method, random),
grpc_mux_subscription_(grpc_mux_, stats) {}
~GrpcSubscriptionImpl() { grpc_mux_.noMoreRequestSending(); }
This doesn't seem correct; grpc_mux_ might be an ADS connection. When I delete an individual subscriber, e.g. EDS, I shouldn't permanently disable the muxer. It seems like what we had before is the right thing, the gRPC subscription should relinquish its watch by deleting it, and then via RAII it cleans up its state in the gRPC mux. But, from the issue, there is some bug still. Can you deep dive?
GrpcSubscriptionImpl owns grpc_mux_. Look at line 39 of this file.
In its destructor, grpc_mux_ is about to be deleted, so it is OK to mark all of its watchers so that no request is sent when a watcher is removed.
Oh, right, was confusing this with GrpcMuxSubscriptionImpl. It's still not right IMHO, since you're reaching in and dealing with internals in a way that breaks abstractions. The correct solution should clean up via RAII of watchers.
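To make the RAII suggestion concrete, here is a minimal sketch of a watch handle that registers itself with a mux on construction and removes itself on destruction. It is an illustration only: `ToyMux`, `Watch`, `subscribe`, `watchCount` and `sentRequests` are hypothetical names, and the "request" is just a recorded snapshot of the watched resources, not Envoy's actual GrpcMuxImpl behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a gRPC mux. Not Envoy's GrpcMuxImpl.
class ToyMux {
public:
  // RAII handle: registration happens in the constructor, cleanup in the destructor.
  class Watch {
  public:
    Watch(ToyMux& parent, std::string resource) : parent_(parent) {
      entry_ = parent_.watched_.insert(parent_.watched_.end(), std::move(resource));
      parent_.sendRequest();
    }
    ~Watch() {
      // Dropping the handle removes the watch and tells the peer about the new set.
      parent_.watched_.erase(entry_);
      parent_.sendRequest();
    }
    Watch(const Watch&) = delete;
    Watch& operator=(const Watch&) = delete;

  private:
    ToyMux& parent_;
    std::list<std::string>::iterator entry_;
  };

  std::unique_ptr<Watch> subscribe(std::string resource) {
    assert(!resource.empty()); // a watch needs a resource name in this sketch
    return std::make_unique<Watch>(*this, std::move(resource));
  }

  std::size_t watchCount() const { return watched_.size(); }
  const std::vector<std::vector<std::string>>& sentRequests() const { return sent_; }

private:
  // Stand-in for writing a discovery request: record the currently watched resources.
  void sendRequest() { sent_.emplace_back(watched_.begin(), watched_.end()); }

  std::list<std::string> watched_;
  std::vector<std::vector<std::string>> sent_;
};
```

In this sketch, deleting one subscriber's handle (say, the one for "eds") only shrinks the watched set; the mux itself stays usable for the remaining watches, which is the property the review comment asks for. The sketch assumes the mux outlives every handle.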
I had a look at the code a bit more. I think even with your change, there is a race to worry about. The completion thread should still be live when the client is deleted, as that lives in TLS (are you seeing https://github.com/envoyproxy/envoy/blob/master/source/common/grpc/google_async_client_impl.cc#L19 invoked?). The async client should have reset the stream on its way out. I wonder if we have an issue in https://github.com/envoyproxy/envoy/blob/master/source/common/grpc/google_async_client_impl.cc#L363?
Can you provide a full trace level log and backtrace with line numbers? Thanks.
I added a trace log to the issue: #4167
(I'm following up with @qiwzhang on IM offline. It looks like SDS still has an active gRPC subscription after TLS shutdown has begun; this is dangerous and we need to figure out a way to avoid it.)
Signed-off-by: Wayne Zhang <qiwzhang@google.com>
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
This issue is reported at #4352.
@qiwzhang did you make progress on our previous discussion?
Signed-off-by: Wayne Zhang <qiwzhang@google.com>
Description:
To fix: #4167
Since GrpcSubscriptionImpl owns grpc_mux_, its destructor instructs grpc_mux_ not to send any more requests by setting inserted_ to false on all of its watchers.
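The approach described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this PR: only the names `noMoreRequestSending` and `inserted_` come from the change, while `FlaggedMux`, `Watch`, `ToySubscription` and the integer request counter (standing in for the gRPC stream) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical, simplified mux. The counter reference stands in for the gRPC stream.
class FlaggedMux {
public:
  explicit FlaggedMux(int& requests_sent) : requests_sent_(requests_sent) {}

  class Watch {
  public:
    explicit Watch(FlaggedMux& parent) : parent_(parent) {
      parent_.watches_.push_back(this);
      parent_.sendRequest();
    }
    ~Watch() {
      if (!inserted_) {
        return; // the owner asked for silence, so skip both removal and the request
      }
      auto& watches = parent_.watches_;
      watches.erase(std::remove(watches.begin(), watches.end(), this), watches.end());
      parent_.sendRequest();
    }
    Watch(const Watch&) = delete;
    Watch& operator=(const Watch&) = delete;

  private:
    friend class FlaggedMux;
    FlaggedMux& parent_;
    bool inserted_ = true;
  };

  std::unique_ptr<Watch> subscribe() { return std::make_unique<Watch>(*this); }

  // Mark every watch so that its destructor no longer triggers a request.
  void noMoreRequestSending() {
    for (Watch* watch : watches_) {
      assert(watch != nullptr);
      watch->inserted_ = false;
    }
  }

private:
  void sendRequest() { ++requests_sent_; }

  int& requests_sent_;
  std::vector<Watch*> watches_;
};

// Hypothetical owner of both the mux and its single watch.
class ToySubscription {
public:
  explicit ToySubscription(int& requests_sent)
      : mux_(requests_sent), watch_(mux_.subscribe()) {}
  // Runs before the members are destroyed, so watch_ is already marked when it goes away.
  ~ToySubscription() { mux_.noMoreRequestSending(); }

private:
  FlaggedMux mux_;                            // declared first, destroyed last
  std::unique_ptr<FlaggedMux::Watch> watch_;  // destroyed before mux_
};
```

In this sketch, destroying a `ToySubscription` leaves the counter at 1 (the initial subscribe request), whereas dropping a watch on a bare `FlaggedMux` without calling `noMoreRequestSending()` would bump it to 2. The sketch only makes sense when the owner destroys the mux immediately afterwards, which is the ownership argument made in the review thread.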
Risk Level: Low
Testing:
In this PR (#4176), run
bazel test //test/integration:sds_dynamic_integration_test --runs_per_test=1000
All 1000 runs passed. Before this fix, 50 of 1000 runs failed.