Not to send any request at GrpcSubscriptionImpl destructor #4178

Closed
wants to merge 1 commit into from

qiwzhang
Contributor

Signed-off-by: Wayne Zhang qiwzhang@google.com

Description:
Fixes #4167

Since GrpcSubscriptionImpl owns grpc_mux_, its destructor instructs grpc_mux_ not to send any more requests by setting inserted_ to false on all of its watchers.
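A minimal sketch of the mechanism described above, assuming a heavily simplified mux. `SketchMux`, `addWatch`, `removeWatch` and `requestsSent` are hypothetical names, not Envoy's `GrpcMuxImpl` API; only `noMoreRequestSending()` and the `inserted_` flag come from this PR. In the model, removing a watch normally triggers one more discovery request, and clearing `inserted_` on every watch first suppresses it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of a gRPC mux (NOT Envoy's GrpcMuxImpl).
// It only counts the discovery requests it would have sent.
class SketchMux {
public:
  // Registers interest in some resources; subscribing sends a request.
  int addWatch(std::vector<std::string> resources) {
    const int id = next_id_++;
    watches_.emplace(id, Watch{std::move(resources), true});
    ++requests_sent_;
    return id;
  }

  // Removing a watch sends an updated request for the remaining resources,
  // but only if the watch is still marked as inserted_.
  void removeWatch(int id) {
    auto it = watches_.find(id);
    if (it == watches_.end()) {
      return;
    }
    const bool was_inserted = it->second.inserted_;
    watches_.erase(it);
    if (was_inserted) {
      ++requests_sent_;
    }
  }

  // Sketch of the PR's idea: mark every watch as no longer inserted so that
  // tearing the watches down afterwards sends nothing more.
  void noMoreRequestSending() {
    for (auto& entry : watches_) {
      entry.second.inserted_ = false;
    }
  }

  int requestsSent() const { return requests_sent_; }

private:
  struct Watch {
    std::vector<std::string> resources;
    bool inserted_;
  };

  std::map<int, Watch> watches_;
  int next_id_ = 0;
  int requests_sent_ = 0;
};
```

In this model, a watch removed after `noMoreRequestSending()` leaves the request count unchanged, which is the behavior the PR wants during GrpcSubscriptionImpl teardown.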

Risk Level: Low

Testing:
In PR #4176, run:

bazel test //test/integration:sds_dynamic_integration_test --runs_per_test=1000

All 1000 runs passed. Before this fix, 50 of 1000 runs failed.

qiwzhang changed the title from "Not to send a request at GrpcSubscriptionImpl destructor" to "Not to send any request at GrpcSubscriptionImpl destructor" Aug 16, 2018
lizan requested a review from htuch August 16, 2018 06:14
```diff
@@ -19,6 +19,7 @@ class GrpcSubscriptionImpl : public Config::Subscription<ResourceType> {
                        const Protobuf::MethodDescriptor& service_method, SubscriptionStats stats)
       : grpc_mux_(node, std::move(async_client), dispatcher, service_method, random),
         grpc_mux_subscription_(grpc_mux_, stats) {}
+  ~GrpcSubscriptionImpl() { grpc_mux_.noMoreRequestSending(); }
```
Member

This doesn't seem correct; grpc_mux_ might be an ADS connection. When I delete an individual subscriber, e.g. EDS, I shouldn't permanently disable the muxer. What we had before seems like the right thing: the gRPC subscription should relinquish its watch by deleting it, and RAII then cleans up its state in the gRPC mux. But, judging from the issue, there is still some bug. Can you deep dive?
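To make the RAII alternative concrete, here is a hypothetical sketch (none of these names are Envoy's API): one `SharedMux` stands in for a shared ADS stream, and each subscriber holds a `WatchHandle` whose destructor relinquishes only its own watch. Deleting one subscriber updates the mux's interest set but leaves the mux serving the others, which illustrates the concern that a mux-wide "stop sending" switch is the wrong tool when the mux is shared.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical shared mux (think: one ADS stream), NOT Envoy's GrpcMux API.
// It tracks which resource types have active watches and counts requests.
class SharedMux {
public:
  // RAII watch: construction subscribes, destruction relinquishes the watch.
  class WatchHandle {
  public:
    WatchHandle(SharedMux& mux, std::string type_url)
        : mux_(mux), type_url_(std::move(type_url)) {
      mux_.subscribe(type_url_);
    }
    ~WatchHandle() { mux_.unsubscribe(type_url_); }
    WatchHandle(const WatchHandle&) = delete;
    WatchHandle& operator=(const WatchHandle&) = delete;

  private:
    SharedMux& mux_;
    std::string type_url_;
  };

  std::unique_ptr<WatchHandle> addWatch(const std::string& type_url) {
    return std::make_unique<WatchHandle>(*this, type_url);
  }

  bool hasWatch(const std::string& type_url) const { return active_.count(type_url) > 0; }
  int requestsSent() const { return requests_sent_; }

private:
  void subscribe(const std::string& type_url) {
    ++active_[type_url];
    ++requests_sent_;
  }

  // Only this watch's interest is dropped; the mux itself stays usable.
  void unsubscribe(const std::string& type_url) {
    auto it = active_.find(type_url);
    if (it != active_.end() && --it->second == 0) {
      active_.erase(it);
    }
    ++requests_sent_;  // tell the server about the reduced interest
  }

  std::map<std::string, int> active_;
  int requests_sent_ = 0;
};
```

For example, with two watches `eds` and `cds` on one mux, resetting the `eds` handle removes only that watch; the `cds` watch stays active and the mux keeps working.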

Contributor Author

GrpcSubscriptionImpl owns grpc_mux_; look at line 39 of this file.
When its destructor runs, grpc_mux_ is about to be deleted, so it is OK to mark all of its watchers so that no request is sent when a watcher is removed.
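This argument relies on C++ teardown order: a destructor body runs before any member is destroyed, and members are then destroyed in reverse order of declaration. Assuming `grpc_mux_` is declared before `grpc_mux_subscription_` (as the constructor's initializer list suggests), the subscription, and with it its watch, is torn down while the mux is still alive, and a call placed in the destructor body, such as the proposed `noMoreRequestSending()`, runs before either. The stand-ins below are hypothetical and only log the order.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which teardown steps happen.
std::vector<std::string>& teardownLog() {
  static std::vector<std::string> log;
  return log;
}

// Hypothetical stand-in for grpc_mux_.
struct MuxMember {
  ~MuxMember() { teardownLog().push_back("mux destroyed"); }
};

// Hypothetical stand-in for grpc_mux_subscription_, which refers to the mux.
struct SubscriptionMember {
  explicit SubscriptionMember(MuxMember& mux) : mux_(mux) {}
  // In the PR's scenario this is where the watch is removed; the mux is
  // still alive here, so the removal could still trigger a request.
  ~SubscriptionMember() { teardownLog().push_back("subscription destroyed"); }
  MuxMember& mux_;
};

// Hypothetical stand-in for GrpcSubscriptionImpl.
struct Owner {
  Owner() : subscription_(mux_) {}
  // Runs first, before any member is destroyed.
  ~Owner() { teardownLog().push_back("owner destructor body"); }
  MuxMember mux_;                    // declared first, destroyed last
  SubscriptionMember subscription_;  // declared second, destroyed first
};
```

Destroying an `Owner` logs the destructor body first, then the subscription, then the mux.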

Member

Oh, right, was confusing this with GrpcMuxSubscriptionImpl. It's still not right IMHO, since you're reaching in and dealing with internals in a way that breaks abstractions. The correct solution should clean up via RAII of watchers.

Member

htuch commented Aug 17, 2018

I had a look at the code a bit more. I think even with your change, there is a race to worry about. The completion thread should still be live when the client is deleted, as that lives in TLS (are you seeing https://github.com/envoyproxy/envoy/blob/master/source/common/grpc/google_async_client_impl.cc#L19 invoked?). The async client should have reset the stream on its way out. I wonder if we have an issue in https://github.com/envoyproxy/envoy/blob/master/source/common/grpc/google_async_client_impl.cc#L363?

Can you provide a full trace level log and backtrace with line numbers? Thanks.

Contributor Author

I added a trace log to the issue: #4167

Member

(I'm following up with @qiwzhang on IM offline, it looks like SDS still has an active gRPC subscription after TLS shutdown has begun, this is dangerous and we need to figure out a way to avoid that)

Signed-off-by: Wayne Zhang <qiwzhang@google.com>

stale bot commented Aug 24, 2018

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

stale bot added the stale label Aug 24, 2018

stale bot commented Aug 31, 2018

This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

stale bot closed this Aug 31, 2018

Member

JimmyCYJ commented Sep 5, 2018

This issue is reported at #4352.

Member

htuch commented Sep 5, 2018

@qiwzhang did you make progress on our previous discussion?

Labels: stale (stalebot believes this issue/PR has not been touched recently)

Issues this pull request may close:
GrpcSubscriptionImpl bug: send another request at destructor

3 participants