Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

streams: Stop cleaning up after orphaned streams #1854

Merged
merged 3 commits into from
Feb 8, 2018

Conversation

dfawley
Copy link
Member

@dfawley dfawley commented Feb 8, 2018

This change introduces some behavior changes that should not impact users that are following the proper stream protocol. Specifically, one of the following conditions must be satisfied:

  1. The user calls Close on the ClientConn.
  2. The user cancels the context provided to NewClientStream, or its deadline expires. (Note that it if the context is no longer needed before the deadline expires, it is still recommended to call cancel to prevent bloat.) It is always recommended to cancel contexts when they are no longer needed, and to never use the background context directly, so all users should always be doing this.
  3. The user calls RecvMsg (or Recv in generated code) until a non-nil error is returned.
  4. The user receives any error from Header or SendMsg (or Send in generated code) besides io.EOF.

If none of the above happen, this will leak a goroutine and a context, and grpc will not call the optionally-configured stats handler with a stats.End message.

Before this change, if a user created a stream and the server ended the stream, the stats handler would be invoked with a stats.End containing the final status of the stream. Subsequent calls to RecvMsg would then trigger the stats handler with InPayloads, which may be unexpected by stats handlers.


This change is Reviewable

@dfawley dfawley added the Type: Behavior Change Behavior changes not categorized as bugs label Feb 8, 2018
@dfawley dfawley requested a review from menghanl February 8, 2018 00:45
@menghanl
Copy link
Contributor

menghanl commented Feb 8, 2018

Reviewed 6 of 6 files at r1.
Review status: all files reviewed at latest revision, 8 unresolved discussions.


clientconn.go, line 1384 at r1 (raw file):

// Deprecated: This error is never returned by grpc and should not be
// referenced by users.
var ErrClientConnTimeout = errors.New("grpc: timed out when dialing")

Why is this error moved here?


stream.go, line 131 at r1 (raw file):

	if mc.Timeout != nil && *mc.Timeout >= 0 {
		// The cancel function for this context will only be called when RecvMsg
		// returns non-nil error or another error is encountered, which means the

What does this mean?

Also move the comment to where cancel is defined? (And move the definition of cancel closer to if?)


stream.go, line 419 at r1 (raw file):

			// because the generated code requires it.
			if err == io.EOF && !cs.desc.ServerStreams {
				err = nil

return nil in the second if err == io.EOF on line 481 instead of setting it back to nil here?
So you can save this if, and finish() will always be called with io.EOF in the successful case.


stream.go, line 421 at r1 (raw file):

				err = nil
			}
			cs.finish(err)

So we will call finish() with nil in some cases. It should be OK though.


stream.go, line 491 at r1 (raw file):

	}
	cs.sentLast = true
	cs.t.Write(cs.s, nil, nil, &transport.Options{Last: true})

Add comment why we don't care the returned error.


stats/stats_test.go, line 1228 at r1 (raw file):

func TestClientStatsFullDuplexRPCNotCallingLastRecv(t *testing.T) {
	count := 1
	testClientStats(t, &testConfig{compress: "gzip"}, &rpcConfig{count: count, success: true, failfast: false, callType: fullDuplexStreamRPC, noLastRecv: true}, map[int]*checkFuncWithCount{

Also delete noLastRecv field from rpcConfig.


test/end2end_test.go, line 1167 at r1 (raw file):

		Payload:            payload,
	}
	if err := stream.Send(req); err != nil && err != io.EOF {

Should we send multiple times and expect io.EOF eventually?


test/end2end_test.go, line 3800 at r1 (raw file):

	cc := te.clientConn()
	tc := testpb.NewTestServiceClient(cc)
	ctx, cancel := context.WithCancel(context.Background())

Consider also defering this cancel? Otherwise a fatal below may cause this to be leaked.


Comments from Reviewable

@dfawley
Copy link
Member Author

dfawley commented Feb 8, 2018

Review status: all files reviewed at latest revision, 8 unresolved discussions.


clientconn.go, line 1384 at r1 (raw file):

Previously, menghanl (Menghan Li) wrote…

Why is this error moved here?

I was tired of looking at it. I'd delete it, but it's possible it would break somebody, so I don't think we should (and it's harmless).


stream.go, line 131 at r1 (raw file):

Previously, menghanl (Menghan Li) wrote…

What does this mean?

Also move the comment to where cancel is defined? (And move the definition of cancel closer to if?)

PTAL


stream.go, line 419 at r1 (raw file):

Previously, menghanl (Menghan Li) wrote…

return nil in the second if err == io.EOF on line 481 instead of setting it back to nil here?
So you can save this if, and finish() will always be called with io.EOF in the successful case.

I can do this, but it needs an explicit finish() call below in that case, too, because the defer can only finish() on error. Changed to reflect. It's maybe cleaner because the special-case stuff is closer together now?


stream.go, line 421 at r1 (raw file):

Previously, menghanl (Menghan Li) wrote…

So we will call finish() with nil in some cases. It should be OK though.

Yes, that's fine. finish() converts io.EOF to nil anyway.


stream.go, line 491 at r1 (raw file):

Previously, menghanl (Menghan Li) wrote…

Add comment why we don't care the returned error.

Done


stats/stats_test.go, line 1228 at r1 (raw file):

Previously, menghanl (Menghan Li) wrote…

Also delete noLastRecv field from rpcConfig.

Done.


test/end2end_test.go, line 1167 at r1 (raw file):

Previously, menghanl (Menghan Li) wrote…

Should we send multiple times and expect io.EOF eventually?

Done.


test/end2end_test.go, line 3800 at r1 (raw file):

Previously, menghanl (Menghan Li) wrote…

Consider also defering this cancel? Otherwise a fatal below may cause this to be leaked.

I had a defer here once, where did it go? Re-added.


Comments from Reviewable

@menghanl
Copy link
Contributor

menghanl commented Feb 8, 2018

:lgtm:


Reviewed 3 of 3 files at r2.
Review status: all files reviewed at latest revision, all discussions resolved.


Comments from Reviewable

@dfawley dfawley merged commit 365770f into grpc:master Feb 8, 2018
@dfawley dfawley deleted the closestream branch February 9, 2018 17:01
rakyll added a commit to rakyll/opencensus-go that referenced this pull request Feb 9, 2018
gRPC introduced some behavioral changes that broke OpenCensus support
for streaming outgoing calls.

See grpc/grpc-go#1854.

In the case of a failed response, gRPC doesn't call the stats
handler with the stats.End message. Given we cannot recieve the
end message, we cannot finish the client span started for the
streaming request.

Skip this test until we figure out whether we are implementing
the streaming protocol properly or gRPC introduced a bug.
rakyll added a commit to rakyll/opencensus-go that referenced this pull request Feb 9, 2018
gRPC introduced some behavioral changes that broke OpenCensus support
for streaming outgoing calls.

See grpc/grpc-go#1854.

In the case of a failed response, gRPC doesn't call the stats
handler with the stats.End message. Given we cannot recieve the
end message, we cannot finish the client span started for the
streaming request.

Skip this test until we figure out whether we are implementing
the streaming protocol properly or gRPC introduced a bug.
rakyll pushed a commit to census-instrumentation/opencensus-go that referenced this pull request Feb 9, 2018
gRPC introduced some behavioral changes that broke OpenCensus support
for streaming outgoing calls.

See grpc/grpc-go#1854.

In the case of a failed response, gRPC doesn't call the stats
handler with the stats.End message. Given we cannot recieve the
end message, we cannot finish the client span started for the
streaming request.

Skip this test until we figure out whether we are implementing
the streaming protocol properly or gRPC introduced a bug.
@menghanl menghanl added this to the 1.10 Release milestone Feb 14, 2018
@vitalyisaev2
Copy link
Contributor

@dfawley Could you please clarify if the example of client-side streaming is following the proper stream protocol? Because context.Background() is used to create stream, and it's not cancelled explicitly.

https://github.com/grpc/grpc-go/blob/master/examples/route_guide/client/client.go#L59

@menghanl
Copy link
Contributor

@vitalyisaev2 Thanks for pointing this out!

The route_guide example does Recv until a non-nil error is returned, so it follows that

  1. The user calls RecvMsg (or Recv in generated code) until a non-nil error is returned.

Using context.Background() is not something we would recommend, so I filed #1877 to add timeout to all context in the examples, please take a look.

I also filed #1878 to check other background context used.

@vitalyisaev2
Copy link
Contributor

@menghanl thank you, now examples look much more idiomatic.

@lock lock bot locked as resolved and limited conversation to collaborators Jan 18, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Type: Behavior Change Behavior changes not categorized as bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants