Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bumps GRPC to 1.12.0 #2649

Merged
merged 6 commits into from
Jun 1, 2018
Merged

Bumps GRPC to 1.12.0 #2649

merged 6 commits into from
Jun 1, 2018

Conversation

cyli
Copy link
Contributor

@cyli cyli commented May 25, 2018

Supercedes #2631 by bumping GRPC to 1.12.0.

This includes a bunch of fixes to our code to address breaking API changes, namely:

  • 1.10: TLS errors are no longer surfaced when dialing with WithBlock(), so the tests need to change.
  • 1.10: Closing a GRPC server no longer returns a transport closing error, so the tests need to change.
  • 1.11+: transport.StreamFromContext has been removed, so our raft proxy and dispatcher can no longer use that API. transport is intended to be an internal package to GRPC anyway, so in general I think we aren't intended to use/manipulate the transports anyway.

This is based on #2646.

@cyli
Copy link
Contributor Author

cyli commented May 25, 2018

Oops, looks like one more issue.

--- FAIL: TestStreamRaftMessage (0.55s)
	Error Trace:	raft_test.go:978
	Error:		An error is expected but got nil.
	Messages:	Received unexpected error EOF
		

2018-05-25 18:57:11.557227 E | snap: failed to remove broken snapshot file /tmpfs/TestRaftEncryptionKeyRotationWait889009686/snap-v3-encrypted/0000000000000003-0000000000000008.snap
2018-05-25 18:57:11.809242 E | snap: failed to remove broken snapshot file /tmpfs/TestRaftEncryptionKeyRotationWait889009686/snap-v3-encrypted/0000000000000003-0000000000000009.snap
2018-05-25 18:57:12.353853 I | wal: segmented wal file /tmpfs/TestGCWAL627798653/wal-v3-encrypted/0000000000000001-000000000000000d.wal is created
2018-05-25 18:57:12.365181 I | wal: segmented wal file /tmpfs/TestGCWAL142803895/wal-v3-encrypted/0000000000000001-000000000000000d.wal is created
2018-05-25 18:57:12.376396 I | wal: segmented wal file /tmpfs/TestGCWAL383336376/wal-v3-encrypted/0000000000000001-000000000000000d.wal is created
2018-05-25 18:57:12.968345 I | wal: segmented wal file /tmpfs/TestGCWAL997199274/wal-v3-encrypted/0000000000000001-000000000000000d.wal is created
2018-05-25 18:57:12.987945 I | wal: segmented wal file /tmpfs/TestGCWAL009475841/wal-v3-encrypted/0000000000000001-000000000000000d.wal is created
2018-05-25 18:57:12.989433 I | wal: segmented wal file /tmpfs/TestGCWAL207767148/wal-v3-encrypted/0000000000000001-000000000000000d.wal is created
FAIL

I didn't change raft, so possibly something else in GRPC changed.

@codecov
Copy link

codecov bot commented May 26, 2018

Codecov Report

Merging #2649 into master will increase coverage by 0.34%.
The diff coverage is 32%.

@@            Coverage Diff            @@
##           master   #2649      +/-   ##
=========================================
+ Coverage   61.85%   62.2%   +0.34%     
=========================================
  Files         134     134              
  Lines       21823   21739      -84     
=========================================
+ Hits        13499   13522      +23     
+ Misses       6869    6759     -110     
- Partials     1455    1458       +3

@thaJeztah
Copy link
Member

and it's green! 🎉

Copy link
Contributor

@anshulpundir anshulpundir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Perhaps some squashing is needed on the code changes ?

1. We now call `Clone` on the tls config when cloning the MutableTLSConfig
because you can't reuse a tls.Config for multiple connections, which we
would be doing if something called MutableTLSCreds.Clone, which newer
versions of GRPC do.

2. To prevent data races when grpc attempts to log, only set the grpc logger
to discard in test init functions, rather than in TestMain functions as is
documented by the `SetLoggerV2` function.  Also, use `SetLoggerV2` since
SetLogger is deprecated.

3. Wrap the default logger in a private struct that converts a logrus
entry into a grpc LoggerV2, since the grpc Logger is deprecated.

4. Stop using transport.StreamFromContext to get the peer address when checking
redirects in the raft proxy.  grpc/transport is an internal package that
downstream should not depend on, as the API may change (StreamFromContext
for example is removed in >=1.11).  peer.FromContext can get the peer address
just as well.

Signed-off-by: Ying Li <ying.li@docker.com>
cyli added 3 commits May 29, 2018 12:02
1. The GRPC update stopped returning errors when the listener underlying
a GRPC server closed, so remove that assertion from the tests.

2. Change the first error tracking in node not to check the GRPC error type for
x509 errors, since it's changed between GRPC <1.10 and >=1.10, and the error
message has changed as well in >=1.11. Also change the error message checking
to something that works in <1.10 as well as >=1.11.

Signed-off-by: Ying Li <ying.li@docker.com>
…r that

can surface TLS handshake errors, which GRPC considers transient.

Signed-off-by: Ying Li <ying.li@docker.com>
to reconnect.  Just return an error - if an error is returned, the agent
will reconnect anyway - it seems like the transport closing was a way
to convince GRPC's load balancer to start reconnecting automatically.
See moby#797.

Signed-off-by: Ying Li <ying.li@docker.com>
@cyli
Copy link
Contributor Author

cyli commented May 29, 2018

@anshulpundir There are a couple I'd like to leave separate in case we need to revert any of these. The changes that I think should work regardless of GRPC version are squashed and up top.

  1. The change to the tests to use our own custom dialer - this is a pretty ugly hack, and the GRPC devs seem to indicate that there are additional changes coming down the line for Dial and maybe the error returned by the context timeout (see TLS failures in blocking Dial calls don't provide useful error messages grpc/grpc-go#2031 (comment)). It may be easier to revert the commit than redo everything - not sure what those changes may look like.

  2. The change where I just remove the code that deletes the transport.StreamFromContext(ctx).Close() from the dispatcher - I think this was just an optimization and is not needed, but I wasn't really sure about it. I believe if we originally couldn't close the transport, we'd just close the session and error anyway, and the agent will reconnect regardless. The original change came from agent: handle initial session message #797. Sorry to bother you @stevvooe, but does that sound correct?

  3. The change to the test where sending a message beyond the max receivable size no longer returns an EOF because I have no idea what caused this change in behavior. The change was originally from Snapshot streaming #2458, but I can't find a corresponding change in GRPC which would change the behavior on send.

@@ -975,7 +975,7 @@ func TestStreamRaftMessage(t *testing.T) {

raftMsg := &api.StreamRaftMessageRequest{Message: msg}
err = stream.Send(raftMsg)
assert.Error(t, err, "Received unexpected error EOF")
assert.NoError(t, err, "Received unexpected error EOF")
Copy link
Member

@thaJeztah thaJeztah May 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the "Received unexpected error EOF" should be removed here?

i.e., just

assert.NoError(t, err)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, I yes, thank you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

cyli added 2 commits May 29, 2018 12:21
no longer returns an EOF.

Signed-off-by: Ying Li <ying.li@docker.com>
Signed-off-by: Ying Li <ying.li@docker.com>
@cyli
Copy link
Contributor Author

cyli commented May 31, 2018

cc @tonistiigi - Apologies for bothering you, but you were pinged for review on #797. I was wondering if you remembered enough about that to review this change: 90418ad?

#2 in my comment up above.

@tonistiigi
Copy link
Member

@cyli Yes, there shouldn't be a need for that transport close.

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!!

@dperny
Copy link
Collaborator

dperny commented Jun 1, 2018

LGTM, let's ship it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants