
Allow managers not to expose a remote API port #1826

Merged: 2 commits merged into moby:master from late-port-binding on Jan 23, 2017

Conversation

@aaronlehmann (Collaborator) commented Dec 21, 2016

We'd like Docker to support swarm services without needing to run swarm init. One of the preconditions for this is that the swarm manager needs to operate and be useful without exposing a port to the outside world.

This PR is split into multiple parts, which will each become a separate PR for ease of review. The first few commits are focused on allowing the agent and certificate issuance/renewal to work without relying on a TCP connection back to the same process. This works by exposing servers such as the dispatcher and node CA on the unix socket (or equivalent), and using that instead of a TCP connection. In theory we could codegen something to patch the calls straight through to the handlers (with some stuff injected onto the context to identify the caller as the local node), but for now, using the unix socket is good enough (a rough sketch follows the list below). This required some changes to the raft proxy to let it inject the local node identity onto the context when it's calling the handler locally.

  • Fix demotion (No automatic manager shutdown on demotion/removal #1829)
  • Expose the dispatcher, node CA, and a few other services on the UNIX socket. Inject the local node identity when they receive connections this way. (Expose needed services on the control socket #1828)
  • Add a connectionbroker package. This is a small abstraction on top of Remotes that provides a gRPC connection to a manager. If running on a manager, it uses the unix socket, otherwise it will pick a remote manager using Remotes. This allows things like agent and certificate renewal to work even on a single node with no TCP port bound. (Add connectionbroker package #1850)
  • Convert the agent and CA client to use connectionbroker. (Convert code to use connectionbroker package #1851)
  • Separate binding ports from creating the manager. Add methods to Manager and Node that allow binding a port after the manager is already running. A small change to raft allows the raft code to get the machine's external address after raft is already running, instead of relying on having it already. (This PR)
  • A simple integration test that binds a port after starting the first manager. (This PR)
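
To make the local-socket idea above concrete, here is a minimal, self-contained sketch (illustrative package layout and socket path, not the actual swarmkit code) of serving gRPC on a unix socket and dialing it from the same process instead of opening a TCP port:

package main

import (
	"context"
	"log"
	"net"

	"google.golang.org/grpc"
)

const controlSocket = "/tmp/example-manager.sock" // hypothetical path

func main() {
	// Server side: listen on a unix socket instead of a TCP port. In swarmkit,
	// services such as the dispatcher and node CA would be registered on srv.
	l, err := net.Listen("unix", controlSocket)
	if err != nil {
		log.Fatal(err)
	}
	srv := grpc.NewServer()
	go srv.Serve(l)
	defer srv.Stop()

	// Client side: the agent / CA client dials the same socket, so no remote
	// API port ever needs to be bound for a single-node cluster.
	conn, err := grpc.DialContext(context.Background(), "local",
		grpc.WithInsecure(),
		grpc.WithContextDialer(func(ctx context.Context, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, "unix", controlSocket)
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	log.Println("dialed local manager socket")
}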

cc @aluzzardi @LK4D4 @diogomonica @cyli @tonistiigi @stevvooe

@aluzzardi (Member)

/cc @ehazlett @icecrime

@aluzzardi (Member)

Awesome!

This also solves many issues we had with the local agent not connecting to the local manager, such as status drifting (e.g. Node both "down" and "reachable").

@LK4D4 (Contributor) commented Dec 21, 2016

@aaronlehmann can we split it into different PRs for easier review? The connectionbroker change permeates almost all files, so it's hard to find the other changes.

@codecov-io commented Dec 21, 2016

Current coverage is 53.38% (diff: 50.88%)

Merging #1826 into master will decrease coverage by 0.29%

@@             master      #1826   diff @@
==========================================
  Files           107        107          
  Lines         18403      18489    +86   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits           9880       9871     -9   
- Misses         7308       7396    +88   
- Partials       1215       1222     +7   

Powered by Codecov. Last update 9a7d5e6...0eca203

@aaronlehmann (Collaborator, Author)

@LK4D4: This PR is split into 3 commits. I thought reviewing commit by commit would make sense, but if you think it's easier, I can split this into multiple PRs.

Would you like me to open a separate PR with just the connection broker commit, or would you like me to split that one into smaller pieces?

@aaronlehmann (Collaborator, Author)

If you'd like, I could split it into a sequence of commits and/or PRs like this:

  1. Expose dispatcher/NodeCA/etc on local socket, modify proxy to support this.
  2. Add connection broker package
  3. Update existing code to use connection broker instead of remotes
  4. Add late-binding support to manager and node
  5. Add integration test

@LK4D4 (Contributor) commented Dec 21, 2016

@aaronlehmann That splitting plan will make it way easier for me to review. I think it's better to open separate PRs.
Thanks!

@aaronlehmann (Collaborator, Author)

OK. If it's alright with you, I'll leave this one open as a complete PR to show the big picture, and also open one PR at a time to get the individual pieces merged.

@ehazlett (Contributor)

nice! design sgtm

@aaronlehmann (Collaborator, Author)

Split the commits as discussed, and opened #1828 to review/merge the first piece. I've seen a few integration test failures with just this first one locally, but it's tricky to reproduce and I can't understand why this change would cause issues (since it's just exposing some services in a way that nothing uses yet). I wonder if the failures are related to a recent change on master instead of the changes here.

@aaronlehmann (Collaborator, Author)

I have a theory about why this PR makes the integration tests flaky. I think it changes the timing enough that a demoted node can end up with a worker certificate before Raft creates outgoing connections. Then the manager blocks in WaitForLeader instead of shutting down on demotion like it's supposed to.

I'm going to open a PR to change demotion so that the manager is always shut down explicitly after a role change instead of shutting itself down. Even if that doesn't fix the problem here, it's something that badly needs to be done.

@aaronlehmann (Collaborator, Author)

> I'm going to open a PR to change demotion so that the manager is always shut down explicitly after a role change instead of shutting itself down. Even if that doesn't fix the problem here, it's something that badly needs to be done.

WIP PR at #1829. Unfortunately, it doesn't work yet.

@diogomonica (Contributor)

Design LGTM.

@aaronlehmann changed the title from "[WIP] Allow managers not to expose a remote API port" to "[Meta] Allow managers not to expose a remote API port" on Jan 5, 2017
@aaronlehmann (Collaborator, Author)

I've rebased this on top of #1829 and made a few fixes. #1829 introduced an interesting issue. With this PR, a node that's a manager will always connect to itself instead of a random other manager for dispatcher and CA operations. However, if it has been demoted, it may not be able to issue certificates, and the CA client would not retry with other managers, so the demotion would not complete. I've fixed this so that after the first attempt to renew a certificate, the client will explicitly prefer random managers over the local connection. Also, a timeout was necessary for the certificate renewal request. If the node was demoted, it can get into a state where it waits forever for a leader.
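
A self-contained sketch of that retry policy (the target-selection and renew functions here are stand-ins, not the actual connectionbroker or CA client API):

package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// pickTarget prefers the local socket only on the first attempt; once a
// renewal has failed, it picks a random remote manager instead.
func pickTarget(preferRemote bool) string {
	if !preferRemote {
		return "local-unix-socket"
	}
	return "random-remote-manager"
}

// renew is a stand-in for the certificate renewal RPC. A demoted local
// manager may never answer, which the per-attempt timeout guards against.
func renew(ctx context.Context, target string) error {
	if target == "local-unix-socket" {
		return errors.New("local manager cannot issue certificates after demotion")
	}
	return nil
}

func main() {
	preferRemote := false
	for attempt := 1; ; attempt++ {
		// Bound each attempt so a demoted node cannot wait forever for a leader.
		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		target := pickTarget(preferRemote)
		err := renew(ctx, target)
		cancel()
		if err == nil {
			fmt.Printf("renewed via %s on attempt %d\n", target, attempt)
			return
		}
		preferRemote = true // subsequent attempts prefer other managers
	}
}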

This seems quite stable now. I'm not seeing any more issues with the integration tests.

I'll keep this open as a meta-PR, and update it as individual bits get merged. #1829 should be merged before #1828, and after that I can open more PRs for the other pieces.

@aaronlehmann force-pushed the late-port-binding branch 2 times, most recently from 827f310 to 7f69ecb on January 7, 2017, 01:32
@aaronlehmann (Collaborator, Author)

#1829 was merged. Rebased.

@aaronlehmann (Collaborator, Author)

#1828 was merged. Rebased to remove that part of this meta-PR.

Opened #1850 as the next step in getting these pieces merged.

@aaronlehmann force-pushed the late-port-binding branch 2 times, most recently from 000704c to 9cb5f3c on January 9, 2017, 21:52
@aaronlehmann (Collaborator, Author)

connectionbroker was merged in #1850. Opened next PR: #1851.

@@ -132,9 +131,15 @@ type Manager struct {

	cancelFunc context.CancelFunc

	mu     sync.Mutex
	addrMu sync.Mutex
Collaborator

Can you document what these two mutexes control in a comment?

Collaborator

I guess it's more obvious when I look at the rest of the code.

Collaborator Author

Added comments

if err := m.BindRemote(context.Background(), *config.RemoteAPI); err != nil {
l := <-m.controlListener
l.Close()
return nil, err
Collaborator

This is closing the control listener and then returning an error, if binding the remote fails, right?

Collaborator Author

Yes. It's safe to read from m.controlListener because this is before Run, so nothing else will be reading from that channel.

started chan struct{}
stopped bool

remoteListener chan net.Listener
controlListener chan net.Listener
Collaborator

I'm not clear on why these listeners are returned through channels.

Collaborator

I think I see now; they're channels so the listeners (most importantly, the remote listener) can be hot-swapped, like when we take a "closed" one-node cluster and make it "open"?

Collaborator Author

Correct.
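
A minimal, self-contained sketch of that pattern (simplified names, not the actual Manager type): the listener travels through a buffered channel, so the remote API port can be handed to the serve loop even after the manager is already running:

package main

import (
	"fmt"
	"net"
)

type manager struct {
	remoteListener chan net.Listener // buffered with capacity 1
}

// bindRemote can be called before or after run; it just drops the listener
// into the channel for the serve loop to pick up.
func (m *manager) bindRemote(addr string) error {
	l, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}
	m.remoteListener <- l
	return nil
}

// run blocks on the channel, so a manager started without a remote API port
// simply waits until one is bound later ("late binding").
func (m *manager) run(done chan struct{}) {
	go func() {
		l := <-m.remoteListener
		fmt.Println("serving remote API on", l.Addr())
		l.Close()
		close(done)
	}()
}

func main() {
	m := &manager{remoteListener: make(chan net.Listener, 1)}
	done := make(chan struct{})
	m.run(done) // started with no remote port bound
	if err := m.bindRemote("127.0.0.1:0"); err != nil {
		panic(err)
	}
	<-done
}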

}

// BindRemote binds a port for the remote API.
func (m *Manager) BindRemote(ctx context.Context, addrs RemoteAddrs) error {
Collaborator

Nit: it may make more sense to put these methods definitions in the order that they're called in the initialization.

Collaborator Author

Order changed

defer cancel()

isLeader := atomic.LoadUint32(&n.signalledLeadership) == 1
for !isLeader {
Collaborator

I don't understand what this loop does.

Collaborator Author

As we discussed offline, currently only the leader can submit proposals such as configuration changes. For now, this SetAddr function is intended to address the narrow use case of a single-node cluster that didn't have ports bound before. Since the cluster only has one member, this node must be the leader, but there can be delays before that leadership is confirmed, for example at startup. This loop waits until the current node becomes the leader.

If/when we extend SetAddr to handle more general cases, we'll need some other approach here, such as an RPC we can use to tell the leader to update a node's address.
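
A self-contained sketch of that wait loop (simplified event type and channel; not the actual raft.Node or SubscribeLeadership API):

package main

import (
	"context"
	"fmt"
	"time"
)

type leadershipEvent int

const becameLeader leadershipEvent = 1

// waitForLeadership blocks until this node is confirmed as leader, or the
// context is cancelled. Only then is it safe to submit the address-update
// proposal described above.
func waitForLeadership(ctx context.Context, leadershipCh <-chan leadershipEvent, alreadyLeader bool) error {
	isLeader := alreadyLeader
	for !isLeader {
		select {
		case ev := <-leadershipCh:
			if ev == becameLeader {
				isLeader = true
			}
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}

func main() {
	ch := make(chan leadershipEvent, 1)
	go func() {
		// In a single-node cluster, leadership is confirmed shortly after startup.
		time.Sleep(10 * time.Millisecond)
		ch <- becameLeader
	}()
	if err := waitForLeadership(context.Background(), ch, false); err != nil {
		fmt.Println("gave up waiting:", err)
		return
	}
	fmt.Println("leader confirmed; SetAddr can propose the new address")
}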

@@ -71,7 +71,7 @@ type Config struct {

// RemoteAPI is a listening address for serving the remote API, and
// an optional advertise address.
RemoteAPI RemoteAddrs
RemoteAPI *RemoteAddrs
Contributor

Out of curiosity why the change to a pointer?

Collaborator Author

So it can be nil before it's initialized

Contributor

I meant why not just check the default value, as opposed to nil? If ListenAddr is "", there is no remote API set, right? I'm not arguing that's better or anything; I'm more just asking whether pointers are generally preferred over checking against the default value.

Collaborator Author

What you're suggesting should work fine. I felt that since RemoteAddrs is a struct, it might be confusing to check one particular field to see if the struct is initialized.
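
For illustration, the two options look roughly like this (simplified types based on the diff above, not the full manager.Config):

package config // illustrative

type RemoteAddrs struct {
	ListenAddr    string
	AdvertiseAddr string
}

type Config struct {
	RemoteAPI *RemoteAddrs
}

// With a pointer, "remote API not configured" is unambiguous:
func hasRemoteAPI(c Config) bool {
	return c.RemoteAPI != nil
}

// With a value field, the check would have to treat one field as a sentinel,
// e.g. ListenAddr == "", which is less obvious for a struct.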

Contributor

Ah ok, thank you for explaining

newMember.RaftMember.Addr = newAddr
c.members[id] = &newMember

if oldMember.RaftMember.Addr != newAddr {
Contributor

Non-blocking: maybe we can move the copying of the RaftMember object inside this block, since we don't need to make an update if the address hasn't changed?

Collaborator Author

Fixed

@cyli (Contributor) left a comment

LGTM


n.opts.Addr = addr

if n.IsMember() {
Contributor

I think it's better to just return here on !n.IsMember()

leadershipCh, cancel := n.SubscribeLeadership()
defer cancel()

isLeader := atomic.LoadUint32(&n.signalledLeadership) == 1
Contributor

signalledLeadership is a really weird name now :) We should call it isLeader and use it instead of the isLeader function; otherwise it's quite confusing here.
Probably in another PR.

@aaronlehmann (Collaborator, Author)

Rebased, PTAL

@aaronlehmann force-pushed the late-port-binding branch 2 times, most recently from eead57f to ae6b1df on January 19, 2017, 18:34
@aaronlehmann (Collaborator, Author)

Rebased again, and added t.Parallel to the new test.

This should be ready to merge.

if m.config.ControlAPI != "" {
return errors.New("manager already has a control API address")
}
m.config.ControlAPI = addr
Contributor

I wonder if we should clear this in case of error.

Collaborator Author

Fixed

if leadershipChange == IsLeader {
isLeader = true
}
case <-ctx.Done():
Contributor

I'm not sure, but it seems like this ctx might be context.Background(), and it's possible that the node could be stopped during this loop. Maybe we should use WithContext.

Collaborator Author

Switched to WithContext.

@@ -1092,6 +1151,11 @@ func (n *Node) reportNewAddress(ctx context.Context, id uint64) error {
if err != nil {
return err
}
if oldAddr == "" {
Contributor

Is this really possible?

Collaborator Author

Yeah, I encountered it in the late-binding integration test. It happens when you create a manager without a bound port, then later bind one. The node update that adds the address gets committed to raft at some point, and new nodes that join the cluster may take a little while to catch up to that.

Contributor

It's quite weird. newPeer calls grpc.Dial, which is supposed to check whether the address is empty :/

Collaborator Author

Hmm, should we avoid calling grpc.Dial if newPeer is passed an empty address?

Contributor

I dunno, there shouldn't be empty addresses, and I wonder why it works and registers in the peer list at all.

m.config.RemoteAPI = nil
// The context isn't used in this case (before (*Manager).Run).
if err := m.BindRemote(context.Background(), *config.RemoteAPI); err != nil {
l := <-m.controlListener
Contributor

This can hang forever. So probably you need to check len(l).

Collaborator Author

Yes, this looks wrong. This code should only run if config.ControlAPI != "". I've fixed that.
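
Roughly, the fixed initialization path looks like this (a sketch of the shape of the fix, not the exact committed code):

	if config.ControlAPI != "" {
		if err := m.BindControl(config.ControlAPI); err != nil {
			return nil, err
		}
	}

	if config.RemoteAPI != nil {
		if err := m.BindRemote(context.Background(), *config.RemoteAPI); err != nil {
			if config.ControlAPI != "" {
				// Safe: BindControl placed exactly one listener on the channel,
				// and Run hasn't started yet, so nothing else is receiving.
				l := <-m.controlListener
				l.Close()
			}
			return nil, err
		}
	}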

Contributor

Is there a use case where there is a remote API, but no control API?

Collaborator Author

There isn't currently any use case without a control API. It's structured that way to be consistent with BindRemote.

if config.ControlAPI != "" {
m.config.ControlAPI = ""
if err := m.BindControl(config.ControlAPI); err != nil {
return nil, err
Contributor

Shouldn't we close listener here as well?

Collaborator Author

If BindControl returns an error, there is no listener to close.

@@ -143,6 +143,7 @@ func (c *Cluster) UpdateMember(id uint64, m *api.RaftMember) error {
return nil
}
oldMember.RaftMember = m
c.broadcastUpdate()
Contributor

nice!

@@ -342,6 +398,9 @@ func (n *Node) JoinAndStart(ctx context.Context) (err error) {
}

// join to existing cluster
if n.opts.Addr == "" {
Contributor

I don't really understand the sequence of events. What ensures that BindRemote is called before Run?

Collaborator Author

It does not need to be called before Run. See the new test TestServiceCreateLateBind, where it is called afterwards.

Contributor

Then I don't understand how it works. Isn't it possible to call JoinAndStart before BindRemote and end up trying to join the raft cluster without knowing our own address?

Collaborator Author

You need to bind first if you are trying to join a cluster, but if you're starting a new cluster then you hit the n.opts.JoinAddr == "" branch and having an address is not required.
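
In other words, the check is roughly this (simplified; not the exact JoinAndStart code):

	if n.opts.JoinAddr == "" {
		// Bootstrapping a new cluster: the local address can be reported later,
		// once BindRemote has been called and raft learns the external address.
	} else {
		// Joining an existing cluster: the other members must be able to dial
		// us back, so refuse to join without knowing our own address.
		if n.opts.Addr == "" {
			return errors.New("attempting to join raft cluster without knowing own address")
		}
	}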

This adds BindRemote and BindControl methods to Manager, which can be
used to specify the remote API and control API addresses after creating
or starting the manager. Raft has been updated to accept a new remote
address after being started. If the RemoteAPI and ControlAPI fields are
passed to manager.New, it is not necessary to call BindRemote or
BindControl explicitly.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
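
A hypothetical usage sketch based on this description (field and package names follow the diffs above; other required Config fields and error handling for Run are omitted, so this is not a verbatim swarmkit example):

func startManagerLateBind(ctx context.Context) (*manager.Manager, error) {
	// Only the local control socket is served at first; no TCP port is exposed.
	m, err := manager.New(&manager.Config{
		ControlAPI: "/var/run/example-manager.sock", // hypothetical socket path
		// RemoteAPI left nil: no remote API port bound yet.
	})
	if err != nil {
		return nil, err
	}
	go m.Run(ctx)

	// Later, when the node should become reachable by other nodes, bind the
	// remote API after the fact:
	if err := m.BindRemote(ctx, manager.RemoteAddrs{ListenAddr: "0.0.0.0:2377"}); err != nil {
		return nil, err
	}
	return m, nil
}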
@LK4D4 (Contributor) commented Jan 23, 2017

LGTM

@LK4D4 merged commit 037b491 into moby:master on Jan 23, 2017
@aaronlehmann deleted the late-port-binding branch on January 23, 2017, 23:18