Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add connectionbroker package #1850

Merged
merged 1 commit into from
Jan 9, 2017
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
105 changes: 105 additions & 0 deletions connectionbroker/broker.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
// Package connectionbroker is a layer on top of remotes that returns
// a gRPC connection to a manager. The connection may be a local connection
// using a local socket such as a UNIX socket.
package connectionbroker

import (
"sync"

"github.com/docker/swarmkit/api"
"github.com/docker/swarmkit/remotes"
"google.golang.org/grpc"
)

// Broker is a simple connection broker. It can either return a fresh
// connection to a remote manager selected with weighted randomization, or a
// local gRPC connection to the local manager.
type Broker struct {
mu sync.Mutex
remotes remotes.Remotes
localConn *grpc.ClientConn
}

// New creates a new connection broker.
func New(remotes remotes.Remotes) *Broker {
return &Broker{
remotes: remotes,
}
}

// SetLocalConn changes the local gRPC connection used by the connection broker.
func (b *Broker) SetLocalConn(localConn *grpc.ClientConn) {
b.mu.Lock()
defer b.mu.Unlock()

b.localConn = localConn
}

// Select a manager from the set of available managers, and return a connection.
func (b *Broker) Select(dialOpts ...grpc.DialOption) (*Conn, error) {
b.mu.Lock()
localConn := b.localConn
b.mu.Unlock()

if localConn != nil {
return &Conn{
ClientConn: localConn,
isLocal: true,
}, nil
}

return b.SelectRemote(dialOpts...)
}

// SelectRemote chooses a manager from the remotes, and returns a TCP
// connection.
func (b *Broker) SelectRemote(dialOpts ...grpc.DialOption) (*Conn, error) {
peer, err := b.remotes.Select()
if err != nil {
return nil, err
}

cc, err := grpc.Dial(peer.Addr, dialOpts...)
if err != nil {
b.remotes.ObserveIfExists(peer, -remotes.DefaultObservationWeight)
return nil, err
}

return &Conn{
ClientConn: cc,
remotes: b.remotes,
peer: peer,
}, nil
}

// Remotes returns the remotes interface used by the broker, so the caller
// can make observations or see weights directly.
func (b *Broker) Remotes() remotes.Remotes {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use case for this method? It seems to me that this breaks separation of concerns.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The agent needs access to Remotes to update the set of managers based on the list it gets from the dispatcher.

https://github.com/docker/swarmkit/pull/1826/files#diff-6ea70e0a4de93ebcb83000a028edc163L344

The alternative to exposing Remotes through this method is to implement Observe, Weights, and Remove as connectionbroker methods, and that feels like it would pollute the connectionbroker method set, instead of keeping that functionality specific to Remotes.

Or we could pass the underlying Remotes into the agent separately, but I don't like that much either.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

return b.remotes
}

// Conn is a wrapper around a gRPC client connection.
type Conn struct {
*grpc.ClientConn
isLocal bool
remotes remotes.Remotes
peer api.Peer
}

// Close closes the client connection if it is a remote connection. It also
// records a positive experience with the remote peer if success is true,
// otherwise it records a negative experience. If a local connection is in use,
// Close is a noop.
func (c *Conn) Close(success bool) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little counter-intuitive to close an "unsuccessful" connection. Would it be better to move if success logic to SelectRemote.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close won't be called when SelectRemote returns an error. The success parameter is determined later on. For example, if something tries an RPC and that RPC fails, success will be set to false when closing the connection so that a negative observation is made and future attempts will prefer a different remote.

I think Dial generally doesn't fail immediately (since Dial is an async call), but it's probably a good idea to automatically perform a negative observation if Dial returns an error. I'll add that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an observation if Dial fails.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can success be internal to Conn, like derived from operation failure on Conn?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately not, because Conn doesn't know whether RPCs that use it fail or not.

We could split it out into a separate Observe call on Conn. But I think it's nice to have it built into Close. That way, the use of a connection has to be either successful or unsuccessful. You can't forget to call Observe (we've had bugs like this before).

If it's weird for Close to take a parameter, we could rename this to something else like CloseAndObserve.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because Conn doesn't know whether RPCs that use it fail or not.

Could this be moved to where failure is observed? If Conn cannot observe failures, maybe weight handling shouldn't be done here. How about doing it at RPC calls? When RPC fails, decrease the node's weight. On RPC success, adjust weight according to previous state (no need to increase weight on continuous success operations).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be moved to where failure is observed?

We can do that, but as I mentioned above, I think it's better this way. We had a several bugs before where certain code paths returned early and skipped adjusting the weight. If weight adjustment is part of closing the connection, it can't be skipped accidentally without leaking a connection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. We have similar problem in class Swarm where operation failure should adjust node preference. It's hard to do that on all the calls.

if c.isLocal {
return nil
}

if success {
c.remotes.ObserveIfExists(c.peer, -remotes.DefaultObservationWeight)
} else {
c.remotes.ObserveIfExists(c.peer, remotes.DefaultObservationWeight)
}

return c.ClientConn.Close()
}