Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make verify_outgoing compatible with agents not configured for TLS #3001

Merged
merged 11 commits into from
May 10, 2017

Conversation

kyhavlov
Copy link
Contributor

@kyhavlov kyhavlov commented May 3, 2017

Fixes #1705 by allowing a graceful transition from a non-TLS configured cluster to a TLS one.

@kyhavlov kyhavlov requested review from magiconair and slackpad May 3, 2017 22:05
consul/client.go Outdated
eventCh: make(chan serf.Event, serfEventBacklog),
logger: logger,
shutdownCh: make(chan struct{}),
}

c.connPool = NewPool(config.LogOutput, clientRPCConnMaxIdle, clientMaxStreams, tlsWrap, c.LANMembers)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd make this above and set it in the structure as connPool: connPool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to pass in the LANMembers function from the client as a callback, though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check out the comments below - I think we can move the "should use TLS" logic onto https://github.com/hashicorp/consul/blob/master/consul/agent/server.go and then just pass the result of that into dial and friends.

consul/client.go Outdated
@@ -150,6 +151,10 @@ func (c *Client) setupSerf(conf *serf.Config, ch chan serf.Event, path string) (
conf.Tags["vsn_min"] = fmt.Sprintf("%d", ProtocolVersionMin)
conf.Tags["vsn_max"] = fmt.Sprintf("%d", ProtocolVersionMax)
conf.Tags["build"] = c.config.Build
conf.Tags["rpc_port"] = fmt.Sprintf("%d", c.config.RPCAddr.Port)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't ever inbound connect to a client agent, right? We probably don't need this here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's right, I thought the keyring or event stuff might use it, but those both use serf to gossip out, so we can remove this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - they don't have an inbound listener at this time.

consul/pool.go Outdated
}

return conn, hc, nil
}

// memberExpectsTLS returns true if the member with the specified address is
// verifying incoming TLS connections
func memberExpectsTLS(addr string, members []serf.Member) (bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty expensive since it's scanning all the members. In other methods where we use the connPool, we actually have an agent.Server, which is essentially the Serf info - see https://github.com/hashicorp/consul/blob/master/consul/rpc.go#L259-L266. We could probably put the TLS logic in there, and pass that through.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like as a useTLS Boolean maybe.

consul/server.go Outdated
@@ -379,6 +380,7 @@ func (s *Server) setupSerf(conf *serf.Config, ch chan serf.Event, path string, w
conf.Tags["raft_vsn"] = fmt.Sprintf("%d", s.config.RaftConfig.ProtocolVersion)
conf.Tags["build"] = s.config.Build
conf.Tags["port"] = fmt.Sprintf("%d", addr.Port)
conf.Tags["rpc_port"] = fmt.Sprintf("%d", s.config.RPCAddr.Port)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the same as the existing port tag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing tag has the LAN gossip port, not the RPC port

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so - it gets populated from the RPC listener above (sanity check on https://demo.consul.io/v1/agent/self?pretty shows 8300).

consul/server.go Outdated
@@ -388,6 +390,9 @@ func (s *Server) setupSerf(conf *serf.Config, ch chan serf.Event, path string, w
if s.config.NonVoter {
conf.Tags["nonvoter"] = "1"
}
if s.config.VerifyIncoming {
conf.Tags["tls_verify_incoming"] = "1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to key this off of certs being configured and have a sense more like use_tls? During the transition we will have to have verify turned off, but the server set up to handle incoming TLS connections.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this can be more permissive since we only really need to avoid trying to do outgoing TLS to an agent that doesn't have it configured at all

@slackpad
Copy link
Contributor

slackpad commented May 3, 2017

Put some comments on the review - I think we need to work through the overall approach a little bit more and then do another round.

consul/client.go Outdated
eventCh: make(chan serf.Event, serfEventBacklog),
logger: logger,
shutdownCh: make(chan struct{}),
}

c.connPool = NewPool(config.LogOutput, clientRPCConnMaxIdle, clientMaxStreams, tlsWrap, c.LANMembers)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

consul/pool.go Outdated
@@ -151,11 +157,13 @@ type ConnPool struct {
// Set maxTime to 0 to disable reaping. maxStreams is used to control
// the number of idle streams allowed.
// If TLS settings are provided outgoing connections use TLS.
func NewPool(logOutput io.Writer, maxTime time.Duration, maxStreams int, tlsWrap tlsutil.DCWrapper) *ConnPool {
func NewPool(logOutput io.Writer, maxTime time.Duration, maxStreams int, tlsWrap tlsutil.DCWrapper,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single line pls.

consul/pool.go Outdated
// Switch the connection into TLS mode
if _, err := conn.Write([]byte{byte(rpcTLS)}); err != nil {
conn.Close()
doWrap, err := memberExpectsTLS(addr.String(), p.memberFunc())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does a lot of magic. Please add a comment explaining what happens here and why.

// Switch the connection into TLS mode
if _, err := conn.Write([]byte{byte(rpcTLS)}); err != nil {
conn.Close()
doWrap, err := memberExpectsTLS(string(address), l.memberFunc())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks identical to the previous code. helper fn?

consul/server.go Outdated
@@ -270,6 +269,8 @@ func NewServer(config *Config) (*Server, error) {
shutdownCh: make(chan struct{}),
}

s.connPool = NewPool(config.LogOutput, serverRPCCache, serverMaxStreams, tlsWrap, s.LANMembers)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with James on that one. Create first, then inject.

@slackpad
Copy link
Contributor

slackpad commented May 5, 2017

One other thing - will this take away TLS until everybody is upgraded (to have the proper tag)? I wonder if we do need that config after all (you'd still want the tag because it makes the upgrade seamless), but it seems like we need to turn this on to not break existing behavior. That, or we could add the tag now and enable the actual switching part in pool.go in a later version of Consul.

@kyhavlov
Copy link
Contributor Author

kyhavlov commented May 5, 2017

We could set use_tls to 0 when verify_incoming is disabled and treat a missing tag as 1, which would be the existing behavior.

@slackpad
Copy link
Contributor

slackpad commented May 5, 2017

We could set use_tls to 0 when verify_incoming is disabled and treat a missing tag as 1, which would be the existing behavior.

Or would missing tag == 1 be equivalent to now, even? If you have TLS configured it will try to use TLS. The mixed mode business only works if your whole cluster is upgraded.

@slackpad
Copy link
Contributor

slackpad commented May 5, 2017

Would be good to add some new tests as well.

Copy link
Contributor

@slackpad slackpad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting the logic in server.go should make this a lot more efficient and flows the information down in a good way. Put some open questions on here, and we also need to update the docs.

@@ -40,6 +40,7 @@ type Server struct {
NonVoter bool
Addr net.Addr
Status serf.MemberStatus
UseTLS bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is worth some documentation about how it should be used.

s.localLock.RUnlock()

if !ok {
return false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be safer as return true, as if we don't know anything about the tag and defer to whether TLS is configured our not the old way (based on tlsWrap). Does that seem right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that defaulting to true (for TLS on) seems better, I thought I had it that way but must have changed it.

@@ -72,6 +73,9 @@ func IsConsulServer(m serf.Member) (bool, *Server) {
datacenter := m.Tags["dc"]
_, bootstrap := m.Tags["bootstrap"]

tlsTag, ok := m.Tags["use_tls"]
useTLS := !ok || tlsTag == "1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is worth adding a few lines in the unit test.

defer os.RemoveAll(dir2)
defer s2.Shutdown()

// Try to join
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can convert this to use the new test helpers and retry stuff - that landed this morning.

c.CAFile = "../test/client_certs/rootca.crt"
c.CertFile = "../test/client_certs/server.crt"
c.KeyFile = "../test/client_certs/server.key"
c.VerifyOutgoing = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - shouldn't this fail if you have verify_outgoing set and the other end doesn't support TLS? It seems like during migration you'd need to turn this off until everything was upgraded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verify_outgoing needs to work for servers that don't have TLS on, otherwise we would need 2 intermediary steps instead of one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking you'd leave both verify settings off, but configure the CA everywhere, then in a final step turn verify on. It seems like if you have verify_outgoing on, you really don't want to connect to anyone insecurely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you turn them both on at once with no interop, servers with verify_incoming will reject connections from the servers that don't have verify_outgoing set.

if a.config.CAPath != "" || a.config.CAFile != "" {
base.VerifyOutgoing = true
}
base.ForceVerifyOutgoing = a.config.VerifyOutgoing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of remapping VerifyOutgoing to ForceVerifyOutgoing how about keeping the sense of VerifyOutgoing the same and adding something like UseTLS?


## Configuring TLS on an existing cluster

As of version 0.8.2, Consul supports migrating to TLS-encrypted traffic on a running cluster
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.8.3

consul/server.go Outdated
@@ -393,6 +393,9 @@ func (s *Server) setupSerf(conf *serf.Config, ch chan serf.Event, path string, w
if s.config.NonVoter {
conf.Tags["nonvoter"] = "1"
}
if s.config.CAFile != "" || s.config.CAPath != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could key off of the s.config.UseTLS as well (see comment above).

@kyhavlov kyhavlov force-pushed the tls-outgoing-interop branch from dd7433f to ac0f187 Compare May 10, 2017 00:21
@kyhavlov kyhavlov force-pushed the tls-outgoing-interop branch from ac0f187 to 250008b Compare May 10, 2017 00:40
Copy link
Contributor

@slackpad slackpad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted some small stuff, but otherwise this is looking like it's almost there!

retry.Run(t, func(r *retry.R) { r.Check(wantPeers(s1, 2)) })

// Have s2 make an RPC call to s1
s2.localLock.RLock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to unlock this.


testrpc.WaitForLeader(t, s1.RPC, "dc1")

// Add a second server with TLS and VerifyOutgoing set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment.

c.KeyFile = "../test/client_certs/server.key"
c.VerifyIncoming = true
c.VerifyOutgoing = true
c.UseTLS = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be set automatically, right?


testrpc.WaitForLeader(t, s1.RPC, "dc1")

// Add a second server with TLS and VerifyOutgoing set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment.

c.CAFile = "../test/client_certs/rootca.crt"
c.CertFile = "../test/client_certs/server.crt"
c.KeyFile = "../test/client_certs/server.key"
c.UseTLS = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't need this, actually, since setting the CA will set this.

@@ -49,6 +49,10 @@ type Config struct {
// existing clients.
VerifyServerHostname bool

// UseTLS is used to enable outgoing TLS verification of all outgoing connections to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is a little misleading with the word "verification". I'd just say "UseTLS enables outgoing TLS connections to Consul servers."

c.CAFile = "../test/client_certs/rootca.crt"
c.CertFile = "../test/client_certs/server.crt"
c.KeyFile = "../test/client_certs/server.key"
c.UseTLS = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think we need to set this manually.

consul/server.go Outdated
s.localLock.RUnlock()

if !ok {
return true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should flip back to false with this new implementation. The verify_outgoing forces it, otherwise if we don't know anything about the other server then we probably don't want to use TLS.

Copy link
Contributor

@slackpad slackpad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (I tweaked one comment)

@kyhavlov kyhavlov merged commit 5bab68b into master May 10, 2017
@kyhavlov kyhavlov deleted the tls-outgoing-interop branch May 10, 2017 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants