-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make verify_outgoing compatible with agents not configured for TLS #3001
Conversation
consul/client.go
Outdated
eventCh: make(chan serf.Event, serfEventBacklog), | ||
logger: logger, | ||
shutdownCh: make(chan struct{}), | ||
} | ||
|
||
c.connPool = NewPool(config.LogOutput, clientRPCConnMaxIdle, clientMaxStreams, tlsWrap, c.LANMembers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd make this above and set it in the structure as connPool: connPool
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to pass in the LANMembers function from the client as a callback, though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check out the comments below - I think we can move the "should use TLS" logic onto https://github.com/hashicorp/consul/blob/master/consul/agent/server.go and then just pass the result of that into dial and friends.
consul/client.go
Outdated
@@ -150,6 +151,10 @@ func (c *Client) setupSerf(conf *serf.Config, ch chan serf.Event, path string) ( | |||
conf.Tags["vsn_min"] = fmt.Sprintf("%d", ProtocolVersionMin) | |||
conf.Tags["vsn_max"] = fmt.Sprintf("%d", ProtocolVersionMax) | |||
conf.Tags["build"] = c.config.Build | |||
conf.Tags["rpc_port"] = fmt.Sprintf("%d", c.config.RPCAddr.Port) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't ever inbound connect to a client agent, right? We probably don't need this here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's right, I thought the keyring or event stuff might use it, but those both use serf to gossip out, so we can remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - they don't have an inbound listener at this time.
consul/pool.go
Outdated
} | ||
|
||
return conn, hc, nil | ||
} | ||
|
||
// memberExpectsTLS returns true if the member with the specified address is | ||
// verifying incoming TLS connections | ||
func memberExpectsTLS(addr string, members []serf.Member) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty expensive since it's scanning all the members. In other methods where we use the connPool
, we actually have an agent.Server
, which is essentially the Serf info - see https://github.com/hashicorp/consul/blob/master/consul/rpc.go#L259-L266. We could probably put the TLS logic in there, and pass that through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like as a useTLS
Boolean maybe.
consul/server.go
Outdated
@@ -379,6 +380,7 @@ func (s *Server) setupSerf(conf *serf.Config, ch chan serf.Event, path string, w | |||
conf.Tags["raft_vsn"] = fmt.Sprintf("%d", s.config.RaftConfig.ProtocolVersion) | |||
conf.Tags["build"] = s.config.Build | |||
conf.Tags["port"] = fmt.Sprintf("%d", addr.Port) | |||
conf.Tags["rpc_port"] = fmt.Sprintf("%d", s.config.RPCAddr.Port) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the same as the existing port
tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing tag has the LAN gossip port, not the RPC port
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so - it gets populated from the RPC listener above (sanity check on https://demo.consul.io/v1/agent/self?pretty shows 8300).
consul/server.go
Outdated
@@ -388,6 +390,9 @@ func (s *Server) setupSerf(conf *serf.Config, ch chan serf.Event, path string, w | |||
if s.config.NonVoter { | |||
conf.Tags["nonvoter"] = "1" | |||
} | |||
if s.config.VerifyIncoming { | |||
conf.Tags["tls_verify_incoming"] = "1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to key this off of certs being configured and have a sense more like use_tls
? During the transition we will have to have verify turned off, but the server set up to handle incoming TLS connections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, this can be more permissive since we only really need to avoid trying to do outgoing TLS to an agent that doesn't have it configured at all
Put some comments on the review - I think we need to work through the overall approach a little bit more and then do another round. |
consul/client.go
Outdated
eventCh: make(chan serf.Event, serfEventBacklog), | ||
logger: logger, | ||
shutdownCh: make(chan struct{}), | ||
} | ||
|
||
c.connPool = NewPool(config.LogOutput, clientRPCConnMaxIdle, clientMaxStreams, tlsWrap, c.LANMembers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
consul/pool.go
Outdated
@@ -151,11 +157,13 @@ type ConnPool struct { | |||
// Set maxTime to 0 to disable reaping. maxStreams is used to control | |||
// the number of idle streams allowed. | |||
// If TLS settings are provided outgoing connections use TLS. | |||
func NewPool(logOutput io.Writer, maxTime time.Duration, maxStreams int, tlsWrap tlsutil.DCWrapper) *ConnPool { | |||
func NewPool(logOutput io.Writer, maxTime time.Duration, maxStreams int, tlsWrap tlsutil.DCWrapper, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Single line pls.
consul/pool.go
Outdated
// Switch the connection into TLS mode | ||
if _, err := conn.Write([]byte{byte(rpcTLS)}); err != nil { | ||
conn.Close() | ||
doWrap, err := memberExpectsTLS(addr.String(), p.memberFunc()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does a lot of magic. Please add a comment explaining what happens here and why.
consul/raft_rpc.go
Outdated
// Switch the connection into TLS mode | ||
if _, err := conn.Write([]byte{byte(rpcTLS)}); err != nil { | ||
conn.Close() | ||
doWrap, err := memberExpectsTLS(string(address), l.memberFunc()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks identical to the previous code. helper fn?
consul/server.go
Outdated
@@ -270,6 +269,8 @@ func NewServer(config *Config) (*Server, error) { | |||
shutdownCh: make(chan struct{}), | |||
} | |||
|
|||
s.connPool = NewPool(config.LogOutput, serverRPCCache, serverMaxStreams, tlsWrap, s.LANMembers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm with James on that one. Create first, then inject.
One other thing - will this take away TLS until everybody is upgraded (to have the proper tag)? I wonder if we do need that config after all (you'd still want the tag because it makes the upgrade seamless), but it seems like we need to turn this on to not break existing behavior. That, or we could add the tag now and enable the actual switching part in pool.go in a later version of Consul. |
We could set |
Or would missing tag == 1 be equivalent to now, even? If you have TLS configured it will try to use TLS. The mixed mode business only works if your whole cluster is upgraded. |
Would be good to add some new tests as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting the logic in server.go should make this a lot more efficient and flows the information down in a good way. Put some open questions on here, and we also need to update the docs.
consul/agent/server.go
Outdated
@@ -40,6 +40,7 @@ type Server struct { | |||
NonVoter bool | |||
Addr net.Addr | |||
Status serf.MemberStatus | |||
UseTLS bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is worth some documentation about how it should be used.
s.localLock.RUnlock() | ||
|
||
if !ok { | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be safer as return true
, as if we don't know anything about the tag and defer to whether TLS is configured our not the old way (based on tlsWrap
). Does that seem right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that defaulting to true (for TLS on) seems better, I thought I had it that way but must have changed it.
consul/agent/server.go
Outdated
@@ -72,6 +73,9 @@ func IsConsulServer(m serf.Member) (bool, *Server) { | |||
datacenter := m.Tags["dc"] | |||
_, bootstrap := m.Tags["bootstrap"] | |||
|
|||
tlsTag, ok := m.Tags["use_tls"] | |||
useTLS := !ok || tlsTag == "1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is worth adding a few lines in the unit test.
consul/server_test.go
Outdated
defer os.RemoveAll(dir2) | ||
defer s2.Shutdown() | ||
|
||
// Try to join |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can convert this to use the new test helpers and retry stuff - that landed this morning.
c.CAFile = "../test/client_certs/rootca.crt" | ||
c.CertFile = "../test/client_certs/server.crt" | ||
c.KeyFile = "../test/client_certs/server.key" | ||
c.VerifyOutgoing = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm - shouldn't this fail if you have verify_outgoing
set and the other end doesn't support TLS? It seems like during migration you'd need to turn this off until everything was upgraded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
verify_outgoing
needs to work for servers that don't have TLS on, otherwise we would need 2 intermediary steps instead of one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking you'd leave both verify settings off, but configure the CA everywhere, then in a final step turn verify on. It seems like if you have verify_outgoing
on, you really don't want to connect to anyone insecurely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you turn them both on at once with no interop, servers with verify_incoming
will reject connections from the servers that don't have verify_outgoing
set.
command/agent/agent.go
Outdated
if a.config.CAPath != "" || a.config.CAFile != "" { | ||
base.VerifyOutgoing = true | ||
} | ||
base.ForceVerifyOutgoing = a.config.VerifyOutgoing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of remapping VerifyOutgoing
to ForceVerifyOutgoing
how about keeping the sense of VerifyOutgoing
the same and adding something like UseTLS
?
|
||
## Configuring TLS on an existing cluster | ||
|
||
As of version 0.8.2, Consul supports migrating to TLS-encrypted traffic on a running cluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0.8.3
consul/server.go
Outdated
@@ -393,6 +393,9 @@ func (s *Server) setupSerf(conf *serf.Config, ch chan serf.Event, path string, w | |||
if s.config.NonVoter { | |||
conf.Tags["nonvoter"] = "1" | |||
} | |||
if s.config.CAFile != "" || s.config.CAPath != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could key off of the s.config.UseTLS
as well (see comment above).
dd7433f
to
ac0f187
Compare
ac0f187
to
250008b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted some small stuff, but otherwise this is looking like it's almost there!
retry.Run(t, func(r *retry.R) { r.Check(wantPeers(s1, 2)) }) | ||
|
||
// Have s2 make an RPC call to s1 | ||
s2.localLock.RLock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to unlock this.
|
||
testrpc.WaitForLeader(t, s1.RPC, "dc1") | ||
|
||
// Add a second server with TLS and VerifyOutgoing set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stale comment.
consul/server_test.go
Outdated
c.KeyFile = "../test/client_certs/server.key" | ||
c.VerifyIncoming = true | ||
c.VerifyOutgoing = true | ||
c.UseTLS = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be set automatically, right?
|
||
testrpc.WaitForLeader(t, s1.RPC, "dc1") | ||
|
||
// Add a second server with TLS and VerifyOutgoing set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stale comment.
consul/server_test.go
Outdated
c.CAFile = "../test/client_certs/rootca.crt" | ||
c.CertFile = "../test/client_certs/server.crt" | ||
c.KeyFile = "../test/client_certs/server.key" | ||
c.UseTLS = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't need this, actually, since setting the CA will set this.
tlsutil/config.go
Outdated
@@ -49,6 +49,10 @@ type Config struct { | |||
// existing clients. | |||
VerifyServerHostname bool | |||
|
|||
// UseTLS is used to enable outgoing TLS verification of all outgoing connections to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is a little misleading with the word "verification". I'd just say "UseTLS enables outgoing TLS connections to Consul servers."
consul/server_test.go
Outdated
c.CAFile = "../test/client_certs/rootca.crt" | ||
c.CertFile = "../test/client_certs/server.crt" | ||
c.KeyFile = "../test/client_certs/server.key" | ||
c.UseTLS = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think we need to set this manually.
consul/server.go
Outdated
s.localLock.RUnlock() | ||
|
||
if !ok { | ||
return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should flip back to false
with this new implementation. The verify_outgoing
forces it, otherwise if we don't know anything about the other server then we probably don't want to use TLS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (I tweaked one comment)
Fixes #1705 by allowing a graceful transition from a non-TLS configured cluster to a TLS one.