-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allows disabling WAN federation by setting serf WAN port to -1 #3984
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to dig into all the references for things that might become nil now and every trail ended in something you checked here so I don't see any obvious bugs.
I left some questions inline about whether there is a good reason not to return errors when WAN is disabled and we try to change WAN related things in a few places, but you have more context on how they are used . In general where it doesn't have a direct negative effect on code or UX that would need lots of extra work to undo I prefer to explicitly fail rather than silently skip stuff in cases like this.
agent/consul/server.go
Outdated
@@ -1021,6 +1047,10 @@ func (s *Server) GetLANCoordinate() (lib.CoordinateSet, error) { | |||
|
|||
// GetWANCoordinate returns the coordinate of the server in the WAN gossip pool. | |||
func (s *Server) GetWANCoordinate() (*coordinate.Coordinate, error) { | |||
if s.serfWAN == nil { | |||
// Return zero values if WAN federation is disabled | |||
return &coordinate.Coordinate{}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the right thing to do vs ErrWANFederationDisabled
may well be as I'm not familiar with the ways it is called but it's not obvious to me why we'd want to pretend this still works when WAN is disabled.
agent/router/router.go
Outdated
@@ -116,6 +116,9 @@ func (r *Router) Shutdown() { | |||
|
|||
// AddArea registers a new network area with the router. | |||
func (r *Router) AddArea(areaID types.AreaID, cluster RouterSerfCluster, pinger Pinger, useTLS bool) error { | |||
if cluster == nil { | |||
return nil | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is returning an error appropriate here given that we guard the call in NewServerLogger
when cluster is nil anyway? Maybe there are other callers that would fail if it were an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to be defensive here but its better to panic if there are new call sites added for this method in the future, will remove
@@ -37,7 +37,9 @@ func (s *Server) setupSerf(conf *serf.Config, ch chan serf.Event, path string, w | |||
conf.NodeName = fmt.Sprintf("%s.%s", s.config.NodeName, s.config.Datacenter) | |||
} else { | |||
conf.NodeName = s.config.NodeName | |||
conf.Tags["wan_join_port"] = fmt.Sprintf("%d", wanPort) | |||
if wanPort > 0 { | |||
conf.Tags["wan_join_port"] = fmt.Sprintf("%d", wanPort) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, I checked and it seems the only consumer of that tag in our code already handles the case where it's not present:
consul/agent/metadata/server.go
Lines 119 to 125 in 7de57ba
wanJoinPortStr, ok := m.Tags["wan_join_port"] | |
if ok { | |
wanJoinPort, err = strconv.Atoi(wanJoinPortStr) | |
if err != nil { | |
return false, nil | |
} | |
} |
agent/consul/server.go
Outdated
@@ -872,12 +889,19 @@ func (s *Server) KeyManagerLAN() *serf.KeyManager { | |||
|
|||
// KeyManagerWAN returns the WAN Serf keyring manager | |||
func (s *Server) KeyManagerWAN() *serf.KeyManager { | |||
if s.serfWAN == nil { | |||
return nil | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record I verified all call paths to this to check none will panic on a nil. It's only called from an internal state machine routine when working with WAN keyrings which should be disabled by code in this PR.
consul/agent/consul/internal_endpoint.go
Lines 143 to 151 in 17681f0
// executeKeyringOp executes the keyring-related operation in the request | |
// on either the WAN or LAN pools. | |
func (m *Internal) executeKeyringOp( | |
args *structs.KeyringRequest, | |
reply *structs.KeyringResponses, | |
wan bool) { | |
if wan { | |
mgr := m.srv.KeyManagerWAN() |
called from
consul/agent/consul/internal_endpoint.go
Lines 131 to 136 in 17681f0
// Only perform WAN keyring querying and RPC forwarding once | |
if !args.Forwarded { | |
args.Forwarded = true | |
m.executeKeyringOp(args, reply, true) | |
return m.srv.globalRPC("Internal.KeyringOperation", args, reply) | |
} |
which is disabled above.
In CI looks suspiciously like it might be legit given the change but I think you said you had them passing locally so I hit retry just to see if it fails consistently. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a few comments 👍
agent/agent.go
Outdated
@@ -1019,6 +1024,7 @@ func (a *Agent) setupNodeID(config *config.RuntimeConfig) error { | |||
func (a *Agent) setupBaseKeyrings(config *consul.Config) error { | |||
// If the keyring file is disabled then just poke the provided key | |||
// into the in-memory keyring. | |||
fedarationEnabled := config.SerfWANConfig != nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be "federationEnabled"
agent/consul/server.go
Outdated
} | ||
|
||
// Initialize the LAN segments before the default LAN Serf so we have | ||
// updated port information to publish there. | ||
// TODO preetha: why is this passing WAN port to create segments? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like just a remnant of calling the same setupSerf function for segments; the servers are already in the default serf LAN before any segments start, so I don't think the segment Serfs need the WAN port tag set.
// GetWANCoordinate returns the coordinate of the server in the WAN gossip pool. | ||
func (s *Server) GetWANCoordinate() (*coordinate.Coordinate, error) { | ||
return s.serfWAN.GetCoordinate() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this not used anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup
See #3984 for more. This also fixes a link to a heading that existed twice in the document.
This PR enables disabling WAN federation by setting the serf WAN port to -1 in the config. A number of endpoints/features are affected by this, listed below in no particular order.
consul catalog datacenters
and the equivalent HTTP API returns just the local dc nameconsul members -wan
and the equivalent HTTP API returns an empty listconsul join -wan
returns an errorv1/coordinate/datacenters
returns empty listconsul keyring
only shows LAN relevant keyrings..
Fixes Allow disabling WAN federation by specifying -1 as the serft WAN port #3989