-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wait to initialize autopilot until all servers are >= 0.8.0 #2897
Conversation
45efc53
to
7d36403
Compare
consul/autopilot.go
Outdated
@@ -33,6 +34,8 @@ func (s *Server) stopAutopilot() { | |||
s.autopilotWaitGroup.Wait() | |||
} | |||
|
|||
var minAutopilotVersion, _ = version.NewVersion("0.8.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use version.Must(version.NewVersion())
here - can you update the other constant versions in AP to do that as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That way things will blow up if there's an error. Unlikely to ever happen but an error into an _
feels like it'll bite us some day.
consul/autopilot.go
Outdated
// pruneDeadServers removes up to numPeers/2 failed servers | ||
func (s *Server) pruneDeadServers() error { | ||
state := s.fsm.State() | ||
_, autopilotConf, err := state.AutopilotConfig() | ||
if err != nil { | ||
if err != nil || autopilotConf == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels weird to have this down here. I'd change pruneDeadServers
to take the config as an argument like PromoteNonVoters
and then make a helper that gets called in both paths. Since initializeAutopilot
also fetches the state I think you can move its logic here and consolidate into one function:
func (s *Server) getAutopilotConfig() (*structs.AutopilotConfig, bool) {
state := s.fsm.State()
_, config, err := state.AutopilotConfig()
if err != nil {
s.logger.Printf("[ERR] autopilot: Error retrieving config from state store: %v", err)
return nil, false
}
if config != nil {
return config, true
}
// do the version check here and possibly return nil, false
config = *s.config.AutopilotConfig
req := structs.AutopilotSetConfigRequest{Config:config}
if _, err = s.raftApply(structs.AutopilotRequestType, req); err != nil {
s.logger.Printf("[ERR] autopilot: Failed to initialize autopilot config: %v", err)
return nil, false
}
return config, true
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then in the switch it's:
autopilotConfig, ok := s.getAutopilotConfig()
if !ok {
continue
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually maybe getOrCreateAutpilotConfig
so it's clear it has a side-effect.
consul/autopilot.go
Outdated
@@ -68,11 +84,26 @@ func (s *Server) autopilotLoop() { | |||
} | |||
} | |||
|
|||
// lowestServerVersion returns the lowest version among the alive servers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd factor this out into a companion to https://github.com/hashicorp/consul/blob/master/consul/util.go#L65-L92, like AreServersAtLeastOnVersion
. Or, since have util depend on consul/server might be weird you could move them both into a new Go file in the Consul package like "compatibility.go" so they live side-by-side.
@@ -153,12 +153,6 @@ func (s *Server) establishLeadership() error { | |||
return err | |||
} | |||
|
|||
// Setup autopilot config if we are the leader and need to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this endpoint will crash if the AP config hasn't been made yet:
https://github.com/hashicorp/consul/blob/master/consul/operator_autopilot_endpoint.go#L24-L30
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ that endpoint still needs an update (it can call get or create prolly and return an error if it can't get a config)
consul/leader.go
Outdated
func (s *Server) initializeAutopilot() error { | ||
// Bail if the config has already been initialized | ||
// getAutopilotConfig is used to get the autopilot config, initializing it if necessary | ||
func (s *Server) getAutopilotConfig() (*structs.AutopilotConfig, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of getOrCreateAutpilotConfig
?
consul/leader.go
Outdated
req := structs.AutopilotSetConfigRequest{ | ||
Config: *s.config.AutopilotConfig, | ||
if !ServersMeetMinimumVersion(s.LANMembers(), minAutopilotVersion) { | ||
s.logger.Printf("can't initialize autopilot until all servers are >= %s", minAutopilotVersion.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should have the autopilot
prefix added.
consul/leader.go
Outdated
if _, err = s.raftApply(structs.AutopilotRequestType, req); err != nil { | ||
return fmt.Errorf("failed to initialize autopilot config") | ||
s.logger.Printf("failed to initialize autopilot config") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the error onto the end of this one.
|
||
// ServersMeetMinimumVersion returns whether the given alive servers are at least on the | ||
// given Consul version | ||
func ServersMeetMinimumVersion(members []serf.Member, minVersion *version.Version) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be good to add a test for.
@@ -33,6 +34,8 @@ func (s *Server) stopAutopilot() { | |||
s.autopilotWaitGroup.Wait() | |||
} | |||
|
|||
var minAutopilotVersion = version.Must(version.NewVersion("0.8.0")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change the other NewVersion
calls to use this pattern too?
c4a67fe
to
196dd81
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - I'd bump the log level for the one down to a WARN and the operator AP endpoint needs a nil
check, then I think we are good.
consul/leader.go
Outdated
req := structs.AutopilotSetConfigRequest{ | ||
Config: *s.config.AutopilotConfig, | ||
if !ServersMeetMinimumVersion(s.LANMembers(), minAutopilotVersion) { | ||
s.logger.Printf("[ERR] autopilot: can't initialize until all servers are >= %s", minAutopilotVersion.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a WARN.
No description provided.