Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait to initialize autopilot until all servers are >= 0.8.0 #2897

Merged
merged 4 commits into from
Apr 13, 2017

Conversation

kyhavlov
Copy link
Contributor

No description provided.

@kyhavlov kyhavlov force-pushed the autopilot-config-fix branch from 45efc53 to 7d36403 Compare April 12, 2017 23:05
@@ -33,6 +34,8 @@ func (s *Server) stopAutopilot() {
s.autopilotWaitGroup.Wait()
}

var minAutopilotVersion, _ = version.NewVersion("0.8.0")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use version.Must(version.NewVersion()) here - can you update the other constant versions in AP to do that as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That way things will blow up if there's an error. Unlikely to ever happen but an error into an _ feels like it'll bite us some day.

// pruneDeadServers removes up to numPeers/2 failed servers
func (s *Server) pruneDeadServers() error {
state := s.fsm.State()
_, autopilotConf, err := state.AutopilotConfig()
if err != nil {
if err != nil || autopilotConf == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels weird to have this down here. I'd change pruneDeadServers to take the config as an argument like PromoteNonVoters and then make a helper that gets called in both paths. Since initializeAutopilot also fetches the state I think you can move its logic here and consolidate into one function:

func (s *Server) getAutopilotConfig() (*structs.AutopilotConfig, bool) {
    state := s.fsm.State()
    _, config, err := state.AutopilotConfig()
    if err != nil {
        s.logger.Printf("[ERR] autopilot: Error retrieving config from state store: %v", err)
        return nil, false
    }
    if config != nil {
        return config, true
    }

    // do the version check here and possibly return nil, false

    config = *s.config.AutopilotConfig
    req := structs.AutopilotSetConfigRequest{Config:config}
    if _, err = s.raftApply(structs.AutopilotRequestType, req); err != nil {
        s.logger.Printf("[ERR] autopilot: Failed to initialize autopilot config: %v", err)
       return nil, false
    }

    return config, true
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then in the switch it's:

autopilotConfig, ok := s.getAutopilotConfig()
if !ok {
    continue
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually maybe getOrCreateAutpilotConfig so it's clear it has a side-effect.

@@ -68,11 +84,26 @@ func (s *Server) autopilotLoop() {
}
}

// lowestServerVersion returns the lowest version among the alive servers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd factor this out into a companion to https://github.com/hashicorp/consul/blob/master/consul/util.go#L65-L92, like AreServersAtLeastOnVersion. Or, since have util depend on consul/server might be weird you could move them both into a new Go file in the Consul package like "compatibility.go" so they live side-by-side.

@@ -153,12 +153,6 @@ func (s *Server) establishLeadership() error {
return err
}

// Setup autopilot config if we are the leader and need to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this endpoint will crash if the AP config hasn't been made yet:

https://github.com/hashicorp/consul/blob/master/consul/operator_autopilot_endpoint.go#L24-L30

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ that endpoint still needs an update (it can call get or create prolly and return an error if it can't get a config)

consul/leader.go Outdated
func (s *Server) initializeAutopilot() error {
// Bail if the config has already been initialized
// getAutopilotConfig is used to get the autopilot config, initializing it if necessary
func (s *Server) getAutopilotConfig() (*structs.AutopilotConfig, bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of getOrCreateAutpilotConfig?

consul/leader.go Outdated
req := structs.AutopilotSetConfigRequest{
Config: *s.config.AutopilotConfig,
if !ServersMeetMinimumVersion(s.LANMembers(), minAutopilotVersion) {
s.logger.Printf("can't initialize autopilot until all servers are >= %s", minAutopilotVersion.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should have the autopilot prefix added.

consul/leader.go Outdated
if _, err = s.raftApply(structs.AutopilotRequestType, req); err != nil {
return fmt.Errorf("failed to initialize autopilot config")
s.logger.Printf("failed to initialize autopilot config")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the error onto the end of this one.


// ServersMeetMinimumVersion returns whether the given alive servers are at least on the
// given Consul version
func ServersMeetMinimumVersion(members []serf.Member, minVersion *version.Version) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be good to add a test for.

@@ -33,6 +34,8 @@ func (s *Server) stopAutopilot() {
s.autopilotWaitGroup.Wait()
}

var minAutopilotVersion = version.Must(version.NewVersion("0.8.0"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change the other NewVersion calls to use this pattern too?

@kyhavlov kyhavlov force-pushed the autopilot-config-fix branch from c4a67fe to 196dd81 Compare April 13, 2017 01:39
Copy link
Contributor

@slackpad slackpad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - I'd bump the log level for the one down to a WARN and the operator AP endpoint needs a nil check, then I think we are good.

consul/leader.go Outdated
req := structs.AutopilotSetConfigRequest{
Config: *s.config.AutopilotConfig,
if !ServersMeetMinimumVersion(s.LANMembers(), minAutopilotVersion) {
s.logger.Printf("[ERR] autopilot: can't initialize until all servers are >= %s", minAutopilotVersion.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a WARN.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants