This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

Recovery: relaxed promotion rule check searching for ideal replica #1222

Merged
merged 2 commits into from
Jul 28, 2020

Conversation

shlomi-noach
Collaborator

The current recovery logic has a mechanism to optimize recovery time in a dead master scenario:

  1. If replicas are probed
  2. And one is picked as candidate to be promoted (based on binlogs, version, configuration)
  3. And it's in the same DC and environment as the dead master
  4. And it has the prefer (or must, which is not yet supported) promotion rule

then we consider the replica "ideal", immediately promote it as master, and asynchronously point the rest of the replicas below it.
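The conditions above can be sketched roughly as follows. This is an illustration only, not the code in the repository: the `Server` type, its fields, the `PromotionRule` constants, and `isIdealBefore` are hypothetical names, and only the promotion rules named in this PR are modeled.

```go
package main

// PromotionRule is a hypothetical stand-in for a server's promotion rule.
type PromotionRule string

const (
	Must    PromotionRule = "must"
	Prefer  PromotionRule = "prefer"
	Neutral PromotionRule = "neutral"
	MustNot PromotionRule = "must_not"
)

// Server holds only the fields this sketch needs (hypothetical type).
type Server struct {
	DataCenter    string
	Environment   string
	PromotionRule PromotionRule
}

// isIdealBefore approximates the pre-existing check: a candidate was picked,
// it shares the dead master's DC and environment, and it is marked
// prefer (or must).
func isIdealBefore(deadMaster Server, candidate *Server) bool {
	if candidate == nil {
		// No candidate was picked from the probed replicas.
		return false
	}
	if candidate.DataCenter != deadMaster.DataCenter ||
		candidate.Environment != deadMaster.Environment {
		return false
	}
	return candidate.PromotionRule == Prefer || candidate.PromotionRule == Must
}
```

Under this sketch a `neutral` candidate in the same DC and environment is not considered ideal, which is the limitation described next.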

While analyzing a recent production use case, I noticed a setup where all servers were either marked as neutral or must_not. None was marked with prefer. In that scenario, the recovery was not optimized.

In this PR:

  • as long as the server is not marked with must_not promotion rule
  • if it has the best promotion rule from among the replicas it would own after promotion, then it's an "ideal" replica.
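A minimal sketch of the relaxed promotion rule check, assuming rules can be ranked so that a higher value is a better rule. `RuleRank`, `Replica`, and `hasIdealPromotionRule` are hypothetical names used for illustration and are not taken from the merged commit. The sketch covers only the promotion rule part of the decision and reads "best" as "no replica it would own has a strictly better rule".

```go
package main

// RuleRank orders promotion rules from worst to best (hypothetical type;
// only the rules named in this PR are modeled).
type RuleRank int

const (
	RankMustNot RuleRank = iota
	RankNeutral
	RankPrefer
	RankMust
)

// Replica is a hypothetical minimal view of a replica.
type Replica struct {
	Host string
	Rule RuleRank
}

// hasIdealPromotionRule reports whether the candidate passes the relaxed
// check: it is not must_not, and none of the replicas it would own after
// promotion has a strictly better promotion rule.
func hasIdealPromotionRule(candidate Replica, wouldOwn []Replica) bool {
	if candidate.Rule == RankMustNot {
		return false
	}
	for _, r := range wouldOwn {
		if r.Rule > candidate.Rule {
			// Another replica is a better promotion target.
			return false
		}
	}
	return true
}
```

In the production setup described above, where every server is `neutral` or `must_not`, a `neutral` candidate passes this check, so the optimized recovery path can apply.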

@shlomi-noach shlomi-noach merged commit fc567d8 into master Jul 28, 2020
@shlomi-noach shlomi-noach deleted the best-promotion-rule-candidate-replica branch July 28, 2020 07:26