RFC: Planned Reparents with Orchestrator #6840

Closed
deepthi opened this issue Oct 8, 2020 · 2 comments
Comments

@deepthi
Member

deepthi commented Oct 8, 2020

Background

Vitess currently has a vtctl command (PlannedReparentShard, a.k.a. PRS) that allows for planned master failovers. Conceptually this is the same as Orchestrator’s GracefulMasterTakeover functionality.
This RFC proposes a rewrite of Orchestrator’s GracefulMasterTakeover that achieves the following goals:

  • Provides the same guarantees as PRS but with durability being configurable / pluggable
  • Does not require a Raft cluster of orchestrator instances to work in a multi-cell deployment
  • Makes progress on a path towards orchestrator becoming a part of vttablet
  • Follows the principles laid out in this blog post

Assumptions

  • The current master is known and reachable; otherwise PRS fails.
  • The global topo should be available for PRS to succeed.
  • We will use LockShard to continue to limit the possible universe of race conditions.

Vitess PRS

The current implementation of PRS in Vitess performs the following steps once we know which tablet is the chosen new master:

  • LockShard so that the cluster topology doesn’t change while PRS is in progress
  • Ensure the candidate master is replicating from the current master
  • Demote the current master - this only means that MySQL is now read-only; the tablet type is still MASTER
  • Wait for the candidate master to catch up
  • Promote the candidate to master - set tablet_type to MASTER and MySQL to read-write
  • Point all replicas (including the old master) to replicate from the new master
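The steps above can be sketched as a small in-memory simulation. Everything here (the `Tablet` class, `planned_reparent`, the integer "position") is illustrative, not an actual Vitess API; it only models the order of state transitions.

```python
# Hypothetical model of a tablet; "position" stands in for a GTID position.
class Tablet:
    def __init__(self, alias, tablet_type, position=0):
        self.alias = alias
        self.type = tablet_type                    # "MASTER" or "REPLICA"
        self.read_only = tablet_type != "MASTER"
        self.position = position
        self.source = None                         # alias of replication source

def planned_reparent(tablets, current, candidate):
    # Step 1 (LockShard) is assumed: the caller holds the shard lock throughout.
    # Step 2: ensure the candidate is replicating from the current master.
    assert tablets[candidate].source == current
    # Step 3: demote - MySQL becomes read-only; tablet type is still MASTER.
    tablets[current].read_only = True
    # Step 4: wait for the candidate to catch up to the master's position.
    tablets[candidate].position = tablets[current].position
    # Step 5: promote - set tablet_type to MASTER and MySQL to read-write.
    tablets[candidate].type = "MASTER"
    tablets[candidate].read_only = False
    tablets[candidate].source = None
    # Step 6: point all replicas (including the old master) at the new master.
    tablets[current].type = "REPLICA"
    for alias, t in tablets.items():
        if alias != candidate:
            t.source = candidate
    return candidate

tablets = {
    "t1": Tablet("t1", "MASTER", position=100),
    "t2": Tablet("t2", "REPLICA", position=90),
    "t3": Tablet("t3", "REPLICA", position=95),
}
tablets["t2"].source = "t1"
tablets["t3"].source = "t1"
planned_reparent(tablets, "t1", "t2")
print(tablets["t2"].type, tablets["t1"].source)  # MASTER t2
```

Note that the promote step (5) happens before the old master is repointed (6); this ordering is what the Failure Modes section below relies on.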

Orchestrator GracefulMasterTakeover

The existing functionality is more complex than Vitess PRS. One reason is that Orchestrator handles corner cases that Vitess components don’t run into (for example, hierarchical replication). Additionally, Orchestrator attempts to reuse the deadmaster code, which is more conservative than necessary: if the current master is known and authoritative, there is no need to go through the sanity checks that deadmaster recovery has to go through.

Proposed Solution

The implementation of PRS in Orchestrator will be very similar to the current implementation of the vtctl command with some modifications.

  • We have to ensure that all orchestrator instances see the same view of the topology. To guarantee this, we will rely on LockShard for now.
  • Orchestrator will reload all tablet records to confirm that the master has not changed after the shard record was locked.
  • Ensure the “avoid master” is still the master.
  • Ensure the candidate master is replicating from the current master
  • Demote the current master - this only means that MySQL is now read-only; the tablet type is still MASTER
  • Wait for the candidate master to catch up
  • Promote the candidate to master - set tablet_type to MASTER and MySQL to read-write
  • Point all replicas (including the old master) to replicate from the new master
  • When a new REPLICA tablet joins the topology, it will take the shard lock before attempting to publish its tablet record. publishState retries indefinitely until it succeeds.
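The retry-until-success behavior of publishState in the last step can be sketched as a capped-backoff loop. The function name and the `save_tablet_record` callback are illustrative, not the actual vttablet code; the point is only that the write is retried indefinitely, while the shard lock (taken beforehand, per the step above) keeps the master from changing underneath the publish.

```python
import time

def publish_state(save_tablet_record, record, base_delay=0.01, max_delay=1.0):
    """Retry the topo write forever, with capped exponential backoff.

    Illustrative sketch: the caller is assumed to hold the shard lock.
    """
    delay = base_delay
    while True:
        try:
            save_tablet_record(record)
            return
        except Exception:
            time.sleep(delay)
            delay = min(delay * 2, max_delay)

# Example: a topo that fails twice before accepting the write.
attempts = []
def flaky_save(record):
    attempts.append(record)
    if len(attempts) < 3:
        raise IOError("topo unavailable")

publish_state(flaky_save, {"alias": "t4", "type": "REPLICA"})
print(len(attempts))  # 3
```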

Failure Modes

In case of failure, we can depend on the information in the topo to determine the current master. This is because a tablet record is updated to MASTER only after the tablet has caught up on all events from the previous master.

All other actions of PRS are idempotent.
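The recovery rule above can be sketched as: scan the tablet records and, if a partially failed reparent left more than one MASTER record, trust the one with the latest term start. The dict-based records and the `master_term_start_time` field are illustrative stand-ins here, not the exact Vitess record schema.

```python
def current_master(tablet_records):
    # A record is flipped to MASTER only after the tablet has caught up on
    # all events from the previous master, so the newest MASTER record is
    # authoritative even after a partial PRS failure.
    masters = [r for r in tablet_records if r["type"] == "MASTER"]
    if not masters:
        return None
    return max(masters, key=lambda r: r["master_term_start_time"])["alias"]

records = [
    # Old master: demotion crashed before its record was rewritten.
    {"alias": "t1", "type": "MASTER", "master_term_start_time": 100},
    # Newly promoted master with a later term start.
    {"alias": "t2", "type": "MASTER", "master_term_start_time": 200},
    {"alias": "t3", "type": "REPLICA", "master_term_start_time": 0},
]
print(current_master(records))  # t2
```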

@shlomi-noach
Contributor

This makes sense.

@deepthi
Member Author

deepthi commented Mar 31, 2022

The vtorc work has made this issue redundant. See #8975

@deepthi deepthi closed this as completed Mar 31, 2022