RFC: Planned Reparents with Orchestrator #6840

Closed
deepthi opened this issue Oct 8, 2020 · 2 comments
Comments

@deepthi
Member

deepthi commented Oct 8, 2020

Background

Vitess currently has a vtctl command (PlannedReparentShard, a.k.a. PRS) that allows for planned master failovers. Conceptually this is the same as Orchestrator’s GracefulMasterTakeover functionality.
This RFC proposes a rewrite of Orchestrator’s GracefulMasterTakeover that achieves the following goals:

  • Provides the same guarantees as PRS but with durability being configurable / pluggable
  • Does not require a Raft cluster of orchestrator instances to work in a multi-cell deployment
  • Makes progress on a path towards orchestrator becoming a part of vttablet
  • Follows the principles laid out in this blog post

Assumptions

  • The current master is known and reachable; otherwise PRS fails.
  • The global topo should be available for PRS to succeed.
  • We will use LockShard to continue to limit the possible universe of race conditions.

Vitess PRS

The current implementation of PRS in Vitess performs the following steps once we know which tablet is the chosen new master:

  • LockShard so that the cluster topology doesn’t change while PRS is in progress
  • Ensure the candidate master is replicating from the current master
  • Demote the current master - this only means that MySQL is now read-only; the tablet type is still MASTER
  • Wait for the candidate master to catch up
  • Promote the candidate to master - set tablet_type to MASTER and MySQL to read-write
  • Point all replicas (including the old master) to replicate from the new master
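The steps above can be sketched as a small in-memory simulation. Everything here (the `Tablet` class, `planned_reparent`, the integer "position") is illustrative, not an actual Vitess API; it only models the order of state transitions.

```python
# Hypothetical model of a tablet; "position" stands in for a GTID position.
class Tablet:
    def __init__(self, alias, tablet_type, position=0):
        self.alias = alias
        self.type = tablet_type                    # "MASTER" or "REPLICA"
        self.read_only = tablet_type != "MASTER"
        self.position = position
        self.source = None                         # alias of replication source

def planned_reparent(tablets, current, candidate):
    # Step 1 (LockShard) is assumed: the caller holds the shard lock throughout.
    # Step 2: ensure the candidate is replicating from the current master.
    assert tablets[candidate].source == current
    # Step 3: demote - MySQL becomes read-only; tablet type is still MASTER.
    tablets[current].read_only = True
    # Step 4: wait for the candidate to catch up to the master's position.
    tablets[candidate].position = tablets[current].position
    # Step 5: promote - set tablet_type to MASTER and MySQL to read-write.
    tablets[candidate].type = "MASTER"
    tablets[candidate].read_only = False
    tablets[candidate].source = None
    # Step 6: point all replicas (including the old master) at the new master.
    tablets[current].type = "REPLICA"
    for alias, t in tablets.items():
        if alias != candidate:
            t.source = candidate
    return candidate

tablets = {
    "t1": Tablet("t1", "MASTER", position=100),
    "t2": Tablet("t2", "REPLICA", position=90),
    "t3": Tablet("t3", "REPLICA", position=95),
}
tablets["t2"].source = "t1"
tablets["t3"].source = "t1"
planned_reparent(tablets, "t1", "t2")
print(tablets["t2"].type, tablets["t1"].source)  # MASTER t2
```

Note that the promote step (5) happens before the old master is repointed (6); this ordering is what the Failure Modes section below relies on.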

Orchestrator GracefulMasterTakeover

The existing functionality is more complex than Vitess PRS. One reason is that Orchestrator handles corner cases that Vitess components don’t run into (for example, hierarchical replication). Additionally, Orchestrator attempts to reuse the deadmaster code, which is more conservative than necessary: if the current master is known and authoritative, there is no need to go through the sanity checks that deadmaster recovery has to go through.

Proposed Solution

The implementation of PRS in Orchestrator will be very similar to the current implementation of the vtctl command with some modifications.

  • We have to ensure that all orchestrator instances see the same view of the topology. To guarantee this, we will rely on LockShard for now.
  • Orchestrator will reload all tablet records to confirm that the master has not changed after the shard record was locked.
  • Ensure the “avoid master” is still the master.
  • Ensure the candidate master is replicating from the current master
  • Demote the current master - this only means that MySQL is now read-only; the tablet type is still MASTER
  • Wait for the candidate master to catch up
  • Promote the candidate to master - set tablet_type to MASTER and MySQL to read-write
  • Point all replicas (including the old master) to replicate from the new master
  • When a new REPLICA tablet joins the topology, it will take the shard lock before attempting to publish its tablet record. publishState retries indefinitely until it succeeds.
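The retry-until-success behavior of publishState in the last step can be sketched as a capped-backoff loop. The function name and the `save_tablet_record` callback are illustrative, not the actual vttablet code; the point is only that the write is retried indefinitely, while the shard lock (taken beforehand, per the step above) keeps the master from changing underneath the publish.

```python
import time

def publish_state(save_tablet_record, record, base_delay=0.01, max_delay=1.0):
    """Retry the topo write forever, with capped exponential backoff.

    Illustrative sketch: the caller is assumed to hold the shard lock.
    """
    delay = base_delay
    while True:
        try:
            save_tablet_record(record)
            return
        except Exception:
            time.sleep(delay)
            delay = min(delay * 2, max_delay)

# Example: a topo that fails twice before accepting the write.
attempts = []
def flaky_save(record):
    attempts.append(record)
    if len(attempts) < 3:
        raise IOError("topo unavailable")

publish_state(flaky_save, {"alias": "t4", "type": "REPLICA"})
print(len(attempts))  # 3
```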

Failure Modes

In case of failure, we can depend on the information in the topo to determine the current master. This is because a tablet record is updated to MASTER only after the tablet has caught up on all events from the previous master.

All other actions of PRS are idempotent.
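The recovery rule above can be sketched as: scan the tablet records and, if a partially failed reparent left more than one MASTER record, trust the one with the latest term start. The dict-based records and the `master_term_start_time` field are illustrative stand-ins here, not the exact Vitess record schema.

```python
def current_master(tablet_records):
    # A record is flipped to MASTER only after the tablet has caught up on
    # all events from the previous master, so the newest MASTER record is
    # authoritative even after a partial PRS failure.
    masters = [r for r in tablet_records if r["type"] == "MASTER"]
    if not masters:
        return None
    return max(masters, key=lambda r: r["master_term_start_time"])["alias"]

records = [
    # Old master: demotion crashed before its record was rewritten.
    {"alias": "t1", "type": "MASTER", "master_term_start_time": 100},
    # Newly promoted master with a later term start.
    {"alias": "t2", "type": "MASTER", "master_term_start_time": 200},
    {"alias": "t3", "type": "REPLICA", "master_term_start_time": 0},
]
print(current_master(records))  # t2
```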

@shlomi-noach
Contributor

This makes sense.

@deepthi
Member Author

deepthi commented Mar 31, 2022

The vtorc work has made this issue redundant. See #8975

@deepthi deepthi closed this as completed Mar 31, 2022