Background
Vitess currently has a vtctl command (PlannedReparentShard, aka PRS) that supports planned master failovers. Conceptually, this is the same as Orchestrator’s GracefulMasterTakeover functionality.
This RFC proposes a rewrite of Orchestrator’s GracefulMasterTakeover that achieves the following goals:
Provides the same guarantees as PRS, but with configurable / pluggable durability
Does not require a Raft cluster of orchestrator instances to work in a multi-cell deployment
Makes progress on a path towards orchestrator becoming a part of vttablet
Assumptions
The current master is known and reachable; otherwise, PRS fails.
The global topo must be available for PRS to succeed.
We will continue to use LockShard to limit the set of possible race conditions.
Vitess PRS
Once the new master has been chosen, the current implementation of PRS in Vitess performs the following steps:
LockShard, so that the cluster topology doesn’t change while PRS is in progress
Ensure the candidate master is replicating from the current master
Demote the current master: this only makes its MySQL read-only; the tablet type is still MASTER
Wait for the candidate master to catch up
Promote the candidate to master: set its tablet_type to MASTER and make its MySQL read-write
Point all replicas (including the old master) to replicate from the new master
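To make the ordering concrete, here is a minimal Go sketch of these six steps against a hypothetical in-memory model of one shard. The names Shard, Tablet and PlannedReparent, the integer Position standing in for a GTID set, and the sync.Mutex standing in for LockShard are all assumptions made for illustration; this is not the vtctl code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// TabletType covers only the two roles used in this RFC.
type TabletType string

const (
	Master  TabletType = "MASTER"
	Replica TabletType = "REPLICA"
)

// Tablet is a hypothetical stand-in for a tablet record plus its MySQL state.
type Tablet struct {
	Alias    string
	Type     TabletType
	ReadOnly bool
	Position int    // hypothetical: events applied, standing in for a GTID set
	Source   string // alias of the replication source ("" for the master)
}

// Shard is a hypothetical in-memory shard; mu stands in for LockShard.
type Shard struct {
	mu      sync.Mutex
	Tablets map[string]*Tablet
}

// PlannedReparent applies the PRS steps, in order, to the in-memory model.
func (s *Shard) PlannedReparent(currentAlias, candidateAlias string) error {
	// LockShard: no other topology change can interleave with this one.
	s.mu.Lock()
	defer s.mu.Unlock()

	current, okCur := s.Tablets[currentAlias]
	candidate, okCand := s.Tablets[candidateAlias]
	if !okCur || !okCand {
		return errors.New("unknown tablet alias")
	}
	if current.Type != Master {
		return fmt.Errorf("%s is not the master", current.Alias)
	}
	// The candidate must be replicating from the current master.
	if candidate.Source != current.Alias {
		return fmt.Errorf("%s does not replicate from %s", candidate.Alias, current.Alias)
	}
	// Demote: MySQL becomes read-only, but the tablet type stays MASTER.
	current.ReadOnly = true
	// Wait for the candidate to catch up. The demoted master's position no
	// longer moves, so this loop ends; each iteration simulates one applied event.
	for candidate.Position < current.Position {
		candidate.Position++
	}
	// Promote: tablet type MASTER, MySQL read-write, no replication source.
	candidate.Type = Master
	candidate.ReadOnly = false
	candidate.Source = ""
	// Point every other tablet, including the old master, at the new master.
	for alias, t := range s.Tablets {
		if alias == candidateAlias {
			continue
		}
		t.Type = Replica
		t.ReadOnly = true
		t.Source = candidateAlias
	}
	return nil
}

func main() {
	s := &Shard{Tablets: map[string]*Tablet{
		"zone1-100": {Alias: "zone1-100", Type: Master, Position: 42},
		"zone1-101": {Alias: "zone1-101", Type: Replica, ReadOnly: true, Position: 40, Source: "zone1-100"},
		"zone1-102": {Alias: "zone1-102", Type: Replica, ReadOnly: true, Position: 41, Source: "zone1-100"},
	}}
	if err := s.PlannedReparent("zone1-100", "zone1-101"); err != nil {
		fmt.Println("PRS failed:", err)
		return
	}
	for _, alias := range []string{"zone1-100", "zone1-101", "zone1-102"} {
		t := s.Tablets[alias]
		fmt.Printf("%s type=%s read_only=%v source=%q\n", alias, t.Type, t.ReadOnly, t.Source)
	}
}
```

Holding the lock for the whole sequence is what keeps another reparent from observing the window in which the old master is read-only but still typed MASTER.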
Orchestrator GracefulMasterTakeover
The existing functionality is more complex than Vitess PRS. One reason is that Orchestrator handles corner cases that Vitess components never run into (for example, hierarchical replication). Additionally, Orchestrator attempts to reuse the deadmaster code, which is more conservative than necessary: if the current master is known and authoritative, there is no need for the sanity checks that deadmaster recovery must perform.
Proposed Solution
The implementation of PRS in Orchestrator will closely follow the current implementation of the vtctl command, with some modifications.
We have to ensure that all orchestrator instances see the same view of the topology. To guarantee this, we will rely on LockShard for now.
After the shard record is locked, Orchestrator will reload all tablet records to confirm that the master has not changed
Ensure that the “avoid master” (the tablet that PRS is moving mastership away from) is still the master
Ensure the candidate master is replicating from the current master
Demote the current master: this only makes its MySQL read-only; the tablet type is still MASTER
Wait for the candidate master to catch up
Promote the candidate to master: set its tablet_type to MASTER and make its MySQL read-write
Point all replicas (including the old master) to replicate from the new master
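The two checks Orchestrator adds after taking the lock (reloading tablet records, then confirming the master and the “avoid master”) could look roughly like the Go sketch below. The Topo interface, the TabletRecord type and the verifyAfterLock function are hypothetical; a real implementation would read tablet records through the Vitess topo client rather than the in-memory staticTopo used here.

```go
package main

import "fmt"

// TabletRecord is a hypothetical, trimmed-down topo tablet record.
type TabletRecord struct {
	Alias string
	Type  string // "MASTER" or "REPLICA"
}

// Topo is a hypothetical read interface over the topo server.
type Topo interface {
	// ReadTablets returns every tablet record in the shard, freshly read.
	ReadTablets() ([]TabletRecord, error)
}

// verifyAfterLock re-reads tablet records once the shard lock is held and
// checks that (a) the master Orchestrator saw before locking is still the
// only MASTER record, and (b) the avoid master, if given, is that master.
func verifyAfterLock(topo Topo, expectedMaster, avoidMaster string) error {
	records, err := topo.ReadTablets()
	if err != nil {
		return fmt.Errorf("reloading tablet records: %w", err)
	}
	var masters []string
	for _, r := range records {
		if r.Type == "MASTER" {
			masters = append(masters, r.Alias)
		}
	}
	if len(masters) != 1 {
		return fmt.Errorf("expected exactly one MASTER record, found %d", len(masters))
	}
	if masters[0] != expectedMaster {
		return fmt.Errorf("master changed from %s to %s before the lock was taken", expectedMaster, masters[0])
	}
	if avoidMaster != "" && avoidMaster != masters[0] {
		return fmt.Errorf("avoid master %s is no longer the master", avoidMaster)
	}
	return nil
}

// staticTopo is an in-memory Topo used only for this example.
type staticTopo []TabletRecord

func (s staticTopo) ReadTablets() ([]TabletRecord, error) { return s, nil }

func main() {
	topo := staticTopo{{Alias: "zone1-100", Type: "MASTER"}, {Alias: "zone1-101", Type: "REPLICA"}}
	fmt.Println(verifyAfterLock(topo, "zone1-100", "zone1-100")) // <nil>
	fmt.Println(verifyAfterLock(topo, "zone1-101", ""))
}
```

Refusing to proceed when more than one record says MASTER is deliberately conservative; it leaves that situation to the failure-handling path rather than guessing inside PRS.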
When a new REPLICA tablet joins the topology, it will take the shard lock before publishing its tablet record; publishState retries indefinitely until it succeeds.
Failure Modes
In case of failure, we can rely on the topo to determine who the current master is, because a tablet record is updated to MASTER only after that tablet has caught up on all events from the previous master.
All other actions of PRS are idempotent, so they can safely be retried.
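Idempotency here means each action describes a desired end state rather than a change, so rerunning it after a partial failure is harmless. A minimal Go sketch, using a hypothetical ReplicaState type and hypothetical helpers setReadOnly and pointAt:

```go
package main

import "fmt"

// ReplicaState is a hypothetical view of one MySQL instance's reparent-relevant state.
type ReplicaState struct {
	ReadOnly bool
	Source   string
}

// setReadOnly makes the instance read-only. It reports whether anything
// changed; a second call finds the desired state and does nothing.
func setReadOnly(s *ReplicaState) bool {
	if s.ReadOnly {
		return false
	}
	s.ReadOnly = true
	return true
}

// pointAt sets the replication source. Repeating it with the same master is a no-op.
func pointAt(s *ReplicaState, master string) bool {
	if s.Source == master {
		return false
	}
	s.Source = master
	return true
}

func main() {
	s := &ReplicaState{Source: "zone1-100"}
	fmt.Println(setReadOnly(s), pointAt(s, "zone1-101")) // true true
	// Retrying the same actions, e.g. after a crash mid-PRS, changes nothing.
	fmt.Println(setReadOnly(s), pointAt(s, "zone1-101")) // false false
	fmt.Printf("%+v\n", *s)                              // {ReadOnly:true Source:zone1-101}
}
```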