This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

Analysis: locked semi sync master #1175

Merged 19 commits on May 30, 2020

Conversation

shlomi-noach
Collaborator

@shlomi-noach shlomi-noach commented May 25, 2020

Replaces #1173
A different approach to detecting LockedSemiSyncMaster

@shlomi-noach shlomi-noach changed the title Analysis locked semi sync master Analysis: locked semi sync master May 27, 2020
@shlomi-noach shlomi-noach marked this pull request as ready for review May 27, 2020 10:25
@shlomi-noach
Collaborator Author

This PR introduces the LockedSemiSyncMaster analysis:

  • When a master has semi-sync enabled
  • And when it sees insufficient semi-sync replicas (Rpl_semi_sync_master_clients < rpl_semi_sync_master_wait_for_slave_count)
  • And it does not fall back to async (rpl_semi_sync_master_timeout is high)
  • And this state has persisted for more than config.Config.ReasonableReplicationLagSeconds

then the master is considered to be LockedSemiSyncMaster: it cannot accept writes, and that's a failure detection scenario.

When the first three conditions are met but the fourth is not yet met, the analysis is LockedSemiSyncMasterHypothesis. It shows up in the replication-analysis command, but it is non-actionable and merely serves as a preparation step for LockedSemiSyncMaster.

Analyzing a LockedSemiSyncMaster scenario is different from all other current analyses, because the analysis itself takes time to make. MySQL does not provide a mechanism to say "this master has been locked for the past 15 seconds", so orchestrator needs to figure this out reliably on its own.

Right now this PR offers no failover hooks for this analysis.

@shlomi-noach shlomi-noach merged commit e5225ce into master May 30, 2020
@shlomi-noach shlomi-noach deleted the analysis-locked-semi-sync-master branch May 30, 2020 14:55
@earl86

earl86 commented Jul 9, 2020

Can it automatically disable semi-sync on the master when orchestrator finds a LockedSemiSyncMaster?

@shlomi-noach
Collaborator Author

> Can it automatically disable semi-sync on the master when orchestrator finds a LockedSemiSyncMaster?

As mentioned above, right now this PR offers no failover hooks for this analysis. Follow-up work will add the ability to take action; it is still in the works. Possible actions:

  • invoke special hooks?
  • disable semi-sync on master?
  • enable semi-sync on another replica?
  • fail over the master AND enable semi-sync on another replica? (avoids a split-brain scenario)

There can be many rules, e.g. only enable semi-sync on a replica in the same DC, the same region, a different DC, or a different region; only add semi-sync replicas up to the expected number, or add and remove them to match the expectation exactly; and so on.
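To make the kind of rule discussed here concrete, below is a purely hypothetical sketch of an "add only" policy with a data-center constraint. Nothing in it reflects an existing or planned orchestrator feature: the `Replica` struct, the `Locality` values, and `PickReplicasToEnable` are invented for illustration.

```go
package main

import "fmt"

// Locality is a hypothetical placement rule for new semi-sync replicas.
type Locality int

const (
	AnyDC Locality = iota
	SameDC
	DifferentDC
)

// Replica is an assumed, minimal view of a replica.
type Replica struct {
	Host     string
	DC       string
	SemiSync bool // semi-sync currently enabled on this replica
}

// PickReplicasToEnable returns hosts on which to enable semi-sync so that the
// total reaches wanted. It only adds, never removes, and it counts every
// currently enabled replica regardless of the locality rule.
func PickReplicasToEnable(masterDC string, replicas []Replica, rule Locality, wanted int) []string {
	enabled := 0
	for _, r := range replicas {
		if r.SemiSync {
			enabled++
		}
	}
	var picked []string
	for _, r := range replicas {
		if enabled+len(picked) >= wanted {
			break
		}
		if r.SemiSync {
			continue
		}
		matches := rule == AnyDC ||
			(rule == SameDC && r.DC == masterDC) ||
			(rule == DifferentDC && r.DC != masterDC)
		if matches {
			picked = append(picked, r.Host)
		}
	}
	return picked
}

func main() {
	replicas := []Replica{
		{Host: "a", DC: "dc1", SemiSync: true},
		{Host: "b", DC: "dc1"},
		{Host: "c", DC: "dc2"},
	}
	fmt.Println(PickReplicasToEnable("dc1", replicas, DifferentDC, 2))
}
```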
