base: enable Raft CheckQuorum by default #104042

Merged Jun 28, 2023 (1 commit)

Commits on Jun 27, 2023

  1. base: enable Raft CheckQuorum by default

    This patch enables Raft CheckQuorum by default. In etcd/raft, this also
    has the effect of fully enabling PreVote, such that followers won't
    grant prevotes if they've heard from a leader in the past election
    timeout interval.
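
The grant rule described above can be sketched in a few lines of Go. This is a simplified model and not the etcd/raft code: the `follower` type, its fields, and `grantsPreVote` are hypothetical names chosen for illustration, and the usual log up-to-date checks are omitted.

```go
package main

import "fmt"

// follower is a hypothetical, simplified model of a Raft follower.
// It is not the etcd/raft struct; field names are assumptions.
type follower struct {
	checkQuorum     bool   // whether CheckQuorum is enabled on this node
	leader          uint64 // 0 means the follower knows of no leader
	electionElapsed int    // ticks since the follower last heard from the leader
	electionTimeout int    // ticks in one election timeout interval
}

// inLease reports whether the follower still considers its leader active:
// CheckQuorum is on, a leader is known, and it was heard from within the
// past election timeout interval.
func (f follower) inLease() bool {
	return f.checkQuorum && f.leader != 0 && f.electionElapsed < f.electionTimeout
}

// grantsPreVote applies only the leader recency criterion.
// Log up-to-date and term checks are deliberately left out of this sketch.
func (f follower) grantsPreVote() bool {
	return !f.inLease()
}

func main() {
	recent := follower{checkQuorum: true, leader: 1, electionElapsed: 3, electionTimeout: 10}
	stale := follower{checkQuorum: true, leader: 1, electionElapsed: 10, electionTimeout: 10}
	legacy := follower{checkQuorum: false, leader: 1, electionElapsed: 3, electionTimeout: 10}

	fmt.Println(recent.grantsPreVote()) // false: heard from the leader recently
	fmt.Println(stale.grantsPreVote())  // true: a full election timeout has passed
	fmt.Println(legacy.grantsPreVote()) // true: CheckQuorum off, recency not enforced
}
```

The `legacy` case models a voter with `CheckQuorum` disabled: the recency criterion only affects that voter's own decision.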
    
    This is more robust against partial and asymmetric network partitions.
    Otherwise, a partitioned node may be able to hold spurious elections and
    steal leadership away from an established leader. This can cause the
    leader to become unreachable by the leaseholder, resulting in permanent
    range unavailability.
    
    We are still able to hold immediate elections, e.g. when unquiescing a
    range to find a dead leader. If a quorum of followers consider the
    leader dead and forget it (becoming leaderless followers), they will
    grant prevotes despite having seen the leader recently (i.e. before
    quiescing), and can hold an election immediately.
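
As a rough illustration of why forgetting the leader permits an immediate election, the following Go sketch counts prevotes in a three-voter range whose leader is dead. The `replica` type, `forgetLeader`, and `preVoteSucceeds` are hypothetical names for this model only and do not correspond to CockroachDB or etcd/raft APIs.

```go
package main

import "fmt"

// replica is a hypothetical, simplified follower model (assumed names).
type replica struct {
	leader          uint64 // 0 means leaderless
	electionElapsed int    // ticks since last hearing from the leader
	electionTimeout int    // ticks in one election timeout interval
}

// forgetLeader models a follower that finds the leader dead (e.g. according
// to liveness) and becomes a leaderless follower.
func (r *replica) forgetLeader() { r.leader = 0 }

// grantsPreVote applies the leader recency criterion only.
func (r *replica) grantsPreVote() bool {
	return r.leader == 0 || r.electionElapsed >= r.electionTimeout
}

// preVoteSucceeds counts the candidate's own vote plus grants from the
// responding peers, and compares the total against a quorum of all voters.
func preVoteSucceeds(voters int, responding []*replica) bool {
	votes := 1 // the candidate votes for itself
	for _, p := range responding {
		if p.grantsPreVote() {
			votes++
		}
	}
	return votes >= voters/2+1
}

func main() {
	// Three voters: the candidate, one live follower, and a dead leader
	// that never responds.
	peer := &replica{leader: 1, electionElapsed: 0, electionTimeout: 10}

	fmt.Println(preVoteSucceeds(3, []*replica{peer})) // false: peer saw the leader recently

	peer.forgetLeader()
	fmt.Println(preVoteSucceeds(3, []*replica{peer})) // true: peer is now leaderless
}
```

Without `forgetLeader`, the candidate in this model would have to wait until the peer's election timeout elapsed before winning a prevote.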
    
    This is compatible with 23.1 in mixed-version clusters:
    
    * Leaders with mixed `CheckQuorum` settings are fine: they only apply
      the step-down logic to themselves, and register follower activity
      regardless of the followers' settings.
    
    * Voters with mixed `CheckQuorum` settings are fine: the leader recency
      criterion is only applied to their own vote, so either they'll
      enforce it or not.
    
    * Campaigning on leader removal is fine-ish: before 23.2 finalization,
      the first range replica will campaign. If this replica is 23.2 it will
      bypass pre-vote and call an immediate election; if it is 23.1 it will
      use pre-vote. However, upon receiving the 23.1 pre-vote request, 23.2
      nodes will check whether the leader is still in the descriptor, and
      if it isn't they will forget it and grant the pre-vote. A quorum will
      likely apply the leader removal before receiving pre-vote requests.
      Otherwise, we will recover after an election timeout.
    
    * Campaigning after unquiescing is fine: the logic remains unchanged,
      and 23.2 nodes will forget the leader and grant prevotes if they
      find the leader dead according to liveness.
    
    * Campaigning during lease acquisitions is fine: this is needed to
      steal leadership away from an active leader that can't itself acquire
      an epoch lease because it's failing liveness heartbeats. If a 23.2 node
      also finds the leader dead in liveness, it will forget it and grant
      the prevote.
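
The leader-side half of `CheckQuorum` (the step-down logic from the first bullet) can be sketched as follows. This is a minimal model under stated assumptions: the `leaderState` type and its methods are hypothetical names, and the periodic check is assumed to run once per election timeout interval.

```go
package main

import "fmt"

// leaderState is a hypothetical, simplified model of a Raft leader that
// tracks which voters it has heard from in the current interval.
type leaderState struct {
	id           uint64
	isLeader     bool
	recentActive map[uint64]bool // one entry per voter, including the leader
}

func newLeader(id uint64, voters []uint64) *leaderState {
	l := &leaderState{id: id, isLeader: true, recentActive: map[uint64]bool{}}
	for _, v := range voters {
		l.recentActive[v] = v == id // the leader always counts itself
	}
	return l
}

// recordActivity registers a message from a follower. The follower's own
// CheckQuorum setting does not matter here.
func (l *leaderState) recordActivity(from uint64) {
	if _, ok := l.recentActive[from]; ok {
		l.recentActive[from] = true
	}
}

// checkQuorum is assumed to run once per election timeout interval. The
// leader steps down unless a quorum of voters was active, then starts a
// fresh interval with only itself marked active.
func (l *leaderState) checkQuorum() {
	active := 0
	for _, ok := range l.recentActive {
		if ok {
			active++
		}
	}
	if active < len(l.recentActive)/2+1 {
		l.isLeader = false
		return
	}
	for id := range l.recentActive {
		l.recentActive[id] = id == l.id
	}
}

func main() {
	l := newLeader(1, []uint64{1, 2, 3})

	l.recordActivity(2) // follower 2 responded during this interval
	l.checkQuorum()
	fmt.Println(l.isLeader) // true: leader plus follower 2 form a quorum

	l.checkQuorum() // nobody responded during the next interval
	fmt.Println(l.isLeader) // false: the leader steps down
}
```

Because the check only inspects the leader's own bookkeeping, it behaves the same whether or not the followers have `CheckQuorum` enabled.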
    
    Epic: none
    Release note (bug fix): The Raft PreVote and CheckQuorum mechanisms are
    now fully enabled. These prevent spurious elections when followers
    already have an active leader, and cause leaders to step down if they
    don't hear back from a quorum of followers. This improves reliability
    under partial and asymmetric network partitions, by avoiding spurious
    elections and preventing unavailability where a partially partitioned
    node could steal leadership away from an established leaseholder who
    would then no longer be able to reach the leader and submit writes.
    erikgrinaker committed Jun 27, 2023
    de9b2b2