kvserver: clear RHS state machine when moving past a split using a snapshot #73462

Open · 2 tasks
sumeerbhola opened this issue Dec 4, 2021 · 0 comments
Labels: A-kv-replication (Relating to Raft, consensus, and coordination.), C-enhancement (Solution expected to add code/behavior + preserve backward-compat; pg compat issues are exception), T-kv (KV Team)
sumeerbhola commented Dec 4, 2021

(discussion in https://cockroachlabs.slack.com/archives/C02KHQMF2US/p1638412757069600)

Consider a range split where this node is lagging and has not yet applied the split, and then receives a post-split snapshot for the LHS.

  • The RHS was rebalanced away from this node. We did not delete any state machine state, since the split has not been applied yet. But we did write a RangeTombstone, to be used later in splitPreApply in the rightRepl == nil code path:

```go
if rightRepl == nil || rightRepl.isNewerThanSplit(&split) {
	// We're in the rare case where we know that the RHS has been removed
	// and re-added with a higher replica ID (and then maybe removed again).
	...
```
  • But this node never executes the split (so splitPreApply never runs) and instead receives and applies a post-split snapshot for this range in order to catch up. This post-split snapshot contains only the LHS state.
    • The multiSSTWriter used to clear the existing range state takes its key ranges from the RangeDescriptor in the snapshot, which covers only the LHS:

```go
// At the moment we'll write at most five SSTs.
// TODO(jeffreyxiao): Re-evaluate as the default range size grows.
keyRanges := rditer.MakeReplicatedKeyRanges(header.State.Desc)
msstw, err := newMultiSSTWriter(ctx, kvSS.scratch, keyRanges, kvSS.sstChunkSize)
```

    • So the RHS's range-local and global keys would still exist after applying the snapshot (I am assuming the RHS is not considered a subsumed replica in applySnapshot, since the span of the snapshot, which is the LHS, does not subsume the RHS). A sketch of the resulting leak follows this list.
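
To make the leak concrete, here is a minimal, self-contained Go sketch of the clearing decision. Everything in it (the span type, contains, clearSpansForSnapshot) is a hypothetical stand-in, not an actual kvserver API; it only models the fact that the spans to clear are derived from the snapshot's descriptor alone.

```go
package main

import "fmt"

// span is a hypothetical stand-in for a range's key span
// (roughly roachpb.RSpan); keys are plain strings here.
type span struct{ start, end string }

func (s span) contains(k string) bool { return s.start <= k && k < s.end }

// clearSpansForSnapshot models the current behavior: the spans to clear
// come from the snapshot's RangeDescriptor alone (the post-split LHS).
func clearSpansForSnapshot(snapDesc span) []span {
	return []span{snapDesc}
}

func main() {
	lhs := span{start: "a", end: "m"} // snapshot descriptor: post-split LHS
	rhsKey := "q"                     // a leftover replicated key in the RHS span

	cleared := false
	for _, sp := range clearSpansForSnapshot(lhs) {
		cleared = cleared || sp.contains(rhsKey)
	}
	// Prints "false": the RHS key survives snapshot application.
	fmt.Printf("RHS key %q cleared: %v\n", rhsKey, cleared)
}
```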

Result: we have leaked state in the engine.

Steps:

  • Verify that the bug exists with a unit test.
  • When deciding what state to clear, use the wider of the snapshot's RangeDescriptor span and the range's existing RangeDescriptor span (if any). This is safe even if the RHS has not been rebalanced away, since in that case the RHS replica must be uninitialized. A sketch of this widening follows below.
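
A minimal sketch of what the second step proposes, using the same hypothetical span stand-in as the sketch above (not a kvserver type): take the union of the snapshot descriptor's span and the existing descriptor's span, and clear that.

```go
package main

import "fmt"

// span is the same hypothetical stand-in as in the earlier sketch.
type span struct{ start, end string }

// widenedClearSpan returns the smallest span covering both the snapshot's
// descriptor and the existing replica's descriptor, so that RHS keys left
// over from an unapplied split fall inside the cleared span.
func widenedClearSpan(snapDesc, existing span) span {
	out := snapDesc
	if existing.start < out.start {
		out.start = existing.start
	}
	if existing.end > out.end {
		out.end = existing.end
	}
	return out
}

func main() {
	lhs := span{start: "a", end: "m"}      // snapshot descriptor (post-split LHS)
	preSplit := span{start: "a", end: "z"} // existing descriptor (split unapplied)
	fmt.Println(widenedClearSpan(lhs, preSplit)) // {a z}: covers the RHS too
}
```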

cc: @tbg

Jira issue: CRDB-11599
Epic CRDB-39898
