kvserver: clear RHS state machine when moving past a split using a snapshot #73462
Labels: A-kv-replication, C-enhancement, T-kv
(discussion in https://cockroachlabs.slack.com/archives/C02KHQMF2US/p1638412757069600)
Consider a range split where this node is lagging and has not yet applied the split, and then receives a post-split snapshot for the LHS.
Steps:
1. Had the node applied the split, splitPreApply would normally clear the RHS state later, in the rightRepl == nil code path here: cockroach/pkg/kv/kvserver/store_split.go, lines 61 to 64 in 0ed2562.
2. However, the node is lagging (so splitPreApply does not execute) and instead receives and applies a post-split snapshot for this range in order to catch up. This post-split snapshot only contains the LHS state.
3. The multiSSTWriter used to clear the existing range state uses the RangeDescriptor in the snapshot, which is only the LHS: cockroach/pkg/kv/kvserver/store_snapshot.go, lines 238 to 241 in 7c88d5b.
4. applySnapshot does not treat the RHS as subsumed either, since the span of the snapshot, which is the LHS, does not subsume the RHS.

Result: we have leaked state in the engine.
cc: @tbg
Jira issue: CRDB-11599
Epic CRDB-39898