rditer,kvserver: simplify and unify replica span handling

A Replica comprises a number of spans:

- RangeID-keyed replicated data, such as the `GCThresholdKey`,
  confusingly named "local RangeID keys" since they all have a `/Local/`
  (`\x01`) prefix. They have an `r` infix.
- RangeID-keyed unreplicated data, such as the `HardState`. Same prefix,
  but a `u` infix.
- The system-internal key-addressable "parallel plane" (such as the
  `RangeDescriptorKey`), confusingly named "local keys" (again, `\x01`
  prefix, but a `k` infix).
- Two lock spans: one for the "local" keyspace (e.g. for an intent on
  a `RangeDescriptor`), one for the "global" keyspace (e.g. regular
  row intents); in both cases the span is local and uses a `z` infix.
- The actual span of the range (i.e. user data).
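
To make these concrete, here is a small sketch (not part of this commit)
that prints a representative key from each of these spans using helpers
from `pkg/keys`; the grouping into prefix and infix is the point, the
exact keys are illustrative:

```go
package main

import (
	"fmt"

	"github.com/cockroachdb/cockroach/pkg/keys"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
)

func main() {
	rangeID := roachpb.RangeID(42)
	startKey := roachpb.RKey("a")

	// Replicated RangeID-keyed key (`r` infix), e.g. the GC threshold.
	fmt.Println(keys.RangeGCThresholdKey(rangeID))
	// Unreplicated RangeID-keyed key (`u` infix), e.g. the raft HardState.
	fmt.Println(keys.RaftHardStateKey(rangeID))
	// Range-local "parallel plane" key (`k` infix), e.g. the range descriptor.
	fmt.Println(keys.RangeDescriptorKey(startKey))
	// Lock table key (`z` infix), e.g. for an intent on a global key.
	lockKey, _ := keys.LockTableSingleKey(roachpb.Key("a"), nil)
	fmt.Println(lockKey)
}
```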

Code that needs some or all of these spans gets them from various
methods provided by the `rditer` package. This has always been a messy
affair, and it is about to stop working outright: with a separate raft
log, RangeID-based unreplicated keys will mostly[^1] live on the
logstore engine, while RangeID-based replicated keys will be on the
state machine engine, so the current approach of targeting either only
the replicated *or* the combined (replicated and unreplicated) part of
the local RangeID-based span no longer works. It was also never
particularly obvious which spans one would select with that bool, and I
wound up confused more than once.

The above means we need to stop thinking about RangeID-based keys as
belonging to "one span" - the two RangeID-keyed spans are very different
in nature and storage.

This commit introduces a unified way to select the different parts of the
keyspace. It then goes ahead and uses this new primitive for all span
uses external to the `rditer` package. (`rditer` itself has a few methods
that need exotic combinations of spans; those remain ad-hoc for now but
could be improved later.)
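
Schematically, callers now select spans like this (a minimal sketch; the
two boolean fields are the ones exercised in the test diffs below, and
`Select` returns a slice of `roachpb.Span`):

```go
package main

import (
	"fmt"

	"github.com/cockroachdb/cockroach/pkg/kv/kvserver/rditer"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
)

func main() {
	// Select both RangeID-keyed spans of r42. Omitting a field (or setting
	// it to false) excludes the corresponding part of the keyspace.
	spans := rditer.Select(roachpb.RangeID(42), rditer.SelectOpts{
		ReplicatedByRangeID:   true, // the `r`-infixed span
		UnreplicatedByRangeID: true, // the `u`-infixed span
	})
	for _, sp := range spans {
		fmt.Println(sp) // one contiguous span per selected part
	}
}
```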

In particular, we can now rework `clearRangeData` and give it an
analogous way to explicitly select which spans to clear. Replica
removal is a central work item in CRDB-220, since it comes up all
over the place: snapshots (in particular in combination with replica
subsumption), splits, replication changes, replicaGC.

After this PR, we can improve splits by not mutating the raft state of
the right-hand side replica in cases where it is known to be newer than
the split. The old API forced us to remove the raft state as well, and
we had to awkwardly lock the right-hand side and write the raft state
back to make things work. We can now simply opt out of mutating that
raft state (via `ClearUnreplicatedByRangeID=false`). This is left for
a separate PR.
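
As a sketch of what that opt-out could look like (only
`ClearUnreplicatedByRangeID` is named above; the surrounding option and
parameter names are assumptions patterned after `rditer.SelectOpts`):

```go
// Hypothetical fragment: a split clears the RHS's replicated state but
// leaves the raft state (HardState etc.) untouched. Names other than
// ClearUnreplicatedByRangeID are assumed, not taken from this commit.
opts := clearRangeDataOptions{
	ClearReplicatedByRangeID:   true,  // assumed analogue of SelectOpts
	ClearUnreplicatedByRangeID: false, // keep HardState and friends
}
if err := clearRangeData(rhsDesc, readWriter, readWriter, opts); err != nil {
	return err
}
```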

This PR makes some additional improvements:

- improved `rditer` tests via `echotest` (see the sketch after this list), now also covering lock table keys
- improved commentary, esp. around [^1]
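
For reference, the `echotest` pattern looks roughly like this (a sketch
based on `pkg/testutils/echotest` and `pkg/testutils/datapathutils`, both
of which show up in the BUILD file changes below; the test name and
golden file are made up):

```go
package rditer_test

import (
	"fmt"
	"strings"
	"testing"

	"github.com/cockroachdb/cockroach/pkg/kv/kvserver/rditer"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/testutils/datapathutils"
	"github.com/cockroachdb/cockroach/pkg/testutils/echotest"
)

func TestSelectSketch(t *testing.T) {
	var buf strings.Builder
	for _, sp := range rditer.Select(roachpb.RangeID(1), rditer.SelectOpts{
		ReplicatedByRangeID: true,
	}) {
		fmt.Fprintln(&buf, sp)
	}
	// Compares buf against a checked-in golden file; rerunning the test
	// with -rewrite regenerates it.
	echotest.Require(t, buf.String(), datapathutils.TestDataPath(t, "select_sketch.txt"))
}
```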

[^1]: #95055

Release note: None
tbg committed Jan 17, 2023
1 parent afc3066 commit 40573d3
Showing 25 changed files with 930 additions and 434 deletions.
40 changes: 29 additions & 11 deletions pkg/kv/kvserver/client_merge_test.go
@@ -3838,21 +3838,38 @@ func TestStoreRangeMergeRaftSnapshot(t *testing.T) {
     expectedSSTs = expectedSSTs[len(expectedSSTs)-1:]
 
     // Construct SSTs for the range-id local keys of the subsumed replicas.
-    // with RangeIDs 3 and 4.
+    // with RangeIDs 3 and 4. Note that this also targets the unreplicated
+    // rangeID-based keys because we're effectively replicaGC'ing these
+    // replicas (while absorbing their user keys into the LHS).
     for _, k := range []roachpb.Key{keyB, keyC} {
         rangeID := rangeIds[string(k)]
         sstFile := &storage.MemFile{}
         sst := storage.MakeIngestionSSTWriter(ctx, st, sstFile)
         defer sst.Close()
-        s := rditer.MakeRangeIDLocalKeySpan(rangeID, false /* replicatedOnly */)
-        // The snapshot code will use ClearRangeWithHeuristic with a threshold of
-        // 1 to clear the range, but this will truncate the range tombstone to the
-        // first key. In this case, the first key is RangeGCThresholdKey, which
-        // doesn't yet exist in the engine, so we write the Pebble range tombstone
-        // manually.
-        if err := sst.ClearRawRange(keys.RangeGCThresholdKey(rangeID), s.EndKey, true, false); err != nil {
-            return err
-        }
+        {
+            // The snapshot code will use ClearRangeWithHeuristic with a threshold of
+            // 1 to clear the range, but this will truncate the range tombstone to the
+            // first key. In this case, the first key is RangeGCThresholdKey, which
+            // doesn't yet exist in the engine, so we write the Pebble range tombstone
+            // manually.
+            sl := rditer.Select(rangeID, rditer.SelectOpts{
+                ReplicatedByRangeID: true,
+            })
+            require.Len(t, sl, 1)
+            s := sl[0]
+            require.NoError(t, sst.ClearRawRange(keys.RangeGCThresholdKey(rangeID), s.EndKey, true, false))
+        }
+        {
+            // Ditto for the unreplicated version, where the first key happens to be
+            // the HardState.
+            sl := rditer.Select(rangeID, rditer.SelectOpts{
+                UnreplicatedByRangeID: true,
+            })
+            require.Len(t, sl, 1)
+            s := sl[0]
+            require.NoError(t, sst.ClearRawRange(keys.RaftHardStateKey(rangeID), s.EndKey, true, false))
+        }
 
         tombstoneKey := keys.RangeTombstoneKey(rangeID)
         tombstoneValue := &roachpb.RangeTombstone{NextReplicaID: math.MaxInt32}
         if err := storage.MVCCBlindPutProto(
@@ -3875,8 +3892,9 @@ func TestStoreRangeMergeRaftSnapshot(t *testing.T) {
         StartKey: roachpb.RKey(keyD),
         EndKey:   roachpb.RKey(keyEnd),
     }
-    s := rditer.MakeUserKeySpan(&desc)
-    if err := storage.ClearRangeWithHeuristic(receivingEng, &sst, s.Key, s.EndKey, 64, 8); err != nil {
+    if err := storage.ClearRangeWithHeuristic(
+        receivingEng, &sst, desc.StartKey.AsRawKey(), desc.EndKey.AsRawKey(), 64, 8,
+    ); err != nil {
         return err
     }
     if err = sst.Finish(); err != nil {
54 changes: 35 additions & 19 deletions pkg/kv/kvserver/consistency_queue_test.go
@@ -21,6 +21,7 @@ import (
"time"

"github.com/cockroachdb/cockroach/pkg/base"
"github.com/cockroachdb/cockroach/pkg/keys"
"github.com/cockroachdb/cockroach/pkg/kv"
"github.com/cockroachdb/cockroach/pkg/kv/kvserver"
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvserverbase"
@@ -93,15 +94,16 @@ func TestConsistencyQueueRequiresLive(t *testing.T) {
 }
 
 // TestCheckConsistencyMultiStore creates a node with three stores
-// with three way replication. A value is added to the node, and a
-// consistency check is run.
+// with three way replication. A consistency check is run on r1,
+// which always has some data, and on a newly split off range
+// as well.
 func TestCheckConsistencyMultiStore(t *testing.T) {
     defer leaktest.AfterTest(t)()
     defer log.Scope(t).Close(t)
 
     tc := testcluster.StartTestCluster(t, 3,
         base.TestClusterArgs{
-            ReplicationMode: base.ReplicationAuto,
+            ReplicationMode: base.ReplicationManual,
             ServerArgs: base.TestServerArgs{
                 Knobs: base.TestingKnobs{
                     Store: &kvserver.StoreTestingKnobs{
@@ -115,24 +117,38 @@ func TestCheckConsistencyMultiStore(t *testing.T) {
     defer tc.Stopper().Stop(context.Background())
     store := tc.GetFirstStoreFromServer(t, 0)
 
-    // Write something to the DB.
-    putArgs := putArgs([]byte("a"), []byte("b"))
-    if _, err := kv.SendWrapped(context.Background(), store.DB().NonTransactionalSender(), putArgs); err != nil {
-        t.Fatal(err)
+    // Split off a range and write to it; we'll use it below but we test r1 first.
+    k2 := tc.ScratchRange(t)
+    {
+        // Write something to k2.
+        putArgs := putArgs(k2, []byte("b"))
+        _, pErr := kv.SendWrapped(context.Background(), store.DB().NonTransactionalSender(), putArgs)
+        require.NoError(t, pErr.GoError())
     }
 
-    // Run consistency check.
-    checkArgs := roachpb.CheckConsistencyRequest{
-        RequestHeader: roachpb.RequestHeader{
-            // span of keys that include "a".
-            Key:    []byte("a"),
-            EndKey: []byte("aa"),
-        },
-    }
-    if _, err := kv.SendWrappedWith(context.Background(), store.DB().NonTransactionalSender(), roachpb.Header{
-        Timestamp: store.Clock().Now(),
-    }, &checkArgs); err != nil {
-        t.Fatal(err)
-    }
+    // 3x replicate our ranges.
+    //
+    // Depending on what we're doing, we need to use LocalMax or [R]KeyMin
+    // for r1 due to the issue below.
+    //
+    // See: https://github.com/cockroachdb/cockroach/issues/95055
+    tc.AddVotersOrFatal(t, roachpb.KeyMin, tc.Targets(1, 2)...)
+    tc.AddVotersOrFatal(t, k2, tc.Targets(1, 2)...)
+
+    // Run consistency check on r1 and ScratchRange.
+    for _, k := range []roachpb.Key{keys.LocalMax, k2} {
+        t.Run(k.String(), func(t *testing.T) {
+            checkArgs := roachpb.CheckConsistencyRequest{
+                RequestHeader: roachpb.RequestHeader{
+                    Key:    k,
+                    EndKey: k.Next(),
+                },
+            }
+            _, pErr := kv.SendWrappedWith(context.Background(), store.DB().NonTransactionalSender(), roachpb.Header{
+                Timestamp: store.Clock().Now(),
+            }, &checkArgs)
+            require.NoError(t, pErr.GoError())
+        })
     }
 }

12 changes: 11 additions & 1 deletion pkg/kv/kvserver/rditer/BUILD.bazel
@@ -5,6 +5,7 @@ go_library(
name = "rditer",
srcs = [
"replica_data_iter.go",
"select.go",
"stats.go",
],
importpath = "github.com/cockroachdb/cockroach/pkg/kv/kvserver/rditer",
@@ -21,21 +22,30 @@ go_library(
 go_test(
     name = "rditer_test",
     size = "small",
-    srcs = ["replica_data_iter_test.go"],
+    srcs = [
+        "replica_data_iter_test.go",
+        "select_test.go",
+    ],
     args = ["-test.timeout=55s"],
+    data = glob(["testdata/**"]),
     embed = [":rditer"],
     deps = [
         "//pkg/keys",
+        "//pkg/kv/kvserver/concurrency/lock",
         "//pkg/kv/kvserver/spanset",
         "//pkg/roachpb",
         "//pkg/settings/cluster",
         "//pkg/storage",
         "//pkg/testutils",
+        "//pkg/testutils/datapathutils",
+        "//pkg/testutils/echotest",
         "//pkg/testutils/skip",
         "//pkg/util/hlc",
         "//pkg/util/leaktest",
         "//pkg/util/randutil",
         "//pkg/util/uuid",
+        "@com_github_olekukonko_tablewriter//:tablewriter",
+        "@com_github_stretchr_testify//assert",
         "@com_github_stretchr_testify//require",
     ],
 )
