Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rditer,kvserver: simplify and unify replica span handling #94201

Merged
merged 2 commits into from
Jan 17, 2023

Conversation

tbg
Copy link
Member

@tbg tbg commented Dec 23, 2022

From 40573d3 Mon Sep 17 00:00:00 2001
From: Tobias Grieger tobias.b.grieger@gmail.com
Date: Fri, 6 Jan 2023 05:42:50 +0100
Subject: [PATCH] rditer,kvserver: simplify and unify replica span handling

A Replica comprises a number of spans:

  • RangeID-keyed replicated data, such as the GCThresholdKey,
    confusingly named "local RangeID keys" since they all have a /Local/
    (\x02) prefix. They have a r infix.
  • RangeID-keyed unreplicated data, such as the HardState,
  • The system-internal key-addressable "parallel plane" (such as the
    RangeDescriptorKey), confusingly named "local keys" (again, \x02
    prefix, but u infix).
  • Two lock spans: one for the "local" keyspace (e.g. for an intent on
    a RangeDescriptor), one for the "global" keyspace (e.g. regular
    row intents), in both cases the span is local and uses a k infix.
  • The actual span of the range (i.e. user data).

Code that needs some or all of these spans gets them from various
methods provided by the rditer package. Other than being generally a
messy affair, with a separate raft log, RangeID-based unreplicated keys
will mostly1 live on the logstore engine, while RangeID-based
replicated keys will be on the state machine engine and so the current
approach of targeting either only the replicated or the combined
(replicated and unreplicated) part of the local RangeID-based span
no longer works. It was also never particularly obvious which spans
one would select with that bool, and I wound up confused more than
once.

The above means we need to stop thinking about RangeID-based keys as
belonging to "one span" - the two RangeID-keyed spans are very different
in nature and storage.

This commit introduces a unified way to select the different parts of the
keyspace. It then goes ahead and uses this new primitive for all span
uses external to the rditer package. (rditer itself has a few methods
that need exotic combinations of spans, and it gets to do those more
ad-hoc for now but this could be improved later).

In particular, we can now rework clearRangeData and give it an
analogoous way to explicitly select which spans to clear. Replica
removal is a central work item in CRDB-220, since it comes up all
over the place: snapshots (in particular in combination with replica
subsumption), splits, replication changes, replicaGC.

After this PR, we can improve splits by not mutating the raft state of
the right-hand side replica in cases where it is known to be newer than
the split. The old API forced us to remove the raft state as well, and
we had to awkwardly lock the right-hand side and write the raft state
back to make things work. We can now simply opt out of mutating that
raft state (via ClearUnreplicatedByRangeID=false). This is left for
a separate PR.

This PR makes some additional improvements:

  • improve rditer tests via echotest, also test lock table keys
  • improved commentary, esp. around 1.

Release note: None
Epic: CRDB-220

Footnotes

  1. https://github.com/cockroachdb/cockroach/issues/95055 2

@cockroach-teamcity
Copy link
Member

This change is Reviewable

tbg added a commit to tbg/cockroach that referenced this pull request Jan 6, 2023
I was working on a change over in cockroachdb#94201 which caused this test
to fail. The output it was producing was not helpful; if the SSTs
don't match exactly this isn't actionable information unless these
same SSTs are also available for inspection. They now are.

Epic: CRDB-220
Release note: None
@tbg tbg force-pushed the clear-range-data branch 4 times, most recently from fa9422c to 12fce8d Compare January 6, 2023 09:53
@tbg tbg changed the title kvserver: separate replicated and unreplicated ID keys in clearRangeData rditer,kvserver: simplify and unify replica span handling Jan 6, 2023
craig bot pushed a commit that referenced this pull request Jan 6, 2023
94812: kvserver: make TestStoreRangeMergeRaftSnapshot more debuggable r=erikgrinaker a=tbg

I was working on a change over in #94201 which caused this test
to fail. The output it was producing was not helpful; if the SSTs
don't match exactly this isn't actionable information unless these
same SSTs are also available for inspection. They now are.

Epic: CRDB-220
Release note: None

Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
@tbg tbg force-pushed the clear-range-data branch 2 times, most recently from d0b5eb8 to 1b6b76a Compare January 9, 2023 11:30
@tbg
Copy link
Member Author

tbg commented Jan 9, 2023

I suggest starting the review with two windows, one looking at old diff for makeRangeKeySpans and comparing it to Select in new file select.go. The logic essentially transitioned over.

The remainder of the changes in rditer remove custom implementations and use Select instead:

  • MakeReplicatedKeySpansExceptRangeID was removed, since we could just use Select directly.
  • MakeRangeIDLocalKeySpan is no longer there because we now have two methods to make spans for the replicated and unreplicated parts, respectively: makeRangeID{Unre,Re}plicatedSpan.

In kvserver, the main change affects clearRangeData, which wants to use the new API:

  • clearRangeDataOptions is introduced and basically mirrors the options to Select
  • all callers are updated to use it.

Copy link
Collaborator

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dumping some comments for today.

pkg/kv/kvserver/rditer/select.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/rditer/select.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/rditer/select.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/rditer/select.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/rditer/select.go Show resolved Hide resolved
pkg/kv/kvserver/rditer/select.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/replica_raftstorage.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/rditer/testdata/TestSelect/span Outdated Show resolved Hide resolved
Copy link
Collaborator

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented the fixups. Looking good. I'll make a final pass on the whole PR after this little round later today.

pkg/kv/kvserver/rditer/replica_data_iter.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/rditer/select.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/rditer/select.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/rditer/select.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/rditer/select_test.go Outdated Show resolved Hide resolved
@tbg tbg requested a review from pav-kv January 11, 2023 14:45
pkg/kv/kvserver/rditer/select_test.go Outdated Show resolved Hide resolved
@tbg tbg requested a review from pav-kv January 12, 2023 10:20
@tbg
Copy link
Member Author

tbg commented Jan 12, 2023

Could I trouble you for a last review? I just fixed a pretty severe (but at least caught in 100 places across CI) problem, see the new commits.

@tbg
Copy link
Member Author

tbg commented Jan 13, 2023

Friendly ping.

pkg/kv/kvserver/replica_raftstorage.go Show resolved Hide resolved
pkg/kv/kvserver/rditer/select.go Show resolved Hide resolved
pkg/kv/kvserver/rditer/select.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/rditer/select.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/replicate_queue_test.go Show resolved Hide resolved
@tbg tbg force-pushed the clear-range-data branch 3 times, most recently from 3a004d5 to c1a0a1f Compare January 16, 2023 17:03
A Replica comprises a number of spans:

- RangeID-keyed replicated data, such as the `GCThresholdKey`,
  confusingly named "local RangeID keys" since they all have a `/Local/`
(`\x02`) prefix. They have a `r` infix.
- RangeID-keyed unreplicated data, such as the `HardState`,
- The system-internal key-addressable "parallel plane" (such as the
  `RangeDescriptorKey`), confusingly named "local keys" (again, `\x02`
prefix, but `u` infix).
- Two lock spans: one for the "local" keyspace (e.g. for an intent on
  a `RangeDescriptor`), one for the "global" keyspace (e.g. regular
  row intents), in both cases the span is local and uses a `k` infix.
- The actual span of the range (i.e. user data).

Code that needs some or all of these spans gets them from various
methods provided by the `rditer` package. Other than being generally a
messy affair, with a separate raft log, RangeID-based unreplicated keys
will mostly[^1] live on the logstore engine, while RangeID-based
replicated keys will be on the state machine engine and so the current
approach of targeting either only the replicated *or* the combined
(replicated and unreplicated) part of the local RangeID-based span
no longer works. It was also never particularly obvious which spans
one would select with that bool, and I wound up confused more than
once.

The above means we need to stop thinking about RangeID-based keys as
belonging to "one span" - the two RangeID-keyed spans are very different
in nature and storage.

This commit introduces a unified way to select the different parts of the
keyspace. It then goes ahead and uses this new primitive for all span
uses external to the `rditer` package. (`rditer` itself has a few methods
that need exotic combinations of spans, and it gets to do those more
ad-hoc for now but this could be improved later).

In particular, we can now rework `clearRangeData` and give it an
analogoous way to explicitly select which spans to clear. Replica
removal is a central work item in CRDB-220, since it comes up all
over the place: snapshots (in particular in combination with replica
subsumption), splits, replication changes, replicaGC.

After this PR, we can improve splits by not mutating the raft state of
the right-hand side replica in cases where it is known to be newer than
the split. The old API forced us to remove the raft state as well, and
we had to awkwardly lock the right-hand side and write the raft state
back to make things work. We can now simply opt out of mutating that
raft state (via `ClearUnreplicatedByRangeID=false`). This is left for
a separate PR.

This PR makes some additional improvements:

- improve `rditer` tests via `echotest`, also test lock table keys
- improved commentary, esp. around [^1]

[^1]: cockroachdb#95055

Release note: None
@tbg
Copy link
Member Author

tbg commented Jan 17, 2023

TFTR! I skipped the few nits you had to get this PR merged, but I did make the non-nit adjustment.
bors r=pavelkalinnikov

@craig
Copy link
Contributor

craig bot commented Jan 17, 2023

Build failed:

@tbg
Copy link
Member Author

tbg commented Jan 17, 2023

   	Error:      	"change replicas of r54 failed: descriptor changed: [expected] r54:/T{able/53-enant/10} [(n1,s1):1, (n2,s2):2, (n5,s5):3, (n3,s3):5NON_VOTER, (n4,s4):6NON_VOTER, next=7, gen=9] != [actual] r54:/T{able/53-enant/10} [(n1,s1):1, (n2,s2):2, (n5,s5):3NON_VOTER, (n3,s3):5, next=7, gen=12]" does not contain "ok"

#95262

bors r=pavelkalinnikov

@craig
Copy link
Contributor

craig bot commented Jan 17, 2023

Build succeeded:

@craig craig bot merged commit fa43210 into cockroachdb:master Jan 17, 2023
@tbg tbg deleted the clear-range-data branch January 17, 2023 11:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants