kvserver: use AdminSplit in splitTestRange #72383
72392: kvserver: simplify testContext r=erikgrinaker a=tbg

See individual commits for details.

I just spent some time cleaning up `createTestStore` (#72383) and I noticed that `testContext` is much more heavily used and has accumulated quite a bit of detritus. I hope to ultimately make it use `createTestStore` but that will be for another day.

----

- kvserver: remove two uses of bootstrapRangeOnly
- kvserver: remove TestReplicaLaziness
- kvserver: remove testContext.bootstrapMode
- kvserver: drop useless on-disk engine in a test
- kvserver: simplify testContext
- kvserver: fix up TestRaftSSTableSideloading
- kvserver: prevent pre-populating `testContext.eng`

Release note: None

Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
5b2ec2d to 9c33094
Wow sorry for posting such a junk PR! It's actually not all junk but I submitted a bunch of stuff that came from my initial attempts at getting splits to work. Removed these extra bits.
bors r=erikgrinaker
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker)
pkg/kv/kvserver/replica_command.go, line 333 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Why this extra call?
Detritus. At one point I was getting mismatch errors here and I wanted to know why. Since rr
doesn't work on my platform the easiest way to be able to step into ContainsKey
with the right args is to duplicate it here. Removed.
Build failed:
Yeah, was wondering about the DistSender emulation in the mock senders and stuff, but figured you'd added it temporarily for a reason. Nice to see it gone. :)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker)
bors r- Apparently it was needed for something? Urgh, going to take a look.
Oh ok that makes sense. I fixed up the tests that use |
9c33094 to 5f9ea7c
Pretty satisfying - got wrapped up with this just as I'm brushing up against the end of my day. I think this is fit for review though some small thing will certainly be wrong with it.
`splitTestRange` was previously reaching into the store to finagle a split. It is used by around a dozen tests. Prototyping around cockroachdb#72374 has shown that these tests frequently need patching up whenever we adjust (improve) the store's replica handling.

This is a time suck and besides, we also want to be able to test the Store from within the `kvserver` (not `kvserver_test`) package. So if we can make that happen, and can use AdminSplit, that would be preferable.

AdminSplit requires these tests to run a (somewhat) distributed multi-range transaction. A first split would hit a single range, but after that the split batch hits at least two ranges (meta2 and splitKey), and so we need nontrivial DistSender-like functionality. Splits are also nontrivial distributed transactions and so we need a TxnCoordSender.

Experience suggests that it's better to use the "real thing" and to make sure it's configurable enough to fit the use case, rather than whipping up half-baked replacements.

Luckily, it turned out that DistSender and TxnCoordSender are already up to the task, and this commit adopts them in `createTestStoreWithoutStart`, and changes `splitTestRange` to use `AdminSplit`.

Release note: None
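The "first split hits one range, later splits hit at least two" point can be illustrated with a toy model. The sketch below is not CockroachDB code: `rangeDesc`, `keyspace`, `metaKey` and the `"\x03"` addressing prefix are hypothetical stand-ins chosen for illustration. It only shows why a sender that routes by key is needed once the addressing record and the split key live on different ranges.

```go
package main

import (
	"fmt"
	"sort"
)

// rangeDesc is a toy range descriptor covering [start, end).
// An empty end means "to the end of the keyspace".
type rangeDesc struct {
	id         int
	start, end string
}

// keyspace is a toy addressing index; ranges are kept sorted by start key.
type keyspace struct {
	ranges []rangeDesc
	nextID int
}

func newKeyspace() *keyspace {
	return &keyspace{ranges: []rangeDesc{{id: 1}}, nextID: 2}
}

// metaKey returns the (hypothetical) addressing key for key. It sorts
// before ordinary keys, so it stays on the first range.
func metaKey(key string) string { return "\x03" + key }

// lookup returns the index of the range containing key.
func (k *keyspace) lookup(key string) int {
	i := sort.Search(len(k.ranges), func(i int) bool { return k.ranges[i].start > key })
	return i - 1 // the first range starts at "", so i is always >= 1
}

// split divides the range containing splitKey at splitKey.
func (k *keyspace) split(splitKey string) error {
	i := k.lookup(splitKey)
	lhs := k.ranges[i]
	if lhs.start == splitKey {
		return fmt.Errorf("range %d already starts at %q", lhs.id, splitKey)
	}
	rhs := rangeDesc{id: k.nextID, start: splitKey, end: lhs.end}
	k.nextID++
	lhs.end = splitKey
	// Make room for rhs directly after lhs.
	k.ranges = append(k.ranges, rangeDesc{})
	copy(k.ranges[i+2:], k.ranges[i+1:])
	k.ranges[i], k.ranges[i+1] = lhs, rhs
	return nil
}

// rangesTouched returns the sorted, distinct range IDs that a batch
// writing the given keys would have to be routed to.
func (k *keyspace) rangesTouched(keys ...string) []int {
	seen := map[int]bool{}
	var ids []int
	for _, key := range keys {
		id := k.ranges[k.lookup(key)].id
		if !seen[id] {
			seen[id] = true
			ids = append(ids, id)
		}
	}
	sort.Ints(ids)
	return ids
}

func main() {
	k := newKeyspace()
	// First split: addressing record and split key share the only range.
	fmt.Println(k.rangesTouched(metaKey("b"), "b"))
	if err := k.split("b"); err != nil {
		panic(err)
	}
	// Second split: the addressing record is on range 1, the split key on range 2.
	fmt.Println(k.rangesTouched(metaKey("m"), "m"))
}
```

In this toy model the first call reports a single range and the second reports two, which is the situation where a single-range mock sender stops being enough.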
Release note: None
…ious commit] This was a very satisfying cleanup. The commit history won't pass CI on the individual commits, so this commit and all before it need to be flattened. I'll do this once approved to make reviewing easier as the changes are mostly non-overlapping (but they overlap through `splitTestRange`). Release note: None
The test uses key prefixes that would collide with kvs from cluster bootstrap if the replica were properly bootstrapped. This will be the case in a few commits, so give this test its own prefix to operate under. Release note: None
These tests work today because the store they're operating under is strung together and is missing lots of moving parts. Once we use a more fully initialized store, these tests are prime candidates for being flaky. The real solution would be to add testing knobs so that we could investigate, in the test, the reasons for gossip, and to make sure they are all valid reasons. But we are unlikely to break anything in this area and are going to remove gossip of the system config altogether in cockroachdb#70560. So just remove these tests, which prevents them from getting in the way of refactors of the test harness. Release note: None
The old code would fail on a properly initialized store where there are dozens of values in the initial system config. Release note: None
Once the store is properly initialized, the system config will have additional entries. Release note: None
Separated intents are fully baked in now, so this randomization makes less sense. We continue to randomize in some tests, such as TestMVCCHistories, so some coverage is retained. The randomization caused nondeterminism during refactors of testContext (where it was then affecting tests that were written with separated intents hard-coded to "on") and so there was an immediate reason to remove it. Release note: None
This test was duplicating parts of `createTestStoreWithoutStart` and it was the one bogus hold-out user of `testSenderFactory`, which we now get to remove. Release note: None
5f9ea7c to 139cf34
LGTM at a glance. CI seems green too.
tftr! bors r=erikgrinaker
Build failed:
I will stress kvserver before trying again. Pretty sure I've introduced a
few flakes.
(sent from mobile, please excuse my brevity)
…On Sat, Nov 6, 2021, 15:37 craig[bot] ***@***.***> wrote:
Build failed:
- GitHub CI (Cockroach)
<https://teamcity.cockroachdb.com/viewLog.html?buildId=3692359&buildTypeId=Cockroach_UnitTests>
It wasn't true that the first gossiped config we'd see would be the one triggered by the lease. Interestingly, I found that this test was flaky on master as well but it isn't [detected] during nightly stress:

```
=== RUN   TestReplicaGossipConfigsOnLease
    test_log_scope.go:79: test logs captured to: /tmp/logTestReplicaGossipConfigsOnLease261191452
    test_log_scope.go:80: use -show-logs to present logs inline
    replica_test.go:1204: unexpected gossip of system config: values:<key:"\222" value:<raw_bytes:"\000\000\000\000\001T" timestamp:<> > >
    replica_test.go:1253: -- test log scope end --
test logs left over in: /tmp/logTestReplicaGossipConfigsOnLease261191452
--- FAIL: TestReplicaGossipConfigsOnLease (0.03s)
```

I checked the last [nightly] and the package did get stressed, with 20 iterations passing. Somewhat odd that I can reproduce this with a targeted stress pretty much instantly, and it's not showing up for years in CI.

[detected]: https://github.com/cockroachdb/cockroach/issues?q=is%3Aissue+TestReplicaGossipConfigsOnLease+is%3Aclosed
[nightly]: https://teamcity.cockroachdb.com/viewLog.html?buildId=3693551&buildTypeId=Cockroach_Nightlies_Stress&tab=buildLog&_focus=383#_state=226

Release note: None
I went through 200 iters of the kvserver package (thx roachprod-stress) without issues. The one flake I saw during CI is now fixed; it was also flaky on master before. bors r=erikgrinaker
bors r- ha, overlooked the "21 failures" line of my nonstandard invocation of |
Canceled.
bors r=erikgrinaker
Build succeeded:
Some tests do add peers and so we'll see some use of the raft transport. This doesn't have to work properly, just not crash, which this achieves. This is essentially a re-do of cockroachdb#69730 which was lost in cockroachdb#72383. Release note: None