kvserver: introduce RangeAppliedState.RaftAppliedIndexTerm #75675
Conversation
This lacks tests (and will probably fail some tests).
This looks really good already, mostly just needs some testing.
Reviewed 15 of 15 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @sumeerbhola)
pkg/kv/kvserver/replica_application_state_machine.go, line 853 at r1 (raw file):
ctx context.Context, cmd *replicatedCmd,
) {
	// TODO(sumeer): when will raftAppliedIndex be 0?
I don't think it ever can (even the raft-proposed entries on leadership change are proper entries, i.e. have a proper index). Does anything fail if you make this unconditional? git blame shows @nvanbenschoten as last touching this, perhaps you remember, Nathan?
pkg/kv/kvserver/replica_application_state_machine.go, line 859 at r1 (raw file):
// We are post migration or this replicatedCmd is doing the migration.
if b.state.RaftAppliedIndexTerm > 0 || (rs != nil &&
	rs.RaftAppliedIndexTerm == stateloader.RaftLogTermSignalForAddRaftAppliedIndexTermMigration) {
👍
pkg/kv/kvserver/replica_application_state_machine.go, line 931 at r1 (raw file):
r.mu.Lock()
r.mu.state.RaftAppliedIndex = b.state.RaftAppliedIndex
// This will be non-zero only when the migration has happened.
Name the migration so we can adjust this comment when we remove it. Or just remove the comment.
pkg/kv/kvserver/replica_raftstorage.go, line 389 at r1 (raw file):
// does not contain any of the replica data. The snapshot is actually generated
// (and sent) by the Raft snapshot queue.
// TODO(sumeer): is this method really used?
Yes. It has one caller in our code which is a test (TestSyncSnapshot), which hardly counts, but raft will call this method, see (*raftLog).snapshot. However, the way we do snapshots is sort of different from how raft wants them to happen. raft expects that it generates the snapshot (by calling Snapshot) and that "sending" the result actually sends the snapshot. In CRDB, that message is really just intercepted (at the sender); instead we add the replica (the raft leader) to the raft snapshot queue, and when its turn comes we'll look at the raft state for followers that want a snapshot and then send one, but this won't call (*replicaRaftStorage).Snapshot.
Since you're here, consider distilling this into a comment.
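To make that flow concrete, here is a rough sketch of the intercept-and-enqueue pattern described above. This is illustrative only: the type and method names (replica, store, raftSnapshotQueue, maybeAdd, transport) are hypothetical, not the real CRDB API; only raftpb.MsgSnap is the actual raft message type.

func (r *replica) sendRaftMessage(msg raftpb.Message) {
	if msg.Type == raftpb.MsgSnap {
		// Intercept raft's snapshot message at the sender: don't ship the
		// snapshot raft generated. Instead enqueue this replica so the raft
		// snapshot queue later inspects follower state and sends a real
		// snapshot, bypassing (*replicaRaftStorage).Snapshot.
		r.store.raftSnapshotQueue.maybeAdd(r)
		return
	}
	r.transport.send(msg)
}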
pkg/kv/kvserver/replica_raftstorage.go, line 431 at r1 (raw file):
// Or do we not care whether these two values are consistent given that
// appliedIndex is just serving as an upper bound on what can be truncated
// in the log, so it's completely best-effort?
It's not terrible to go a little wrong here, but this is the first time in a long while (since I wrote it) that I've looked at this code, and yeah, this isn't good. We should load the applied index from the snapshot for cleanliness. If nothing else, at least appliedIndex should be pushed down to a smaller scope so that there's no danger of someone using it later. Let me know if you'd prefer not cleaning this up in this PR, then I will send a separate one.
pkg/kv/kvserver/replica_raftstorage.go, line 585 at r1 (raw file):
if state.RaftAppliedIndexTerm != 0 && term != state.RaftAppliedIndexTerm {
	return OutgoingSnapshot{}, errors.Errorf("unequal terms %d != %d", term, state.RaftAppliedIndexTerm)
errors.AssertionFailedf? After all, this should never happen.
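For concreteness, the suggested change would read roughly as follows (errors here is github.com/cockroachdb/errors, which provides AssertionFailedf; the surrounding code is the snippet quoted above):

if state.RaftAppliedIndexTerm != 0 && term != state.RaftAppliedIndexTerm {
	// This invariant should never be violated, so report it as an assertion
	// failure rather than an ordinary error.
	return OutgoingSnapshot{}, errors.AssertionFailedf(
		"unequal terms %d != %d", term, state.RaftAppliedIndexTerm)
}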
pkg/kv/kvserver/replica_raftstorage.go, line 1000 at r1 (raw file):
// raftpb.SnapshotMetadata.
r.mu.lastIndex = state.RaftAppliedIndex
// TODO(sumeer): why can't this be set to nonemptySnap.Metadata.Term?
I think that would be correct, too. We could also populate it in the only other use of invalidLastTerm:

cockroach/pkg/kv/kvserver/replica_init.go, lines 193 to 197 in 442e5b9:

r.mu.lastIndex, err = r.mu.stateLoader.LoadLastIndex(ctx, r.Engine())
if err != nil {
	return err
}
r.mu.lastTerm = invalidLastTerm
I think the idea must have been that it is "safest" to have only one place that populates this field with the real term, i.e. this one:

cockroach/pkg/kv/kvserver/replica_raft.go, line 843 in 852a80c:

r.mu.lastTerm = lastTerm
That code above actually looks a little brittle. We make a local lastTerm (cockroach/pkg/kv/kvserver/replica_raft.go, line 537 in 852a80c: lastTerm := r.mu.lastTerm) and then massage it whenever something happens. For example, when we apply a snapshot (updating r.mu.lastTerm) there is this bit:
cockroach/pkg/kv/kvserver/replica_raft.go, lines 634 to 641 in 852a80c:

// r.mu.lastIndex, r.mu.lastTerm and r.mu.raftLogSize were updated in
// applySnapshot, but we also want to make sure we reflect these changes in
// the local variables we're tracking here.
r.mu.RLock()
lastIndex = r.mu.lastIndex
lastTerm = r.mu.lastTerm
raftLogSize = r.mu.raftLogSize
r.mu.RUnlock()
I don't think now is the time to make this better, but it's the kind of thing that would break, perhaps in a subtle way, when touched.
For now, perhaps leave a comment saying that setting it to the actual term here would also work, but that this is fine as is, too.
pkg/kv/kvserver/store_snapshot.go, line 907 at r1 (raw file):
}
// TODO: is SendEmptySnapshot only used for a new cluster, so
It's only used for (triggered by) the cockroach debug reset-quorum tool. So this seems correct.
pkg/kv/kvserver/batcheval/cmd_end_transaction.go, line 1014 at r1 (raw file):
return enginepb.MVCCStats{}, result.Result{}, errors.Wrap(err, "unable to load replica version")
}
// The RHS should populate RaftAppliedIndexTerm if the LHS is doing so.
The other option is to do it when it's possible, i.e. gated behind the cluster setting. That way, there will be less to migrate later. Both are ok.
pkg/kv/kvserver/batcheval/cmd_migrate.go, line 103 at r1 (raw file):
Replicated: kvserverpb.ReplicatedEvalResult{
	State: &kvserverpb.ReplicaState{
		// NB: This is not a real term value. We can not know the term at
This comment is sort of confusing, since you're intentionally sending a term here that can never be used as a real term (we have either 0 or >=10 in CRDB due to raftInitialLogTerm, as you know). How about
// Signal the migration by sending a term on the new field that we want to migrate into. This
// term is chosen as one that would never be used in practice (since raftInitialLogTerm is 10),
// so we can special-case it below raft and start writing the (real) term to the AppliedState.
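A rough sketch of the signalling mechanism being discussed, using the identifiers already quoted in this thread (the check mirrors the snippet above); treating cmd.ent.Term as the real term of the entry being applied is an assumption about the surrounding code, not the exact diff:

// Below raft, treat the sentinel term as "the migration has reached this
// replica": from now on, persist the real term in the applied state.
migrating := rs != nil &&
	rs.RaftAppliedIndexTerm == stateloader.RaftLogTermSignalForAddRaftAppliedIndexTermMigration
if b.state.RaftAppliedIndexTerm > 0 || migrating {
	// Already migrated, or this very command performs the migration: record
	// the term of the entry being applied.
	b.state.RaftAppliedIndexTerm = cmd.ent.Term
}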
pkg/kv/kvserver/stateloader/initial.go, line 118 at r1 (raw file):
initialMS := enginepb.MVCCStats{}
// TODO: is WriteInitialRangeState only used for a new cluster, so
In prod this code will only be used for >= v22.1 nodes (which will bootstrap at the v22.1 cluster setting, i.e. with the applied index term active). In testing, this may not be true as the cluster version can be specified arbitrarily, i.e. the version here may not be the "default choice":
Lines 324 to 330 in e2c8db6:

if err := kvserver.WriteInitialClusterData(
	ctx, eng, initialValues,
	bootstrapVersion.Version, len(engines), splits,
	hlc.UnixNano(), storeKnobs,
); err != nil {
	return nil, err
}
So I think you need to compare replicaVersion against the versions you've introduced and only use true when the first one is active.
This is probably also what you'll want to do if you want to end-to-end test the migration, perhaps similar to

cockroach/pkg/kv/kvserver/client_migration_test.go, lines 232 to 234 in c52b01f:

// TestMigrateWaitsForApplication checks to see that migrate commands wait to be
// applied on all replicas before returning to the caller.
func TestMigrateWaitsForApplication(t *testing.T) {

(make a cluster at the old version, make a scratch range, check that it doesn't track the term, invoke Migrate, check that it now tracks the term).
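A hedged sketch of the version comparison being suggested; the boolean and how it is plumbed into the initial-state writing code are assumptions, but the version key and Less comparison follow the check quoted later in this thread:

// Only write the new field when the replica's version is at or past the
// migration version; otherwise behave like a pre-migration node. Comparing
// replicaVersion (rather than the node's cluster setting) also covers tests
// that bootstrap at arbitrary versions.
writeAppliedIndexTerm := !replicaVersion.Less(
	clusterversion.ByKey(clusterversion.AddRaftAppliedIndexTermMigration))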
pkg/migration/migrations/raft_applied_index_term.go, line 87 at r1 (raw file):
) error {
	// TODO(sumeer): this is copied from postTruncatedStateMigration. In
	// comparison, postSeparatedIntentsMigration iterated over ranges and issues
The code here removes replicas that are waiting for replicaGC. This wouldn't be achieved by anything that "talks to ranges", as by definition a replicaGC'able replica will not be contacted by a request sent to a range.
pkg/storage/enginepb/mvcc3.proto, line 232 at r1 (raw file):
// setting it to a value > 0. This is desirable since we don't want a mixed
// version cluster to have divergent replica state simply because we have
// introduced this field. An explicit migration will cause this field to
Mention the version name so that we can clean up all comments when the migration is removed.
I would recommend doing this in all comments related to a migration.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker, @sumeerbhola, and @tbg)
pkg/migration/migrations/raft_applied_index_term.go, line 87 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
The code here removes replicas that are waiting for replicaGC. This wouldn't be achieved by anything that "talks to ranges", as by definition a replicaGC'able replica will not be contacted by a request sent to a range.
The separated intents migration has the following comment
// Issue no-op Migrate commands to all ranges. This has the only
// purpose of clearing out any orphaned replicas, preventing interleaved
// intents in them from resurfacing.
How do these "orphaned replicas" compare to replicas "waiting for replicaGC"? Sounds the same to me.
And your comment that such a replica won't be contacted by a request that needs to be raft replicated makes a lot of sense. So did we make a mistake in the separated intents migration?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @sumeerbhola)
pkg/migration/migrations/raft_applied_index_term.go, line 87 at r1 (raw file):
Previously, sumeerbhola wrote…
The separated intents migration has the following comment
// Issue no-op Migrate commands to all ranges. This has the only
// purpose of clearing out any orphaned replicas, preventing interleaved
// intents in them from resurfacing.
How do these "orphaned replicas" compare to replicas "waiting for replicaGC"? Sounds the same to me.
And your comment that such a replica won't be contacted by a request that needs to be raft replicated makes a lot of sense. So did we make a mistake in the separated intents migration?
That would be my reading of this. I'm not sure what the rationale was for this step. The original PR (#66445) has a lot of activity, and looking through I wasn't able to find a discussion of it either.
@irfansharif @itsbilal do you have any insights into the two different approaches taken to get rid of orphaned replicas in the preceding migrations we are discussing above? I want to make sure I have a clear understanding of this, to avoid making a mistake in this PR.
TFTR!
I will ping when the tests are ready.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker, @nvanbenschoten, @sumeerbhola, and @tbg)
pkg/kv/kvserver/replica_application_state_machine.go, line 853 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I don't think it ever can (even the raft-proposed entries on leadership change are proper entries, i.e. have a proper index). Does anything fail if you make this unconditional? git blame shows @nvanbenschoten as last touching this, perhaps you remember, Nathan?
Added a log.Fatal if it is 0.
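The assertion being described could look roughly like this (a sketch; the message wording is hypothetical, log.Fatalf is CRDB's util/log fatal helper):

if raftAppliedIndex == 0 {
	// Per the discussion above, even raft-proposed entries on leadership
	// change carry a proper index, so a zero value indicates a bug.
	log.Fatalf(ctx, "raftAppliedIndex is 0; every applied entry must have a proper index")
}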
pkg/kv/kvserver/replica_application_state_machine.go, line 931 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Name the migration so we adjust this comment when we remove it. Or just remove the comment.
Added a comment
pkg/kv/kvserver/replica_raftstorage.go, line 389 at r1 (raw file):
Yes. It has one caller in our code which is a test (TestSyncSnapshot) which hardly counts, but raft will call this method, see (*raftLog).snapshot.
That's the real usage I was searching for. Thanks.
Added a comment.
pkg/kv/kvserver/replica_raftstorage.go, line 431 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
It's not terrible to go a little wrong here, but this is the first time in a long while (since I wrote it) that I've looked at this code, and yeah, this isn't good. We should load the applied index from the snapshot for cleanliness. If nothing else, at least appliedIndex should be pushed down to a smaller scope so that there's no danger of someone using it later. Let me know if you'd prefer not cleaning this up in this PR, then I will send a separate one.
I've pushed appliedIndex to a narrow scope as you suggested and added a comment with a TODO.
pkg/kv/kvserver/replica_raftstorage.go, line 585 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
errors.AssertionFailedf? After all, this should never happen.
Done
pkg/kv/kvserver/replica_raftstorage.go, line 1000 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I think that would be correct, too. We could also populate it in the only other use of invalidLastTerm (cockroach/pkg/kv/kvserver/replica_init.go, lines 193 to 197 in 442e5b9):

r.mu.lastIndex, err = r.mu.stateLoader.LoadLastIndex(ctx, r.Engine())
if err != nil {
	return err
}
r.mu.lastTerm = invalidLastTerm

I think the idea must have been that it is "safest" to have only one place that populates this field with the real term, i.e. r.mu.lastTerm = lastTerm in cockroach/pkg/kv/kvserver/replica_raft.go (line 843 in 852a80c). That code actually looks a little brittle. We make a local lastTerm (line 537 in 852a80c: lastTerm := r.mu.lastTerm) and then massage it whenever something happens. For example, when we apply a snapshot (updating r.mu.lastTerm) there is this bit (lines 634 to 641 in 852a80c):

// r.mu.lastIndex, r.mu.lastTerm and r.mu.raftLogSize were updated in
// applySnapshot, but we also want to make sure we reflect these changes in
// the local variables we're tracking here.
r.mu.RLock()
lastIndex = r.mu.lastIndex
lastTerm = r.mu.lastTerm
raftLogSize = r.mu.raftLogSize
r.mu.RUnlock()

I don't think now is the time to make this better, but it's the kind of thing that would break, perhaps in a subtle way, when touched. For now, perhaps leave a comment saying that setting it to the actual term here would also work, but that this is fine as is, too.
Added a comment.
pkg/kv/kvserver/store_snapshot.go, line 907 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
It's only used for (triggered by) the cockroach debug reset-quorum tool. So this seems correct.
Hardcoded this to true and added a comment.
pkg/kv/kvserver/batcheval/cmd_end_transaction.go, line 1014 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
The other option is to do it when it's possible, i.e. gated behind the cluster setting. That way, there will be less to migrate later. Both are ok.
I've added a comment specifying this option, but left the code the same.
Do you think we need this fallback anyway? Say the unsplit range has been migrated and then moved to this node, but the view of the cluster setting at this node is lagging; then it would be bad if the RHS did not populate the term.
pkg/kv/kvserver/batcheval/cmd_migrate.go, line 103 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
This comment is sort of confusing, since you're intentionally sending a term here that can never be used as a real term (we have either 0 or >=10 in CRDB due to raftInitialLogTerm, as you know). How about
// Signal the migration by sending a term on the new field that we want to migrate into. This
// term is chosen as one that would never be used in practice (since raftInitialLogTerm is 10),
// so we can special-case it below raft and start writing the (real) term to the AppliedState.
Done
pkg/kv/kvserver/stateloader/initial.go, line 118 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
In prod this code will only be used for >= v22.1 nodes (which will bootstrap at the v22.1 cluster setting, i.e. with the applied index term active). In testing, this may not be true as the cluster version can be specified arbitrarily, i.e. the version here may not be the "default choice" (see the kvserver.WriteInitialClusterData call at lines 324 to 330 in e2c8db6). So I think you need to compare replicaVersion against the versions you've introduced and only use true when the first one is active. This is probably also what you'll want to do if you want to end-to-end test the migration, perhaps similar to TestMigrateWaitsForApplication in cockroach/pkg/kv/kvserver/client_migration_test.go (lines 232 to 234 in c52b01f): make a cluster at the old version, make a scratch range, check that it doesn't track the term, invoke Migrate, check that it now tracks the term.
Done
pkg/storage/enginepb/mvcc3.proto, line 232 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Mention the version name so that we can clean up all comments when the migration is removed.
I would recommend doing this in all comments related to a migration.
Done
Reviewed 1 of 1 files at r2, 8 of 8 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @sumeerbhola)
pkg/kv/kvserver/replica_raftstorage.go, line 1000 at r1 (raw file):
Previously, sumeerbhola wrote…
Added comment
Something off-putting about referencing reviewable comments; not sure it'll stand the test of time. Consider #75675 (review) as a more reliable permalink.
pkg/kv/kvserver/store_snapshot.go, line 907 at r1 (raw file):
Previously, sumeerbhola wrote…
Hardcoded this to true and added a comment.
Reset-quorum is not new in this release, so the hard-coding is technically incorrect and could cause issues if used against a mixed-version cluster. But I think this is still ok; reset-quorum is experimental and it's just unlikely that it would be used here. The correct thing to do, which you may want to do, is to have SendEmptySnapshot return an error unless the version is active. Nobody will ever use it in the case in which it returns an error, but at least we know that if they do, it won't cause problems but will return an error.
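A minimal sketch of that guard, assuming it sits near the top of SendEmptySnapshot with access to the cluster settings st (the exact placement and error text are assumptions):

// Refuse to send an empty snapshot that hard-codes the applied index term
// unless the migration's version gate is active, so a mixed-version cluster
// can't receive state it doesn't yet understand.
if !st.Version.IsActive(ctx, clusterversion.AddRaftAppliedIndexTermMigration) {
	return errors.Errorf(
		"cannot send empty snapshot before cluster version %s is active",
		clusterversion.ByKey(clusterversion.AddRaftAppliedIndexTermMigration))
}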
pkg/kv/kvserver/batcheval/cmd_end_transaction.go, line 1014 at r1 (raw file):
Previously, sumeerbhola wrote…
I've added a comment specifying this option, but left the code the same.
Do you think we need this fallback anyway? Say the unsplit range has been migrated and then moved to this node, but the view of the cluster setting at this node is lagging; then it would be bad if the RHS did not populate the term.
No, fine as is. You could always "or" both conditions, but I really think this is fine, not sure why I even suggested the alternative.
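If the conditions were ever "or"-ed, a sketch could look like the following; lhsTracksTerm is a hypothetical name for the existing "LHS already populates the field" check, and evalCtx stands in for the batcheval EvalContext:

// Populate the RHS term either because the LHS already tracks it, or because
// this node's version gate for the migration is active.
writeRaftAppliedIndexTerm := lhsTracksTerm ||
	evalCtx.ClusterSettings().Version.IsActive(ctx, clusterversion.AddRaftAppliedIndexTermMigration)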
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker, @sumeerbhola, and @tbg)
pkg/migration/migrations/raft_applied_index_term_external_test.go, line 173 at r4 (raw file):
// [n1,bump-cluster-version] 3480  active cluster version setting is now 21.2-54 (up from 21.2-53(fence))
if repl.Version().Less(clusterversion.ByKey(
	clusterversion.AddRaftAppliedIndexTermMigration)) {
I am not sure why I was not able to specify PostAddRaftAppliedIndexTermMigration here, as noted in the code comment.
With this change this new migration test passes.
And I still need to investigate failures on TestTenantUpgrade, TestUpgradeHappensAfterMigrations, TestStartableJobMixedVersion.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker, @sumeerbhola, and @tbg)
pkg/kv/kvserver/replica_raftstorage.go, line 1000 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Something off-putting about referencing reviewable comments; not sure it'll stand the test of time. Consider #75675 (review) as a more reliable permalink.
Done (I was trying to point to the specific comment instead of the aggregated version that the mapping from reviewable to github produces).
pkg/kv/kvserver/store_snapshot.go, line 907 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Reset-quorum is not new in this release, so the hard-coding is technically incorrect and could cause issues if used against a mixed-version cluster. But I think this is still ok; reset-quorum is experimental and it's just unlikely that it would be used here. The correct thing to do, which you may want to do, is to have SendEmptySnapshot return an error unless the version is active. Nobody will ever use it in the case in which it returns an error, but at least we know that if they do, it won't cause problems but will return an error.
Done
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker, @sumeerbhola, and @tbg)
pkg/migration/migrations/raft_applied_index_term_external_test.go, line 173 at r4 (raw file):
Previously, sumeerbhola wrote…
I am not sure why I was not able to specify PostAddRaftAppliedIndexTermMigration here, as noted in the code comment.
With this change this new migration test passes. And I still need to investigate failures on TestTenantUpgrade, TestUpgradeHappensAfterMigrations, TestStartableJobMixedVersion.
repl.Version() is not the cluster version. It's the version of the highest Migrate() command that has been executed on the range. So your migration will set it to -52, but there is no Range migration associated to -54 and so repl.Version() will never reach that value. (If we did this differently, for each of the 27 versions in the release so far, we'd have to go through all of the ranges and bump the version 27 times.)
Reviewed 5 of 6 files at r4, 5 of 5 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @sumeerbhola)
The failure of TestEnsureSpanConfigReconciliation is unrelated: #75849
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker, @sumeerbhola, and @tbg)
pkg/migration/migrations/raft_applied_index_term_external_test.go, line 67 at r6 (raw file):
// The in-memory ReplicaState maintained in Replica must be consistent
// with what we are placing in RangeAppliedState.
require.Equal(t, appliedState.RaftAppliedIndexTerm, appliedIndexTermInReplicaStruct)
this gets an error after the split (when running under stress, after 100s of runs -- unfortunately I am unable to reproduce the race on my machine):

Error: Not equal:
expected: 0x5
actual  : 0x6
Test: TestRaftAppliedIndexTermMigration
It looks like the term in the Replica has been bumped up from raftInitialLogTerm. I've added a Replica.LockRaftMuForTesting() method to prevent this race.
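For reference, the helper as described would be roughly the following sketch; raftMu is the Replica's raft processing mutex, and the exact doc comment is illustrative:

// LockRaftMuForTesting lets a test outside kvserver hold raftMu while it reads
// the in-memory applied state, preventing concurrent raft application from
// bumping the term mid-assertion.
func (r *Replica) LockRaftMuForTesting() (unlockFunc func()) {
	r.raftMu.Lock()
	return r.raftMu.Unlock
}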
pkg/migration/migrations/raft_applied_index_term_external_test.go, line 193 at r6 (raw file):
require.NoError(t, err)
t.Logf("split range into %s and %s", leftDesc.KeySpan(), rightDesc.KeySpan())
total, withTerm := getReplicaCounts(ctx, t, tc)
this call runs into a race condition (see the other comment)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @tbg)
pkg/migration/migrations/raft_applied_index_term_external_test.go, line 173 at r4 (raw file):
Previously, tbg (Tobias Grieger) wrote…
repl.Version() is not the cluster version. It's the version of the highest Migrate() command that has been executed on the range. So your migration will set it to -52, but there is no Range migration associated to -54 and so repl.Version() will never reach that value. (If we did this differently, for each of the 27 versions in the release so far, we'd have to go through all of the ranges and bump the version 27 times.)
Thanks for clarifying!
bors r+
Build failed (retrying...)
Merge conflict.
Reviewed 1 of 2 files at r7, 7 of 7 files at r8, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @sumeerbhola)
pkg/kv/kvserver/replica.go, line 2006 at r8 (raw file):
}

// LockRaftMuForTesting is for use only by tests that are trying to avoid a
Move to helpers_test.go, but we already have (*Replica).RaftLock there so just use that.
Dismissed @sumeerbhola from a discussion.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @sumeerbhola)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker, @sumeerbhola, and @tbg)
pkg/kv/kvserver/replica.go, line 2006 at r8 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Move to helpers_test.go, but we already have (*Replica).RaftLock there so just use that.
I couldn't use that, since test code from another package is not imported, and there doesn't seem to be a good solution based on reading https://stackoverflow.com/questions/56404355/how-do-i-package-golang-test-helper-code
I didn't want to change that file to test_helpers.go for only this one case, which will go away in the next release when the migration code is removed. I've added to the code comment here that this is only for testing code outside kvserver.
bors r+
Reviewed 7 of 7 files at r9, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @sumeerbhola)
pkg/kv/kvserver/replica.go, line 2006 at r8 (raw file):
Previously, sumeerbhola wrote…
I couldn't use that, since test code from another package is not imported, and there doesn't seem to be a good solution based on reading https://stackoverflow.com/questions/56404355/how-do-i-package-golang-test-helper-code
I didn't want to change that file to test_helpers.go for only this one case, which will go away in the next release when the migration code is removed. I've added to the code comment here that this is only for testing code outside kvserver.
Oh, didn't realize the caller wasn't in kvserver. It is a bit awkward; the migration test could live in kvserver, no? But I am fine merging as is, seeing how that code has a short shelf life anyway.
Build failed (retrying...)
Build failed (retrying...)
Build succeeded.
The same field is also introduced in ReplicaState, since ReplicaState
is used in internal data-structures and when sending a state machine
snapshot.
The migration code uses a special unused term value in a
ReplicatedEvalResult to signal to the state machine application
machinery to start populating the term field.
Fixes #75671
Release note: None