storage: mark replicas added to replica GC queue as destroyed #19353
ceda782
to
06aaba9
Compare
Reviewed 5 of 5 files at r1. pkg/storage/queue.go, line 602 at r1 (raw file):
nit: pkg/storage/replica.go, line 290 at r1 (raw file):
This would be easier to understand if it had a comment describing when a Also, it doesn't look like we ever set this back to pkg/storage/replica.go, line 1147 at r1 (raw file):
We should comment about the other return variable. Comments from Reviewable |
09f219a
to
b1f626a
Compare
Review status: all files reviewed at latest revision, 3 unresolved discussions. pkg/storage/queue.go, line 602 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Good point! pkg/storage/replica.go, line 290 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
The flag is necessary to avoid breaking a couple of things. queue.go would completely ignore a "destroyed" replica for any queue so it would never actually get GCed. Without the specific check on pending in withRaftGroupLocked the replica from the replica descriptor so the RangeLookupRequest in func (rgcq *replicaGCQueue) process would fail because the replica wouldn't be a current member of the range. pkg/storage/replica.go, line 1147 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. Comments from Reviewable |
a9b82de
to
786e399
Compare
Most of the uses of Reviewed 5 of 5 files at r2. pkg/storage/queue.go, line 602 at r2 (raw file):
Most queues don't want to process destroyed replicas - only the replica GC queue does. I think it might be better to have a special case here (a flag in queueConfig that is only set for replicaGCQueue) so that we don't send destroyed replicas to the other queues. pkg/storage/replica.go, line 289 at r1 (raw file):
Call this pkg/storage/replica.go, line 688 at r1 (raw file):
I think pkg/storage/replica.go, line 690 at r2 (raw file):
This is an "init" method; when would pkg/storage/replica.go, line 1287 at r2 (raw file):
Did you mean to leave this change in? pkg/storage/store.go, line 2148 at r2 (raw file):
Don't forget to remove these messages before merging. Comments from Reviewable |
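A minimal sketch of the queueConfig flag suggested in the review above, so that only the replica GC queue sees destroyed replicas. The field name `processDestroyedReplicas`, the `accepts` method, and the simplified `replica` and `baseQueue` types are assumptions made for illustration; they are not taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errRangeNotFound stands in for the RangeNotFoundError set on destroyed replicas.
var errRangeNotFound = errors.New("range not found")

// replica is a simplified stand-in for the real Replica type.
type replica struct {
	rangeID      int64
	destroyedErr error // non-nil once the replica is considered destroyed
}

// queueConfig holds per-queue options. processDestroyedReplicas is a
// hypothetical name for the flag suggested in the review.
type queueConfig struct {
	processDestroyedReplicas bool
}

// baseQueue is a toy queue that only decides whether to accept a replica.
type baseQueue struct {
	name string
	cfg  queueConfig
}

// accepts reports whether the queue should process r. Destroyed replicas
// are skipped unless the queue has explicitly opted in.
func (q *baseQueue) accepts(r *replica) bool {
	if r.destroyedErr != nil && !q.cfg.processDestroyedReplicas {
		return false
	}
	return true
}

func main() {
	gcQueue := &baseQueue{name: "replicaGC", cfg: queueConfig{processDestroyedReplicas: true}}
	splitQueue := &baseQueue{name: "split"}
	destroyed := &replica{rangeID: 1, destroyedErr: errRangeNotFound}

	for _, q := range []*baseQueue{gcQueue, splitQueue} {
		fmt.Printf("%s accepts destroyed replica: %t\n", q.name, q.accepts(destroyed))
	}
}
```

Keeping the check in the shared queue code means the other queues need no changes; only the replica GC queue sets the flag.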
@m-schneider the error during tests (copied for context below) occurs because in
if r.mu.destroyed.destroyedErr != nil && !r.mu.destroyed.pending {
	// Silently ignore all operations on destroyed replicas. We can't return an
	// error here as all errors returned from this method are considered fatal.
	return nil
}
When the replica was actually destroyed, the code didn't actually reset the pending flag to With that change made, Diff below:
diff --git a/pkg/storage/replica.go b/pkg/storage/replica.go
index f3c001cda..5a6677312 100644
--- a/pkg/storage/replica.go
+++ b/pkg/storage/replica.go
@@ -507,6 +507,7 @@ func (r *Replica) withRaftGroupLocked(
 	ctx := r.AnnotateCtx(context.TODO())
 	if r.mu.internalRaftGroup == nil {
+		log.Infof(context.Background(), "NEWRAFT: %+v", r.mu.destroyed)
 		raftGroup, err := raft.NewRawNode(newRaftConfig(
 			raft.Storage((*replicaRaftStorage)(r)),
 			uint64(r.mu.replicaID),
diff --git a/pkg/storage/store.go b/pkg/storage/store.go
index 11b868ef8..77e904479 100644
--- a/pkg/storage/store.go
+++ b/pkg/storage/store.go
@@ -2217,6 +2217,7 @@ func (s *Store) removeReplicaImpl(
 	rep.cancelPendingCommandsLocked()
 	rep.mu.internalRaftGroup = nil
 	rep.mu.destroyed.destroyedErr = roachpb.NewRangeNotFoundError(rep.RangeID)
+	rep.mu.destroyed.pending = false // set this before actually destroying
 	rep.mu.Unlock()
 	rep.readOnlyCmdMu.Unlock()
@@ -3291,8 +3292,12 @@ func (s *Store) HandleRaftResponse(ctx context.Context, resp *RaftMessageRespons
 			if err != nil {
 				log.Errorf(ctx, "unable to add to replica GC queue: %s", err)
 			} else if added {
-				repl.mu.destroyed.destroyedErr = roachpb.NewRangeNotFoundError(repl.RangeID)
-				repl.mu.destroyed.pending = true
+				repl.mu.Lock()
+				if repl.mu.destroyed.destroyedErr == nil {
+					repl.mu.destroyed.destroyedErr = roachpb.NewRangeNotFoundError(repl.RangeID)
+					repl.mu.destroyed.pending = true
+				}
+				repl.mu.Unlock()
 				log.Infof(ctx, "added to replica GC queue (peer suggestion)")
 			}
 		case *roachpb.StoreNotFoundError:
and here's the crash (without the diff):
Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed. pkg/storage/replica.go, line 688 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
... and basically garbage. I think the only reason for keeping it at the moment is that it manages to persist the corruption so that you can't just restart the node and keep going. We're definitely not in a place where this corruption is handled in any way other than exploding. (Even worse, we sometimes keep running and it doesn't end well). Comments from Reviewable |
re: @bdarnell's comments, I tend to agree that it's a bit awkward. How's the following:
Besides, this is mostly a WIP to see if it actually translates into better chaos behavior, so if we're reasonably certain that it does what it should, I'd rather test drive it first and then get into the bike shedding. Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed. Comments from Reviewable |
Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed. pkg/storage/queue.go, line 602 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
This means adding a field Comments from Reviewable |
dc61bd4
to
a98687e
Compare
That looks much cleaner, I'll refactor and rerun the tests. Review status: 0 of 6 files reviewed at latest revision, 8 unresolved discussions. pkg/storage/queue.go, line 602 at r2 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/storage/replica.go, line 289 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/storage/replica.go, line 688 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
I'm going to run some tests on a real cluster first, but then I'll set pending to true and see what happens! pkg/storage/replica.go, line 690 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Got rid of it! pkg/storage/replica.go, line 1287 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Gone! pkg/storage/store.go, line 2148 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Deleted! Comments from Reviewable |
a98687e
to
60b8421
Compare
c5fdb49
to
89a63a1
Compare
After migrating the experiment to an ephemeral cluster, this change seems to fix the drop we've been seeing. Here's a graph of a node in an ephemeral cluster before the change was applied. You can see the dips when the restarting node rejoins the cluster. Here's the cluster with this PR: As you can see there are no dips when the restarted node comes back online at the 10, 30 and 50 minute marks. |
@m-schneider That improvement is great, nice job! Reviewed 1 of 6 files at r3, 5 of 5 files at r4. pkg/storage/queue.go, line 195 at r4 (raw file):
nit: let's move up next to the other bool options (needsSystemConfig, acceptsUnsplitRanges, ...) and begin the comment with the name of the variable. pkg/storage/replica.go, line 219 at r4 (raw file):
I think the linter's going to yell about this comment. pkg/storage/replica.go, line 235 at r4 (raw file):
We can actually fail to persist the corruption error but we'll still be in this state (see pkg/storage/replica.go, line 237 at r4 (raw file):
These comments won't mean much after this PR. Let's add explanations to each variant. pkg/storage/replica.go, line 312 at r4 (raw file):
Stale comment. pkg/storage/replica.go, line 709 at r4 (raw file):
Is it possible for pkg/storage/replica.go, line 1176 at r4 (raw file):
Stale comment about pkg/storage/replica.go, line 2964 at r4 (raw file):
Same question as below. Seems like the error checks are no longer necessary. pkg/storage/replica.go, line 3067 at r4 (raw file):
Do we need to check if pkg/storage/replica_gc_queue.go, line 103 at r4 (raw file):
nit: we should move this up next to the other bool options. pkg/storage/store.go, line 314 at r4 (raw file):
Consider renaming this now that its type has changed. pkg/storage/store.go, line 317 at r4 (raw file):
Does this pair of states warrant a name and a corresponding method? pkg/storage/store.go, line 3293 at r4 (raw file):
Since we're setting these two together everywhere now, it probably makes sense to encapsulate setting them into a method so that we don't miss setting one or the other somewhere. What do you think? Comments from Reviewable |
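A rough sketch of what encapsulating the two fields behind one method could look like. The names `destroyStatus`, `reason`, `err`, `destroyReasonAlive` and `destroyReasonRemovalPending` follow the snippet quoted later in this thread; the `Set` and `markRemovalPending` helpers and the simplified `replica` type are assumptions for illustration only.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errRangeNotFound stands in for roachpb.NewRangeNotFoundError(...).
var errRangeNotFound = errors.New("range not found")

type destroyReason int

const (
	destroyReasonAlive destroyReason = iota
	destroyReasonRemovalPending
)

// destroyStatus pairs the error with the reason it was set.
type destroyStatus struct {
	reason destroyReason
	err    error
}

// Set assigns both fields in one step so that callers cannot update one
// and forget the other. (Hypothetical helper.)
func (s *destroyStatus) Set(err error, reason destroyReason) {
	s.err = err
	s.reason = reason
}

// replica is a simplified stand-in for the real Replica type.
type replica struct {
	mu struct {
		sync.Mutex
		destroyStatus destroyStatus
	}
}

// markRemovalPending flags the replica as pending removal unless a status
// has already been recorded. It reports whether the status changed.
func (r *replica) markRemovalPending(err error) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.mu.destroyStatus.reason != destroyReasonAlive {
		return false // keep the earlier status
	}
	r.mu.destroyStatus.Set(err, destroyReasonRemovalPending)
	return true
}

func main() {
	r := &replica{}
	fmt.Println(r.markRemovalPending(errRangeNotFound)) // first call records the status
	fmt.Println(r.markRemovalPending(errRangeNotFound)) // second call leaves it untouched
}
```

The guard inside `markRemovalPending` mirrors the `destroyedErr == nil` check in the diff earlier in the thread: the status is only written under the lock, and only if nothing was recorded before.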
dbda70b
to
56c4876
Compare
150d813
to
91b86db
Compare
Review status: 2 of 6 files reviewed at latest revision, 19 unresolved discussions, all commit checks successful. pkg/storage/queue.go, line 195 at r4 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. pkg/storage/replica.go, line 219 at r4 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Fixed! pkg/storage/replica.go, line 235 at r4 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. pkg/storage/replica.go, line 237 at r4 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. pkg/storage/replica.go, line 312 at r4 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. pkg/storage/replica.go, line 709 at r4 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Added a function to set the values together. pkg/storage/replica.go, line 1176 at r4 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. pkg/storage/replica.go, line 2964 at r4 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Got rid of them. pkg/storage/replica.go, line 3067 at r4 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Got rid of the error check. pkg/storage/replica_gc_queue.go, line 103 at r4 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. pkg/storage/store.go, line 314 at r4 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What do you think is a good name? pkg/storage/store.go, line 317 at r4 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It's only used once, so I don't know if that's enough for adding a function. pkg/storage/store.go, line 3293 at r4 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Sounds good, I added a function. Comments from Reviewable |
This is looking a lot better. Just nits at this point. Reviewed 4 of 4 files at r5. pkg/storage/replica.go, line 1176 at r4 (raw file): Previously, m-schneider (Masha Schneider) wrote…
Hmm, I don't see a change. pkg/storage/replica.go, line 231 at r5 (raw file):
nit: this stutters. Consider calling this simply pkg/storage/replica.go, line 239 at r5 (raw file):
nit: pkg/storage/replica.go, line 716 at r5 (raw file):
nit: the following cuts down on a bit of repetition
pkg/storage/replica.go, line 3073 at r5 (raw file):
Does this need to be performed under lock? pkg/storage/store.go, line 314 at r4 (raw file): Previously, m-schneider (Masha Schneider) wrote…
Should we be calling this pkg/storage/store.go, line 317 at r4 (raw file): Previously, m-schneider (Masha Schneider) wrote…
SGTM Comments from Reviewable |
Reviewed 1 of 6 files at r3, 1 of 5 files at r4, 4 of 4 files at r5. pkg/storage/replica.go, line 690 at r2 (raw file): Previously, m-schneider (Masha Schneider) wrote…
Looks like it's back in a slightly different form? pkg/storage/replica.go, line 236 at r5 (raw file):
Nit: I'd put these constants immediately after the pkg/storage/replica.go, line 511 at r5 (raw file):
This looks like a change in behavior - previously, corrupted replicas had a non-nil Comments from Reviewable |
095732e
to
d92e360
Compare
Review status: 2 of 6 files reviewed at latest revision, 12 unresolved discussions. pkg/storage/replica.go, line 690 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Does that work better? pkg/storage/replica.go, line 1176 at r4 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. pkg/storage/replica.go, line 231 at r5 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. pkg/storage/replica.go, line 236 at r5 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/storage/replica.go, line 239 at r5 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. pkg/storage/replica.go, line 511 at r5 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I added a named function, but IsAlive isn't preserving old behavior because destroyed can get set earlier now. Leaving it as IsAlive breaks some tests like Example_rebalancing. pkg/storage/replica.go, line 716 at r5 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. pkg/storage/replica.go, line 3073 at r5 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It's locked above. pkg/storage/store.go, line 314 at r4 (raw file):
Comments from Reviewable |
Reviewed 4 of 4 files at r6. pkg/storage/replica.go, line 690 at r2 (raw file): Previously, m-schneider (Masha Schneider) wrote…
Yep, makes sense now. pkg/storage/replica.go, line 511 at r5 (raw file): Previously, m-schneider (Masha Schneider) wrote…
Hmm, that's kind of surprising. I'd have thought that even with destroyed being set earlier, it wasn't happening until after we no longer need the raft group. Anyway, this is fine if you uncomment it. Comments from Reviewable |
Review status: all files reviewed at latest revision, 6 unresolved discussions. pkg/storage/replica.go, line 223 at r6 (raw file):
nit: Replica is capital here, lowercase elsewhere. pkg/storage/replica.go, line 517 at r6 (raw file):
Did you mean to comment this out? Comments from Reviewable |
I never see Review status: all files reviewed at latest revision, 6 unresolved discussions. Comments from Reviewable |
Where was it reset before? I looked through a version of the code without my change and couldn't find where the error was set to nil. Comments from Reviewable |
It wasn't reset before, but I also don't think it was ever set before when it wasn't permanently broken (corrupted) or definitely being GC'ed. You're setting the error earlier and, I think, making it necessary to reset the error when the replicaID changes (at least if the error is of the right type, you wouldn't want to wipe a corruption error). Review status: all files reviewed at latest revision, 6 unresolved discussions, all commit checks successful. Comments from Reviewable |
Got it. I actually check for slightly different statuses of the destroyed reason in different functions, so things like proposals use a weaker form of liveness to short-circuit, while other functions make sure that the replica has either been GCed or is corrupt. Comments from Reviewable |
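To make the two kinds of check described above concrete, here is a small sketch, not the PR's code. `IsAlive` is named in the thread; `RemovedOrCorrupt`, `propose`, `canUseRaftGroup`, and the `destroyReasonRemoved` and `destroyReasonCorrupted` constants are hypothetical names chosen for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// errRangeNotFound stands in for the RangeNotFoundError used in the PR.
var errRangeNotFound = errors.New("range not found")

type destroyReason int

const (
	destroyReasonAlive destroyReason = iota
	destroyReasonRemovalPending
	destroyReasonRemoved   // hypothetical name: the replica has been GCed
	destroyReasonCorrupted // hypothetical name: the replica is corrupt
)

type destroyStatus struct {
	reason destroyReason
	err    error
}

// IsAlive is false as soon as any destroy reason is recorded, including a
// removal that is only pending.
func (s destroyStatus) IsAlive() bool {
	return s.reason == destroyReasonAlive
}

// RemovedOrCorrupt is true only for the terminal states.
func (s destroyStatus) RemovedOrCorrupt() bool {
	return s.reason == destroyReasonRemoved || s.reason == destroyReasonCorrupted
}

// propose short-circuits once the replica is no longer alive.
func propose(s destroyStatus) error {
	if !s.IsAlive() {
		return s.err
	}
	return nil
}

// canUseRaftGroup keeps the Raft group available while removal is only
// pending, so the replica GC queue can still do its work.
func canUseRaftGroup(s destroyStatus) bool {
	return !s.RemovedOrCorrupt()
}

func main() {
	statuses := []destroyStatus{
		{reason: destroyReasonAlive},
		{reason: destroyReasonRemovalPending, err: errRangeNotFound},
		{reason: destroyReasonRemoved, err: errRangeNotFound},
	}
	for _, s := range statuses {
		fmt.Printf("reason=%d propose err=%v raft group usable=%t\n",
			s.reason, propose(s), canUseRaftGroup(s))
	}
}
```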
d92e360
to
72fc130
Compare
Review status: 5 of 6 files reviewed at latest revision, 6 unresolved discussions. pkg/storage/replica.go, line 511 at r5 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/storage/replica.go, line 223 at r6 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. pkg/storage/replica.go, line 517 at r6 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Uncommented! Comments from Reviewable |
with some comments (including the replica reset). I think you can safely remove the Reviewed 2 of 4 files at r5, 3 of 4 files at r6, 1 of 1 files at r7. pkg/storage/replica.go, line 1004 at r7 (raw file):
Should this be pkg/storage/replica.go, line 1135 at r7 (raw file):
pkg/storage/replica.go, line 3074 at r7 (raw file):
pkg/storage/replica.go, line 3078 at r7 (raw file):
this should mirror the initial check with pkg/storage/store.go, line 3290 at r7 (raw file):
This is also not done yet, but I think it needs to happen. In the method above, in the case in which the replicaID increases, we need code like if repl.mu.destroyStatus.reason == destroyReasonRemovalPending {
// An earlier incarnation of this replica was removed, but apparently it has been re-added
// now, so reset the status.
repl.mu.destroyStatus.err = nil
repl.mu.destroyStatus.reason = destroyReasonAlive
} Comments from Reviewable |
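A self-contained sketch of the reset suggested above, assuming a simplified `replica` type without locking. `setReplicaID` and `destroyReasonCorrupted` are hypothetical names used only for this example. A corruption status is deliberately left in place, as discussed earlier in the thread.

```go
package main

import (
	"errors"
	"fmt"
)

// errRangeNotFound stands in for the RangeNotFoundError used in the PR.
var errRangeNotFound = errors.New("range not found")

type destroyReason int

const (
	destroyReasonAlive destroyReason = iota
	destroyReasonRemovalPending
	destroyReasonCorrupted // hypothetical name for the corruption case
)

type destroyStatus struct {
	reason destroyReason
	err    error
}

// replica is a simplified stand-in; the real type guards these fields
// with a mutex, which is omitted here.
type replica struct {
	replicaID     int
	destroyStatus destroyStatus
}

// setReplicaID (hypothetical) updates the replica ID and, if the ID
// increased while a removal was pending, resets the destroy status.
func (r *replica) setReplicaID(id int) error {
	if id < r.replicaID {
		return fmt.Errorf("replicaID cannot move backwards from %d to %d", r.replicaID, id)
	}
	if id > r.replicaID && r.destroyStatus.reason == destroyReasonRemovalPending {
		// An earlier incarnation was removed, but the replica has been
		// re-added, so clear the pending-removal status.
		r.destroyStatus = destroyStatus{reason: destroyReasonAlive}
	}
	r.replicaID = id
	return nil
}

func main() {
	r := &replica{
		replicaID:     1,
		destroyStatus: destroyStatus{reason: destroyReasonRemovalPending, err: errRangeNotFound},
	}
	if err := r.setReplicaID(2); err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Printf("replicaID=%d reason=%d err=%v\n", r.replicaID, r.destroyStatus.reason, r.destroyStatus.err)
}
```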
Previously, replicas added to the GC queue weren't fully considered destroyed. This led to redirectOnOrAcquireLease waiting for the replica to be GCed before returning a NotLeaseHolderError. Now redirectOnOrAcquireLease will respond faster, and anything depending on it, such as an RPC, will not hang.
72fc130
to
c07f3b0
Compare
Reviewed 2 of 2 files at r8. pkg/storage/store.go, line 3293 at r8 (raw file):
aw, I missed a word here ( Comments from Reviewable |
LG! Let's 🛍 this 🏓 |
Review status: all files reviewed at latest revision, 11 unresolved discussions, all commit checks successful. pkg/storage/replica.go, line 1004 at r7 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/storage/replica.go, line 1135 at r7 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/storage/replica.go, line 3074 at r7 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/storage/replica.go, line 3078 at r7 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/storage/store.go, line 3290 at r7 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. Comments from Reviewable |