storage: mark replicas added to replica GC queue as destroyed #19353

m-schneider · 2017-10-18T21:32:38Z

Before when adding replicas to the GC queue, they weren't fully considered
destroyed. This lead to redirectOnOrAcquireLease waiting for the
replica to be GCed before returning a NotLeaseHolderError.

Now redirectOnOrAcquireLease will respond faster and anything depending
on that such as an RPC will not hang.

cockroach-teamcity · 2017-10-18T21:32:44Z

This change is

nvanbenschoten · 2017-10-19T18:28:17Z

Reviewed 5 of 5 files at r1.
Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.

pkg/storage/queue.go, line 602 at r1 (raw file):

	}

	if err, pending := repl.IsDestroyed(); err != nil && pending == false {

nit: !pending. Also, in conjunction with the comment on the pending flag, can we add one here to discuss why pending is taken into consideration?

pkg/storage/replica.go, line 290 at r1 (raw file):

		destroyed struct {
			destroyedErr error
			pending      bool

This would be easier to understand if it had a comment describing when a Replica's destroyed state is considered pending.

Also, it doesn't look like we ever set this back to false. Without being able to visualize the lifetime of this flag, I'm not easily able to convince myself that it's necessary.

pkg/storage/replica.go, line 1147 at r1 (raw file):

}

// IsDestroyed returns a non-nil error if the replica has been destroyed.

We should comment about the other return variable.

Comments from Reviewable

m-schneider · 2017-10-19T19:59:38Z

Review status: all files reviewed at latest revision, 3 unresolved discussions.

pkg/storage/queue.go, line 602 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: !pending. Also, in conjunction with the comment on the pending flag, can we add one here to discuss why pending is taken into consideration?

Good point!

pkg/storage/replica.go, line 290 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This would be easier to understand if it had a comment describing when a Replica's destroyed state is considered pending.

Also, it doesn't look like we ever set this back to false. Without being able to visualize the lifetime of this flag, I'm not easily able to convince myself that it's necessary.

The flag is necessary to avoid breaking a couple of things. queue.go would completely ignore a "destroyed" replica for any queue so it would never actually get GCed. Without the specific check on pending in withRaftGroupLocked the replica from the replica descriptor so the RangeLookupRequest in func (rgcq *replicaGCQueue) process would fail because the replica wouldn't be a current member of the range.

pkg/storage/replica.go, line 1147 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We should comment about the other return variable.

Done.

Comments from Reviewable

bdarnell · 2017-10-19T21:34:13Z

Most of the uses of IsDestroyed do err != nil && !pending, so a replica in the pending state is not counted as destroyed in most cases (where is the place where we break out of the hanging lease request? I'm having trouble finding it). I think it might be better to make this a separate flag, distinct from the existing destroyed state, so it can be checked only in the place(s) where it matters.

Reviewed 5 of 5 files at r2.
Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.

pkg/storage/queue.go, line 602 at r2 (raw file):

	}

	if err, pending := repl.IsDestroyed(); err != nil && !pending {

Most queues don't want to process destroyed replicas - only the replica GC queue does. I think it might be better to have a special case here (a flag in queueConfig that is only set for replicaGCQueue) so that we don't send destroyed replicas to the other queues.

pkg/storage/replica.go, line 289 at r1 (raw file):

		// Has the replica been destroyed.
		destroyed struct {
			destroyedErr error

Call this err to avoid the repetition of destroyed.destroyedErr.

pkg/storage/replica.go, line 688 at r1 (raw file):

		return err
	}
	r.mu.destroyed.destroyedErr = pErr.GetDetail()

I think pending should probably be set here, to allow the replica GC queue to attempt to clean up these corrupted replicas. This would be an additional behavior change, but it seems like the right thing to do. The only problem is that the whole corrupted-replica mechanism is very under-tested and

pkg/storage/replica.go, line 690 at r2 (raw file):

		return err
	}
	if !r.mu.destroyed.pending {

This is an "init" method; when would r.mu.destroyed.pending be true here?

pkg/storage/replica.go, line 1287 at r2 (raw file):

			case LeaseState_ERROR:
				// Lease state couldn't be determined.
				log.Infof(ctx, "lease state couldn't be determined")

Did you mean to leave this change in?

pkg/storage/store.go, line 2148 at r2 (raw file):

		select {
		case s.snapshotApplySem <- struct{}{}:
			log.Infof(ctx, "SEMAPHORE AQUIRED")

Don't forget to remove these messages before merging.

Comments from Reviewable

tbg · 2017-10-24T11:31:51Z

@m-schneider the error (copied for context below) during tests are because in withRaftGroupLocked, you take into account the pending flag:

	if r.mu.destroyed.destroyedErr != nil && !r.mu.destroyed.pending {
		// Silently ignore all operations on destroyed replicas. We can't return an
		// error here as all errors returned from this method are considered fatal.
		return nil
	}

When the replica was actually destroyed, the code didn't actually reset the pending flag to false, and so the above snippet would recreate the in-memory Raft group even though its on-disk state had been removed (something that usually causes an explosion). After fixing that, the same problem occurred, though a little less often, and so I looked for another place that set pending to true. Of course there's only one, so that place also has to check whether the replica has already been gc'ed (it's convenient to instead avoid doing anything if there's already a destroyedErr).

With that change made, make stress PKG=./pkg/storage/ TESTS=TestReplicateAfterRemoveAndSplit runs for at least a couple hundred iters (conserving battery, so haven't checked further, but it failed pretty much immediately before).

Diff below:

diff --git a/pkg/storage/replica.go b/pkg/storage/replica.go
index f3c001cda..5a6677312 100644
--- a/pkg/storage/replica.go
+++ b/pkg/storage/replica.go
@@ -507,6 +507,7 @@ func (r *Replica) withRaftGroupLocked(
 	ctx := r.AnnotateCtx(context.TODO())
 
 	if r.mu.internalRaftGroup == nil {
+		log.Infof(context.Background(), "NEWRAFT: %+v", r.mu.destroyed)
 		raftGroup, err := raft.NewRawNode(newRaftConfig(
 			raft.Storage((*replicaRaftStorage)(r)),
 			uint64(r.mu.replicaID),
diff --git a/pkg/storage/store.go b/pkg/storage/store.go
index 11b868ef8..77e904479 100644
--- a/pkg/storage/store.go
+++ b/pkg/storage/store.go
@@ -2217,6 +2217,7 @@ func (s *Store) removeReplicaImpl(
 	rep.cancelPendingCommandsLocked()
 	rep.mu.internalRaftGroup = nil
 	rep.mu.destroyed.destroyedErr = roachpb.NewRangeNotFoundError(rep.RangeID)
+	rep.mu.destroyed.pending = false // set this before actually destroying
 	rep.mu.Unlock()
 	rep.readOnlyCmdMu.Unlock()
 
@@ -3291,8 +3292,12 @@ func (s *Store) HandleRaftResponse(ctx context.Context, resp *RaftMessageRespons
 			if err != nil {
 				log.Errorf(ctx, "unable to add to replica GC queue: %s", err)
 			} else if added {
-				repl.mu.destroyed.destroyedErr = roachpb.NewRangeNotFoundError(repl.RangeID)
-				repl.mu.destroyed.pending = true
+				repl.mu.Lock()
+				if repl.mu.destroyed.destroyedErr == nil {
+					repl.mu.destroyed.destroyedErr = roachpb.NewRangeNotFoundError(repl.RangeID)
+					repl.mu.destroyed.pending = true
+				}
+				repl.mu.Unlock()
 				log.Infof(ctx, "added to replica GC queue (peer suggestion)")
 			}
 		case *roachpb.StoreNotFoundError:

and here's the crash (without the diff):

E171024 11:17:16.490175 535 util/log/crash_reporting.go:119  [s3,r1/3:/M{in-ax}] a panic has occurred!
panic: applied(19) is out of range [prevApplied(10), committed(10)] [recovered]
	panic: applied(19) is out of range [prevApplied(10), committed(10)]

goroutine 535 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc420c10480, 0x6575d00, 0xc420d60390)
	/Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:183 +0x131
panic(0x564e360, 0xc4202de4a0)
	/usr/local/Cellar/go/1.9/libexec/src/runtime/panic.go:491 +0x283
github.com/cockroachdb/cockroach/pkg/storage.(*raftLogger).Panicf(0xc4202de470, 0x5939d8c, 0x3c, 0xc420cfa150, 0x3, 0x3)
	/Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/pkg/storage/raft.go:115 +0x1dc
github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft.(*raftLog).appliedTo(0xc420d44d90, 0x13)
	/Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft/log.go:202 +0x174
github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft.newRaft(0xc420db75d0, 0x476be4b)
	/Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft/raft.go:323 +0xc22
github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft.NewRawNode(0xc420db75d0, 0x0, 0x0, 0x0, 0xc420cc5680, 0xc4209485b0, 0x4014de8)
	/Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft/rawnode.go:79 +0x71
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).withRaftGroupLocked(0xc420e26000, 0x0, 0x59aae40, 0xc4202d2d10, 0x100000001)
	/Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:510 +0x2e3
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).maybeInitializeRaftGroup(0xc420e26000, 0x6575d00, 0xc420afa5d0)
	/Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:1760 +0x185
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).Send(0xc420e26000, 0x6575d00, 0xc420afa5d0, 0x6b49d27d, 0xf2, 0x0, 0x0, 0x1, 0x0, 0x0, ...)
	/Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:1784 +0x136

Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.

pkg/storage/replica.go, line 688 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I think pending should probably be set here, to allow the replica GC queue to attempt to clean up these corrupted replicas. This would be an additional behavior change, but it seems like the right thing to do. The only problem is that the whole corrupted-replica mechanism is very under-tested and

... and basically garbage. I think the only reason for keeping it at the moment is that it manages to persist the corruption so that you can't just restart the node and keep going. We're definitely not in a place where this corruption is handled in any way other than exploding. (Even worse, we sometimes keep running and it doesn't end well).

Comments from Reviewable

tbg · 2017-10-24T11:43:29Z

re: @bdarnell's comments, I tend to agree that it's a bit awkward. How's the following:

type destroyReason int

type destroyStatus struct {
    reason destroyReason
    err error
}

func (s destroyStatus) IsAlive() bool {
  return s.reason == destroyReasonAlive
}

const (
  destroyReasonAlive destroyReason = iota // `err` must be nil (Go has no enums :crying_cat_face: )
  destroyReasonCorrupted                  // supersedes the `corrupted` bool
  destroyReasonRemovalPending             // replaces `pending==true`
  destroyReasonRemoved                    // replaces `pending==false`
)

type replicaMu struct {
// ...
  destroyed destroyStatus
}

Besides, this is mostly a WIP to see if it actually translates into better chaos behavior, so if we're reasonably certain that it does what it should, I'd rather test drive it first and then get into the bike shedding.

Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.

Comments from Reviewable

tbg · 2017-10-24T11:55:34Z

Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.

pkg/storage/queue.go, line 602 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Most queues don't want to process destroyed replicas - only the replica GC queue does. I think it might be better to have a special case here (a flag in queueConfig that is only set for replicaGCQueue) so that we don't send destroyed replicas to the other queues.

This means adding a field processNonAlive (name 🚲 🏡 ) to queueConfig and using the field here. (All queues except the replica gc queue set it true, of course). Note that this would also mean that previously GC'ed replicas may be GC'ed again. It'd be good to avoid this from actually happening (i.e. the replicaGCQueue's process method would check the status and only actually GC if it's alive or marked for deletion but not previously GC'ed).

Comments from Reviewable

m-schneider · 2017-10-25T19:17:38Z

That looks much cleaner, I'll refactor and rerun the tests.

Review status: 0 of 6 files reviewed at latest revision, 8 unresolved discussions.

pkg/storage/queue.go, line 602 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

This means adding a field processNonAlive (name 🚲 🏡 ) to queueConfig and using the field here. (All queues except the replica gc queue set it true, of course). Note that this would also mean that previously GC'ed replicas may be GC'ed again. It'd be good to avoid this from actually happening (i.e. the replicaGCQueue's process method would check the status and only actually GC if it's alive or marked for deletion but not previously GC'ed).

Done.

pkg/storage/replica.go, line 289 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Call this err to avoid the repetition of destroyed.destroyedErr.

Done.

pkg/storage/replica.go, line 688 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

... and basically garbage. I think the only reason for keeping it at the moment is that it manages to persist the corruption so that you can't just restart the node and keep going. We're definitely not in a place where this corruption is handled in any way other than exploding. (Even worse, we sometimes keep running and it doesn't end well).

I'm going to run some tests on a real cluster first, but then I'll set pending to true and see what happens!

pkg/storage/replica.go, line 690 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This is an "init" method; when would r.mu.destroyed.pending be true here?

Got rid of it!

pkg/storage/replica.go, line 1287 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Did you mean to leave this change in?

Gone!

pkg/storage/store.go, line 2148 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Don't forget to remove these messages before merging.

Deleted!

Comments from Reviewable

m-schneider · 2017-10-31T18:34:46Z

After migrating the experiment to an ephemeral cluster, this change seems to fix the drop we've been seeing. Here's a graph of a node in an ephemeral cluster before the change was applied.

You can see the dips when the restarting node rejoins the cluster.

Here's the cluster with this PR:

As you can see there are no dips when the restarted node comes back online at the 10, 30 and 50 minute marks.

nvanbenschoten · 2017-10-31T19:59:34Z

@m-schneider That improvement is great, nice job!

Reviewed 1 of 6 files at r3, 5 of 5 files at r4.
Review status: all files reviewed at latest revision, 19 unresolved discussions, some commit checks failed.

pkg/storage/queue.go, line 195 at r4 (raw file):

	// purgatory is a gauge measuring current replica count in purgatory.
	purgatory *metric.Gauge
	// whether or not we want to process replicas that have been destroyed but not GCed.

nit: let's move up next to the other bool options (needsSystemConfig, acceptsUnsplitRanges, ...) and begin the comment with the name of the variable.

pkg/storage/replica.go, line 219 at r4 (raw file):

}

// The reason that a replica has been marked as destroyed.

I think the linter's going to yell about this comment.

pkg/storage/replica.go, line 235 at r4 (raw file):

	// Corrupted persistently (across process restarts) indicates whether the
	// replica has been corrupted.
	destroyReasonCorrupted

We can actually fail to persist the corruption error but we'll still be in this state (see maybeSetCorrupt). Because of that, this comment is a little misleading.

pkg/storage/replica.go, line 237 at r4 (raw file):

	destroyReasonCorrupted
	destroyReasonRemovalPending // replaces `pending==true`
	destroyReasonRemoved        // replaces `pending==false`

These comments won't mean much after this PR. Let's add explanations to each variant.

pkg/storage/replica.go, line 312 at r4 (raw file):

		// Protects all fields in the mu struct.
		syncutil.RWMutex
		// Has the replica been destroyed. Pending will be set when a replica is

Stale comment.

pkg/storage/replica.go, line 709 at r4 (raw file):

	}
	if r.mu.destroyed.reason != destroyReasonRemovalPending {
		r.mu.destroyed.err = pErr.GetDetail()

Is it possible for pErr != nil && pErr.GetDetail() == nil? If not, we should check if pErr is non-null and if not, set r.mu.destroyed.err and r.mu.destroyed.reason together. That would make this easier to understand.

pkg/storage/replica.go, line 1176 at r4 (raw file):

// IsDestroyed returns a non-nil error if the replica has been destroyed
// and the value of pending. If pending is true the replica has been added

Stale comment about pending

pkg/storage/replica.go, line 2964 at r4 (raw file):

	r.mu.Lock()
	if err := r.mu.destroyed.err; err != nil && !r.mu.destroyed.IsAlive() {

Same question as below. Seems like the error checks are no longer necessary.

pkg/storage/replica.go, line 3067 at r4 (raw file):

	// and the acquisition of Replica.mu. Failure to do so will leave pending
	// proposals that never get cleared.
	if err := r.mu.destroyed.err; err != nil && r.mu.destroyed.reason == destroyReasonRemoved {

Do we need to check if err is null? Will reason == destroyReasonRemoved ever be true when err is not set?

pkg/storage/replica_gc_queue.go, line 103 at r4 (raw file):

			pending:                  store.metrics.ReplicaGCQueuePending,
			processingNanos:          store.metrics.ReplicaGCQueueProcessingNanos,
			processDestroyedReplicas: true,

nit: we should move this up next to the other bool options.

pkg/storage/store.go, line 314 at r4 (raw file):

		rs.visited++
		repl.mu.RLock()
		destroyed := repl.mu.destroyed

Consider renaming this now that its type has changed.

pkg/storage/store.go, line 317 at r4 (raw file):

destroyed.IsAlive() || destroyed.reason == destroyReasonRemovalPending

Does this pair of states warrant a name and a corresponding method? IsNotRemoved or something?

pkg/storage/store.go, line 3293 at r4 (raw file):

				if repl.mu.destroyed.IsAlive() {
					repl.mu.destroyed.err = roachpb.NewRangeNotFoundError(repl.RangeID)
					repl.mu.destroyed.reason = destroyReasonRemovalPending

Since we're setting these two together everywhere now, it probably makes sense to encapsulate setting them into a method so that we don't miss setting one or the other somewhere. What do you think?

Comments from Reviewable

m-schneider · 2017-11-01T19:59:08Z

Review status: 2 of 6 files reviewed at latest revision, 19 unresolved discussions, all commit checks successful.

pkg/storage/queue.go, line 195 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: let's move up next to the other bool options (needsSystemConfig, acceptsUnsplitRanges, ...) and begin the comment with the name of the variable.

Done.

pkg/storage/replica.go, line 219 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think the linter's going to yell about this comment.

Fixed!

pkg/storage/replica.go, line 235 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We can actually fail to persist the corruption error but we'll still be in this state (see maybeSetCorrupt). Because of that, this comment is a little misleading.

Done.

pkg/storage/replica.go, line 237 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

These comments won't mean much after this PR. Let's add explanations to each variant.

Done.

pkg/storage/replica.go, line 312 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Stale comment.

Done.

pkg/storage/replica.go, line 709 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is it possible for pErr != nil && pErr.GetDetail() == nil? If not, we should check if pErr is non-null and if not, set r.mu.destroyed.err and r.mu.destroyed.reason together. That would make this easier to understand.

Added a function to set the values together.

pkg/storage/replica.go, line 1176 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Stale comment about pending

Done.

pkg/storage/replica.go, line 2964 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Same question as below. Seems like the error checks are no longer necessary.

Got rid of them.

pkg/storage/replica.go, line 3067 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we need to check if err is null? Will reason == destroyReasonRemoved ever be true when err is not set?

Got rid of the error check.

pkg/storage/replica_gc_queue.go, line 103 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: we should move this up next to the other bool options.

Done.

pkg/storage/store.go, line 314 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Consider renaming this now that its type has changed.

What do you think is a good name?

pkg/storage/store.go, line 317 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

destroyed.IsAlive() || destroyed.reason == destroyReasonRemovalPending

Does this pair of states warrant a name and a corresponding method? IsNotRemoved or something?

It's only used once, so I don't know if that's enough for adding a function.

pkg/storage/store.go, line 3293 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Since we're setting these two together everywhere now, it probably makes sense to encapsulate setting them into a method so that we don't miss setting one or the other somewhere. What do you think?

Sounds good, I added a function.

Comments from Reviewable

nvanbenschoten · 2017-11-06T15:52:58Z

This is looking a lot better. Just nits at this point.

Reviewed 4 of 4 files at r5.
Review status: all files reviewed at latest revision, 13 unresolved discussions, all commit checks successful.

pkg/storage/replica.go, line 1176 at r4 (raw file):

Previously, m-schneider (Masha Schneider) wrote…

Done.

Hmm, I don't see a change.

pkg/storage/replica.go, line 231 at r5 (raw file):

}

func (s *destroyStatus) SetDestroyed(err error, reason DestroyReason) {

nit: this stutters. Consider calling this simply Set

pkg/storage/replica.go, line 239 at r5 (raw file):

	// The Replica is alive.
	destroyReasonAlive DestroyReason = iota
	// Indicates whether the replica has been corrupted.

nit: The replica has been corrupted to remain constant with the other comments.

pkg/storage/replica.go, line 716 at r5 (raw file):

	}
	if r.mu.destroyed.reason != destroyReasonRemovalPending {
		if pErr.GetDetail() != nil {

nit: the following cuts down on a bit of repetition

if err := pErr.GetDetail(); err != nil {
    r.mu.destroyed.SetDestroyed(err, destroyReasonRemoved)
}

pkg/storage/replica.go, line 3073 at r5 (raw file):

	// and the acquisition of Replica.mu. Failure to do so will leave pending
	// proposals that never get cleared.
	if r.mu.destroyed.reason == destroyReasonRemoved {

Does this need to be performed under lock?

pkg/storage/store.go, line 314 at r4 (raw file):

Previously, m-schneider (Masha Schneider) wrote…

What do you think is a good name?

Should we be calling this destroyedStatus everywhere?

pkg/storage/store.go, line 317 at r4 (raw file):

Previously, m-schneider (Masha Schneider) wrote…

It's only used once, so I don't know if that's enough for adding a function.

SGTM

Comments from Reviewable

bdarnell · 2017-11-06T22:36:53Z

Reviewed 1 of 6 files at r3, 1 of 5 files at r4, 4 of 4 files at r5.
Review status: all files reviewed at latest revision, 12 unresolved discussions, all commit checks successful.

pkg/storage/replica.go, line 690 at r2 (raw file):

Previously, m-schneider (Masha Schneider) wrote…

Got rid of it!

Looks like it's back in a slightly different form?

pkg/storage/replica.go, line 236 at r5 (raw file):

}

const (

Nit: I'd put these constants immediately after the type DestroyReason int line.

pkg/storage/replica.go, line 511 at r5 (raw file):

	shouldCampaignOnCreation bool, f func(r *raft.RawNode) (unquiesceAndWakeLeader bool, _ error),
) error {
	if r.mu.destroyed.reason == destroyReasonRemoved {

This looks like a change in behavior - previously, corrupted replicas had a non-nil destroyed too. I think anything that was destroyed != nil should now be destroyed.IsAlive(). Any time we're not using IsAlive probably deserves a comment.

Comments from Reviewable

m-schneider · 2017-11-08T16:25:03Z

Review status: 2 of 6 files reviewed at latest revision, 12 unresolved discussions.

pkg/storage/replica.go, line 690 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Looks like it's back in a slightly different form?

Does that work better?

pkg/storage/replica.go, line 1176 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Hmm, I don't see a change.

Done.

pkg/storage/replica.go, line 231 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: this stutters. Consider calling this simply Set

Done.

pkg/storage/replica.go, line 236 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Nit: I'd put these constants immediately after the type DestroyReason int line.

Done.

pkg/storage/replica.go, line 239 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: The replica has been corrupted to remain constant with the other comments.

Done.

pkg/storage/replica.go, line 511 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This looks like a change in behavior - previously, corrupted replicas had a non-nil destroyed too. I think anything that was destroyed != nil should now be destroyed.IsAlive(). Any time we're not using IsAlive probably deserves a comment.

I added a named function, but IsAlive isn't preserving old behavior because destroyed can get set earlier now. Leaving it as IsAlive breaks some tests like Example_rebalancing.

pkg/storage/replica.go, line 716 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: the following cuts down on a bit of repetition
if err := pErr.GetDetail(); err != nil {
    r.mu.destroyed.SetDestroyed(err, destroyReasonRemoved)
}

Done.

pkg/storage/replica.go, line 3073 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does this need to be performed under lock?

It's locked above.

pkg/storage/store.go, line 314 at r4 (raw file):

destroyedStatus
Makes sense.

Comments from Reviewable

bdarnell · 2017-11-08T17:29:48Z

Reviewed 4 of 4 files at r6.
Review status: all files reviewed at latest revision, 11 unresolved discussions.

pkg/storage/replica.go, line 690 at r2 (raw file):

Previously, m-schneider (Masha Schneider) wrote…

Does that work better?

Yep, makes sense now.

pkg/storage/replica.go, line 511 at r5 (raw file):

Previously, m-schneider (Masha Schneider) wrote…

I added a named function, but IsAlive isn't preserving old behavior because destroyed can get set earlier now. Leaving it as IsAlive breaks some tests like Example_rebalancing.

Hmm, that's kind of surprising. I'd have thought that even with destroyed being set earlier, it wasn't happening until after we no longer need the raft group. Anyway, this is fine if you uncomment it.

Comments from Reviewable

nvanbenschoten · 2017-11-08T17:51:17Z

Review status: all files reviewed at latest revision, 6 unresolved discussions.

pkg/storage/replica.go, line 223 at r6 (raw file):

const (
	// The Replica is alive.

nit: Replica is capital here, lowercase elsewhere.

pkg/storage/replica.go, line 517 at r6 (raw file):

	shouldCampaignOnCreation bool, f func(r *raft.RawNode) (unquiesceAndWakeLeader bool, _ error),
) error {
	//if r.mu.destroyed.RemovedOrCorrupt() {

Did you mean to comment this out?

Comments from Reviewable

tbg · 2017-11-08T18:34:02Z

I never see reason be reset. Maybe I'm missing something, but it's my understanding that a rapid remove-readd should first mark the replica as will-deleted but then upgrades it with a higher replicaID. Wouldn't the replica be in an undesired state at that point or is something resetting reason that I've missed?

Review status: all files reviewed at latest revision, 6 unresolved discussions.

Comments from Reviewable

m-schneider · 2017-11-08T21:08:09Z

Where was it reset before? I looked through a version of the code without my change and couldn't find where the error was set to null.

Comments from Reviewable

tbg · 2017-11-08T21:11:54Z

It wasn't reset before, but I also don't think it was ever set before when it wasn't permanently broken (corrupted) or definitely being GC'ed. You're setting the error earlier and, I think, making it necessary to reset the error when the replicaID changes (at least if the error is of the right type, you wouldn't want to wipe a corruption error).

Review status: all files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.

Comments from Reviewable

m-schneider · 2017-11-08T21:19:36Z

Got it, I actually check for slightly different statuses of the destroyed reason in different functions so things like proposals get use a weaker form of liveness to short circuit while other functions make sure that it's either been GCed or is corrupt.

Comments from Reviewable

m-schneider · 2017-11-08T22:29:42Z

Review status: 5 of 6 files reviewed at latest revision, 6 unresolved discussions.

pkg/storage/replica.go, line 511 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Hmm, that's kind of surprising. I'd have thought that even with destroyed being set earlier, it wasn't happening until after we no longer need the raft group. Anyway, this is fine if you uncomment it.

Done.

pkg/storage/replica.go, line 223 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: Replica is capital here, lowercase elsewhere.

Done.

pkg/storage/replica.go, line 517 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Did you mean to comment this out?

Uncommented!

Comments from Reviewable

tbg · 2017-11-09T17:54:58Z

with some comments (including the replica reset). I think you can safely remove the [WIP] from the PR title and commit messages :-)

Reviewed 2 of 4 files at r5, 3 of 4 files at r6, 1 of 1 files at r7.
Review status: all files reviewed at latest revision, 10 unresolved discussions, some commit checks failed.

pkg/storage/replica.go, line 1004 at r7 (raw file):

	// our leader status and close the proposalQuota whenever the replica is
	// destroyed.
	if r.mu.destroyStatus.reason == destroyReasonRemoved {

Should this be RemovedOrCorrupt() to preserve the old behavior?

pkg/storage/replica.go, line 1135 at r7 (raw file):
Comment is slightly out of date. How about

// The range is either not fully initialized, has been destroyed, or is slated for removal (and thus likely ignored by its former Raft peers).

pkg/storage/replica.go, line 3074 at r7 (raw file):

	}

	// NB: We need to check Replica.mu.destroyed again in case the Replica has

destroyStatus

pkg/storage/replica.go, line 3078 at r7 (raw file):

	// and the acquisition of Replica.mu. Failure to do so will leave pending
	// proposals that never get cleared.
	if r.mu.destroyStatus.reason == destroyReasonRemoved {

this should mirror the initial check with !IsAlive.

pkg/storage/store.go, line 3290 at r7 (raw file):
This could use a comment:

// The replica will be garbage collected soon (we are sure since our replicaID is definitely too old), but in the meantime we already want to bounce all traffic from it. Note that the replica could be re-added with a higher replicaID, in which this error is cleared in setReplicaIDRaftMuLockedMuLocked.

This is also not done yet, but I think it needs to happen. In the method above, in the case in which the replicaID increases, we need code like

if repl.mu.destroyStatus.reason == destroyReasonRemovalPending {
  // An earlier incarnation of this replica was removed, but apparently it has been re-added
  // now, so reset the status.
  repl.mu.destroyStatus.err = nil
  repl.mu.destroyStatus.reason = destroyReasonAlive
}

Comments from Reviewable

Before when adding replicas to the GC queue, they weren't fully considered destroyed. This lead to redirectOnOrAcquireLease waiting for the replica to be GCed before returning a NotLeaseHolderError. Now redirectOnOrAcquireLease will respond faster and anything depending on that such as an RPC will not hang.

tbg · 2017-11-09T19:49:03Z

Reviewed 2 of 2 files at r8.
Review status: all files reviewed at latest revision, 11 unresolved discussions, some commit checks failed.

pkg/storage/store.go, line 3293 at r8 (raw file):

				// since our replicaID is definitely too old), but in the meantime we
				// already want to bounce all traffic from it. Note that the replica
				// could be re-added with a higher replicaID, in which this error is

aw, I missed a word here (in which case), but it's not worth re-running CI.

Comments from Reviewable

tbg · 2017-11-09T19:49:33Z

LG! Let's 🛍 this 🏓

m-schneider · 2017-11-09T21:29:48Z

Review status: all files reviewed at latest revision, 11 unresolved discussions, all commit checks successful.

pkg/storage/replica.go, line 1004 at r7 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Should this be RemovedOrCorrupt() to preserve the old behavior?

Done.

pkg/storage/replica.go, line 1135 at r7 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Comment is slightly out of date. How about

// The range is either not fully initialized, has been destroyed, or is slated for removal (and thus likely ignored by its former Raft peers).

Done.

pkg/storage/replica.go, line 3074 at r7 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

destroyStatus

Done.

pkg/storage/replica.go, line 3078 at r7 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

this should mirror the initial check with !IsAlive.

Done.

pkg/storage/store.go, line 3290 at r7 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

This could use a comment:

// The replica will be garbage collected soon (we are sure since our replicaID is definitely too old), but in the meantime we already want to bounce all traffic from it. Note that the replica could be re-added with a higher replicaID, in which this error is cleared in setReplicaIDRaftMuLockedMuLocked.

This is also not done yet, but I think it needs to happen. In the method above, in the case in which the replicaID increases, we need code like
if repl.mu.destroyStatus.reason == destroyReasonRemovalPending {
  // An earlier incarnation of this replica was removed, but apparently it has been re-added
  // now, so reset the status.
  repl.mu.destroyStatus.err = nil
  repl.mu.destroyStatus.reason = destroyReasonAlive
}

Done.

Comments from Reviewable

m-schneider requested review from tbg and a team October 18, 2017 21:32

m-schneider force-pushed the gcleaseholderfix branch from ceda782 to 06aaba9 Compare October 18, 2017 21:38

m-schneider force-pushed the gcleaseholderfix branch 3 times, most recently from 09f219a to b1f626a Compare October 19, 2017 19:59

m-schneider force-pushed the gcleaseholderfix branch 2 times, most recently from a9b82de to 786e399 Compare October 19, 2017 20:22

m-schneider force-pushed the gcleaseholderfix branch 6 times, most recently from dc61bd4 to a98687e Compare October 25, 2017 19:17

m-schneider force-pushed the gcleaseholderfix branch from a98687e to 60b8421 Compare October 26, 2017 18:59

m-schneider mentioned this pull request Oct 30, 2017

stability: stuck requests after rebalancing-inducing downtime #19165

Closed

m-schneider force-pushed the gcleaseholderfix branch 2 times, most recently from c5fdb49 to 89a63a1 Compare October 31, 2017 18:15

m-schneider force-pushed the gcleaseholderfix branch 2 times, most recently from dbda70b to 56c4876 Compare October 31, 2017 20:42

m-schneider force-pushed the gcleaseholderfix branch 2 times, most recently from 150d813 to 91b86db Compare November 1, 2017 17:31

m-schneider force-pushed the gcleaseholderfix branch 2 times, most recently from 095732e to d92e360 Compare November 8, 2017 16:21

m-schneider force-pushed the gcleaseholderfix branch from d92e360 to 72fc130 Compare November 8, 2017 22:23

m-schneider changed the title ~~[WIP] storage: mark replicas added to replica GC queue as destroyed~~ storage: mark replicas added to replica GC queue as destroyed Nov 9, 2017

m-schneider force-pushed the gcleaseholderfix branch from 72fc130 to c07f3b0 Compare November 9, 2017 19:15

m-schneider merged commit e078763 into cockroachdb:master Nov 9, 2017

m-schneider deleted the gcleaseholderfix branch November 9, 2017 21:30

a-robinson mentioned this pull request Nov 10, 2017

storage: TestReplicateAfterRemoveAndSplit failed under stress #19962

Closed

m-schneider mentioned this pull request Apr 17, 2018

storage: Replica gets GC'ed immediately after preemptive snapshot gets applied #17198

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

storage: mark replicas added to replica GC queue as destroyed #19353

storage: mark replicas added to replica GC queue as destroyed #19353

m-schneider commented Oct 18, 2017

cockroach-teamcity commented Oct 18, 2017

nvanbenschoten commented Oct 19, 2017

m-schneider commented Oct 19, 2017

bdarnell commented Oct 19, 2017

tbg commented Oct 24, 2017

tbg commented Oct 24, 2017

tbg commented Oct 24, 2017

m-schneider commented Oct 25, 2017

m-schneider commented Oct 31, 2017

nvanbenschoten commented Oct 31, 2017

m-schneider commented Nov 1, 2017

nvanbenschoten commented Nov 6, 2017

bdarnell commented Nov 6, 2017

m-schneider commented Nov 8, 2017

bdarnell commented Nov 8, 2017

nvanbenschoten commented Nov 8, 2017

tbg commented Nov 8, 2017

m-schneider commented Nov 8, 2017

tbg commented Nov 8, 2017

m-schneider commented Nov 8, 2017

m-schneider commented Nov 8, 2017

tbg commented Nov 9, 2017

tbg commented Nov 9, 2017

tbg commented Nov 9, 2017

m-schneider commented Nov 9, 2017

storage: mark replicas added to replica GC queue as destroyed #19353

storage: mark replicas added to replica GC queue as destroyed #19353

Conversation

m-schneider commented Oct 18, 2017

cockroach-teamcity commented Oct 18, 2017

nvanbenschoten commented Oct 19, 2017

m-schneider commented Oct 19, 2017

bdarnell commented Oct 19, 2017

tbg commented Oct 24, 2017

tbg commented Oct 24, 2017

tbg commented Oct 24, 2017

m-schneider commented Oct 25, 2017

m-schneider commented Oct 31, 2017

nvanbenschoten commented Oct 31, 2017

m-schneider commented Nov 1, 2017

nvanbenschoten commented Nov 6, 2017

bdarnell commented Nov 6, 2017

m-schneider commented Nov 8, 2017

bdarnell commented Nov 8, 2017

nvanbenschoten commented Nov 8, 2017

tbg commented Nov 8, 2017

m-schneider commented Nov 8, 2017

tbg commented Nov 8, 2017

m-schneider commented Nov 8, 2017

m-schneider commented Nov 8, 2017

tbg commented Nov 9, 2017

tbg commented Nov 9, 2017

tbg commented Nov 9, 2017

m-schneider commented Nov 9, 2017