kvserver: remove unnecessary special casing of lease for MLAI
We were previously "not" assigning a LeaseAppliedIndex to lease
request proposals. But we were doing so very half-heartedly: instead of
always assigning zero, we assigned "whatever was assigned to the
previous command". This meant that in practice, the first lease on a
range would get zero, and any subsequent leases would get some nonzero
numbers. This wasn't particularly principled and raised eyebrows. For
testing convenience and clarity it is helpful to assign a zero MLAI
to leases, which is what this commit does.
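
To illustrate, here is a minimal, self-contained sketch (toy names, not the
actual propBuf code) of the assignment behavior after this commit; under the
old scheme the lease request below would have picked up the previously
assigned value (1) rather than zero:

```go
package main

import "fmt"

// toyPropBuf stands in for the proposal buffer's LAI counter (illustrative only).
type toyPropBuf struct {
	assignedLAI uint64
}

// assignLAI mirrors the shape of the new behavior: regular proposals get a
// fresh, monotonically increasing LeaseAppliedIndex, lease requests get zero.
func (b *toyPropBuf) assignLAI(isLeaseRequest bool) uint64 {
	var lai uint64
	if !isLeaseRequest {
		b.assignedLAI++
		lai = b.assignedLAI
	}
	return lai
}

func main() {
	var b toyPropBuf
	fmt.Println(b.assignLAI(false)) // 1: regular proposal
	fmt.Println(b.assignLAI(true))  // 0: lease request
	fmt.Println(b.assignLAI(false)) // 2: regular proposal
}
```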

We then turn to the special-casing in refreshProposalsLocked.

`(*Replica).refreshProposalsLocked` previously refused to repropose
commands that had a zero `MaxLeaseIndex` (MLAI). This was done on the
grounds of needing to provide replay protection for these requests.
Upon inspection it became clear that a zero MLAI could only ever apply
to lease requests: they are the only proposals exempt from MLAI
assignment, and as we saw above this would not usually result in a zero,
but could (especially in testing, where most leases are the first lease
on their respective range).

We [discussed] this internally and concluded that leases can in fact be
reproposed, since they have their own replay protection mediated through
the lease's sequence number. This is great since, as stated above, we did
in fact repropose (some) leases and have done so for years.
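
To make that concrete, a toy model (illustrative types only, not the real
roachpb.Lease or checkForcedErr) of why a duplicated lease proposal applies
at most once:

```go
package main

import "fmt"

// lease is a toy lease carrying only the fields relevant here.
type lease struct {
	seq        int
	proposedTS int
}

// leaseProposal records the proposal timestamp of the lease it was proposed
// against (PrevLeaseProposal in the real code).
type leaseProposal struct {
	newLease       lease
	prevProposedTS int
}

// apply lets a lease proposal through only if it was proposed against the
// lease that is still current; otherwise it is rejected as a replay.
func apply(cur *lease, p leaseProposal) bool {
	if p.prevProposedTS != cur.proposedTS {
		return false
	}
	*cur = p.newLease
	return true
}

func main() {
	cur := lease{seq: 1, proposedTS: 10}
	p := leaseProposal{newLease: lease{seq: 2, proposedTS: 20}, prevProposedTS: 10}

	// A reproposed lease request shows up twice in the log, but only the
	// first copy applies; the duplicate still refers to the original lease
	// (proposedTS 10), which has since been replaced.
	fmt.Println(apply(&cur, p)) // true
	fmt.Println(apply(&cur, p)) // false
}
```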

This commit removes the special casing.

Fixes #74711.

As suggested by @nvanbenschoten I ran the `kvserver` tests with this
diff that reproposes every lease request:

```diff
diff --git a/pkg/kv/kvserver/replica_proposal_buf.go b/pkg/kv/kvserver/replica_proposal_buf.go
index 95765d8efd..1b66745866 100644
--- a/pkg/kv/kvserver/replica_proposal_buf.go
+++ b/pkg/kv/kvserver/replica_proposal_buf.go
@@ -576,6 +576,11 @@ func (b *propBuf) FlushLockedWithRaftGroup(
 			ents = append(ents, raftpb.Entry{
 				Data: p.encodedCommand,
 			})
+			if p.Request.IsLeaseRequest() {
+				ents = append(ents, raftpb.Entry{
+					Data: p.encodedCommand,
+				})
+			}
 		}
 	}
 	if firstErr != nil {

```

This inspired the subsequent commits.

[discussed]: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1641559185021100

Release note: None
tbg committed Jan 17, 2022
1 parent e84001d commit 852a80c
Showing 5 changed files with 48 additions and 30 deletions.
6 changes: 6 additions & 0 deletions pkg/kv/kvserver/replica_application_decoder.go
@@ -103,6 +103,12 @@ func (d *replicaDecoder) retrieveLocalProposals(ctx context.Context) (anyLocal b
// version of the proposal in the pipeline, so don't remove the
// proposal from the map. We expect this entry to be rejected by
// checkForcedErr.
//
// Note that lease proposals always use a MaxLeaseIndex of zero (since
// they have their own replay protection), so they always meet this
// criterion. While such proposals can be reproposed, only the first
// instance that gets applied matters and so removing the command is
// always what we want to happen.
cmd.raftCmd.MaxLeaseIndex == cmd.proposal.command.MaxLeaseIndex
if shouldRemove {
// Delete the proposal from the proposals map. There may be reproposals
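For illustration, a toy version of the removal criterion described in the
comment above (made-up helper, not the actual decoder code):

```go
package main

import "fmt"

// shouldRemove mirrors the criterion above: the local proposal is removed
// only when the entry being applied carries the same MaxLeaseIndex as the
// latest in-flight version of that proposal.
func shouldRemove(appliedMLAI, proposalMLAI uint64) bool {
	return appliedMLAI == proposalMLAI
}

func main() {
	// A regular command that was reproposed at a higher MaxLeaseIndex: the
	// stale entry must not remove the proposal, the up-to-date one may.
	fmt.Println(shouldRemove(7, 9)) // false
	fmt.Println(shouldRemove(9, 9)) // true

	// A lease request: MaxLeaseIndex is zero on both sides, so the proposal
	// is removed the first time any copy of it applies.
	fmt.Println(shouldRemove(0, 0)) // true
}
```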
21 changes: 15 additions & 6 deletions pkg/kv/kvserver/replica_application_state_machine.go
@@ -248,7 +248,7 @@ func checkForcedErr(
// lease but not Equivalent to each other. If these leases are
// proposed under that same third lease, neither will be able to
// detect whether the other has applied just by looking at the
// current lease sequence number because neither will will increment
// current lease sequence number because neither will increment
// the sequence number.
//
// This can lead to inversions in lease expiration timestamps if
@@ -262,11 +262,20 @@ func checkForcedErr(
leaseMismatch = !replicaState.Lease.Equivalent(requestedLease)
}

// This is a check to see if the lease we proposed this lease request against is the same
// lease that we're trying to update. We need to check proposal timestamps because
// extensions don't increment sequence numbers. Without this check a lease could
// be extended and then another lease proposed against the original lease would
// be applied over the extension.
// This is a check to see if the lease we proposed this lease request
// against is the same lease that we're trying to update. We need to check
// proposal timestamps because extensions don't increment sequence
// numbers. Without this check a lease could be extended and then another
// lease proposed against the original lease would be applied over the
// extension.
//
// This check also confers replay protection when the sequence number
// matches, as it ensures that only the first of a duplicated proposal can
// apply; the second will be rejected (since its PrevLeaseProposal
// refers to the original lease, and not to itself).
//
// PrevLeaseProposal is always set. Its nullability dates back to the
// migration that introduced it.
if raftCmd.ReplicatedEvalResult.PrevLeaseProposal != nil &&
(*raftCmd.ReplicatedEvalResult.PrevLeaseProposal != *replicaState.Lease.ProposedTS) {
leaseMismatch = true
22 changes: 18 additions & 4 deletions pkg/kv/kvserver/replica_proposal_buf.go
@@ -600,13 +600,27 @@ func (b *propBuf) allocateLAIAndClosedTimestampLocked(
ctx context.Context, p *ProposalData, closedTSTarget hlc.Timestamp,
) (uint64, hlc.Timestamp, error) {

// Request a new max lease applied index for any request that isn't itself
// a lease request. Lease requests don't need unique max lease index values
// because their max lease indexes are ignored. See checkForcedErr.
// Assign a LeaseAppliedIndex (see checkForcedErr). These provide replay
// protection.
//
// Proposals coming from lease requests (not transfers) have their own replay
// protection, via the lease sequence and the previous lease's proposal
timestamp; this is necessary as lease requests are proposed while not
holding the lease (and so the proposer does not know a valid LAI to use),
and the lease applied index is not checked for them when they apply. While
// it would be legal to still assign a LAI to lease requests, historically it
// has been mildly inconvenient in testing, and might belie the fact that
// LAI-related concepts just don't apply. Instead, we assign a zero LAI to
// lease proposals, with a condition that matches that used in
checkForcedErr to identify lease requests. Note that lease *transfers*
// are only ever proposed by leaseholders, and they use the LAI to prevent
// replays (though they could in principle also be handled like lease
// requests).
var lai uint64
if !p.Request.IsLeaseRequest() {
b.assignedLAI++
lai = b.assignedLAI
}
lai := b.assignedLAI

if filter := b.testing.leaseIndexFilter; filter != nil {
if override := filter(p); override != 0 {
8 changes: 6 additions & 2 deletions pkg/kv/kvserver/replica_proposal_buf_test.go
@@ -221,7 +221,9 @@ func (pc proposalCreator) newPutProposal(ts hlc.Timestamp) *ProposalData {
func (pc proposalCreator) newLeaseProposal(lease roachpb.Lease) *ProposalData {
var ba roachpb.BatchRequest
ba.Add(&roachpb.RequestLeaseRequest{Lease: lease})
return pc.newProposal(ba)
prop := pc.newProposal(ba)
prop.command.ReplicatedEvalResult.IsLeaseRequest = true
return prop
}

func (pc proposalCreator) newProposal(ba roachpb.BatchRequest) *ProposalData {
@@ -311,8 +313,10 @@ func TestProposalBuffer(t *testing.T) {
for i, p := range proposals {
if i != leaseReqIdx {
lai++
require.Equal(t, lai, p.MaxLeaseIndex)
} else {
require.Zero(t, p.MaxLeaseIndex)
}
require.Equal(t, lai, p.MaxLeaseIndex)
}

// Flush the buffer repeatedly until its array shrinks.
21 changes: 3 additions & 18 deletions pkg/kv/kvserver/replica_raft.go
@@ -12,7 +12,6 @@ package kvserver

import (
"context"
"fmt"
"math/rand"
"sort"
"strings"
@@ -1125,30 +1124,16 @@ func (r *Replica) refreshProposalsLocked(
slowProposalCount++
}
}
// TODO(tbg): the enabled() call is a hack until we've figured out what to
// do about #74711. If leases are finished instead of reproposed, they can't
// ever trigger the breaker, which is bad as there usually isn't anything
// else around that will.
if p.command.MaxLeaseIndex == 0 && !r.breaker.enabled() {
// Commands without a MaxLeaseIndex cannot be reproposed, as they might
// apply twice. We also don't want to ask the proposer to retry these
// special commands.
r.cleanupFailedProposalLocked(p)
log.VEventf(p.ctx, 2, "refresh (reason: %s) returning AmbiguousResultError for command "+
"without MaxLeaseIndex: %v", reason, p.command)
p.finishApplication(ctx, proposalResult{Err: roachpb.NewError(
roachpb.NewAmbiguousResultError(
fmt.Sprintf("unknown status for command without MaxLeaseIndex "+
"at refreshProposalsLocked time (refresh reason: %s)", reason)))})
continue
}
switch reason {
case reasonSnapshotApplied:
// If we applied a snapshot, check the MaxLeaseIndexes of all
// pending commands to see if any are now prevented from
// applying, and if so make them return an ambiguous error. We
// can't tell at this point (which should be rare) whether they
// were included in the snapshot we received or not.
//
// NB: lease proposals have MaxLeaseIndex 0, so they are cleaned
// up here too.
if p.command.MaxLeaseIndex <= r.mu.state.LeaseAppliedIndex {
r.cleanupFailedProposalLocked(p)
log.Eventf(p.ctx, "retry proposal %x: %s", p.idKey, reason)
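With the special case gone, lease proposals (MaxLeaseIndex == 0) are handled
by the snapshot-applied cleanup above like everything else; a toy
illustration of that condition (not the real replica code):

```go
package main

import "fmt"

// cleanedUpOnSnapshot mirrors the condition above: a pending proposal is
// finished with an ambiguous error once the applied LeaseAppliedIndex has
// reached or passed the proposal's MaxLeaseIndex, since it can no longer
// apply at that index. Lease proposals use MaxLeaseIndex == 0, so they are
// always swept up here.
func cleanedUpOnSnapshot(maxLeaseIndex, appliedLAI uint64) bool {
	return maxLeaseIndex <= appliedLAI
}

func main() {
	fmt.Println(cleanedUpOnSnapshot(0, 42))  // true: lease proposal
	fmt.Println(cleanedUpOnSnapshot(50, 42)) // false: may still apply later
}
```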
