Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: check PROSCRIBED lease status over UNUSABLE #102407

Merged
merged 2 commits into from
Apr 28, 2023

Conversation

pav-kv
Copy link
Collaborator

@pav-kv pav-kv commented Apr 27, 2023

The PROSCRIBED lease status, just like EXPIRED, puts a lease to a definitely invalid state. The UNUSABLE state (when request timestamp is in stasis period) is less of a clear cut: we still own the lease but callers may use or not use it depending on context.

For example, the closed timestamp side-transport ignores the UNUSABLE state (because we still own the lease), and takes it as usable for its purposes. Because of the order in which the checks were made, this has lead to a bug: a PROSCRIBED lease is reported as UNUSABLE during stasis periods, the closed timestamp side-transport then considers it usable, and updates closed timestamps when it shouldn't.

This commit fixes the bug by swapping the order of checks in the leaseStatus method. The order now goes from "hard" checks like EXPIRED and PROSCRIBED, to "softer" UNUSABLE, and (when the softness is put to the limit) VALID.

Fixes #98698
Fixes #99931
Fixes #100101
Epic: none

Release note (bug fix): a bug is fixed in closed timestamp updates within its side-transport. Previously, during asymmetric partitions, a node that transfers a lease away, and misses a liveness heartbeat, could then erroneously update the closed timestamp during the stasis period of its liveness. This could lead to closed timestamp invariant violation, and node crashes; in extreme cases this could lead to inconsistencies in read-only queries.

@pav-kv pav-kv requested a review from a team as a code owner April 27, 2023 09:23
@blathers-crl
Copy link

blathers-crl bot commented Apr 27, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@erikgrinaker
Copy link
Contributor

some callers may treat it as invalid, and some may thinks it's still valid.

It isn't really considered valid as such, I think the crucial point is that it's treated as ours. A PROSCRIBED lease isn't still ours because we may have transferred it away, but an UNUSABLE lease is still ours for the interval even though we can't use it (noone else can have an overlapping lease).

This probably also deserves a test case for the precedence.

@pav-kv
Copy link
Collaborator Author

pav-kv commented Apr 27, 2023

Yeah, adding test in the meantime. Wanted to put the fix for review ASAP.

Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo my comment above about valid/owned and a test.

pkg/kv/kvserver/replica_range_lease.go Outdated Show resolved Hide resolved
@erikgrinaker erikgrinaker added backport-22.1.x backport-23.1.x Flags PRs that need to be backported to 23.1 labels Apr 27, 2023
@erikgrinaker
Copy link
Contributor

erikgrinaker commented Apr 27, 2023

How does this interact with clock skew, btw? The stasis period exists because it's possible for someone else to believe their lease has already started while we also believe that ours hasn't ended because of clock skew. Can this also cause violations, such that we should exclude the UNUSABLE state from the side transport entirely, or is this handled by the closed timestamp protocol and/or the HLC?

@pav-kv
Copy link
Collaborator Author

pav-kv commented Apr 27, 2023

@erikgrinaker Is your question basically: why this comment? I don't know, it would be nice to explain in this comment why it's safe to close timestamps in the stasis.

I think the reason it's ok is that the closed timestamps are "logical". They fall between the lease start and expiration timestamp. They aren't concerned with clock skew? Stasis is there to make sure there aren't requests served around the lease transfer times, i.e. it makes sure the physical time ranges when two leases are used are non-intersecting too?

Upd: there might be a problem though. How about the following scenario?

  • Lease 1 ends at time T, lease 2 starts at time T.
  • Lease 1 enters stasis, and closes timestamp between T-skew and T.
  • In the meantime, lease is already active on another node which behind in time by "skew".
  • Lease 2 closes timestamp in the past compared to the already closed timestamp by lease 1.

OTOH, I think it's always true that:

  • For two consecutive leases, LAI1 <= LAI2.
  • For two consecutive leases, CT1 <= lease 1 end time <= lease 2 start time <= CT2.

So, two closed timestamps coming from 2 leaseholders never form a CT regression? Even if one of them is in stasis.

Lease timestamps, and closed timestamps are "logical". They don't have any conflicts etc. Clock skew and stasis period come into play when we need to map physical onto this logical time, i.e. when this layer interacts with the "requests" layer above.

@erikgrinaker
Copy link
Contributor

Yeah, I think it's the LAI that saves us, because presumably an unusable lease can't increment the LAI.

@erikgrinaker
Copy link
Contributor

Yeah, I think it's the LAI that saves us, because presumably an unusable lease can't increment the LAI.

Does it though? In the scenario we've seen, the old leaseholder must have been able to increment its LAI, despite the lease being in PROSCRIBED and then UNUSABLE, otherwise we wouldn't have seen the regression.

I think the coordination here happens either by a lease transfer revoking the current lease (which will/should immediately halt the closed ts transport from sending future updates), or by the next lease necessarily having a greater start time than our expiration time, where the LAI is communicated via Raft, and the side transport is prevented from sending further updates beyond the expiration time. So the problem is simply that we keep sending updates after we transferred the lease, which we fix here. So I don't think UNUSABLE is a problem here.

There's also this:

// Note that the stasis period doesn't affect closed timestamps; a request is
// only server under a particular closed timestamp if its read frontier is below
// the closed timestamp.

@pav-kv
Copy link
Collaborator Author

pav-kv commented Apr 27, 2023

Does it though? In the scenario we've seen, the old leaseholder must have been able to increment its LAI, despite the lease being in PROSCRIBED and then UNUSABLE, otherwise we wouldn't have seen the regression.

The old leaseholder was sending the old LAI. But the timestamp was high, getting into the territory of the next lease.
The new leaseholder was sending new LAIs, and later timestamps (which is a correct behaviour).

@tbg
Copy link
Member

tbg commented Apr 28, 2023

I think the reason it's ok is that the closed timestamps are "logical". They fall between the lease start and expiration timestamp.

Are you sure that's true? I don't think closed timestamps are constrained to the by the lease start, as the lease plays no role here:

target := closedts.TargetForPolicy(
now,
maxClockOffset,
lagTargetDuration,
leadTargetOverride,
sideTransportCloseInterval,
pol,
)

We do make sure that the resulting timestamp isn't past the lease expiration,

if st.ClosedTimestampUpperBound().Less(target) {
res.FailReason = sidetransport.TargetOverLeaseExpiration
return res
}

but I don't see a corresponding check for the start time.

Assuming a fixed closed timestamp lag time L, I agree that continuing to close during stasis doesn't seem to be an issue. The largest timestamp that can be closed by the old leaseholder is endStasis-L and the new leaseholder can only close >= nextLeaseStart-L (with the same or larger LAI). So these can't invert, I think this is morally what you're saying with the start time, but nextLeaseStart-L is clearly "in some old lease".

But also, closed timestamp lag time is a cluster setting and propagates async, so can't the old leaseholder be closing out endStasis - L but the new leaseholder closes out nextLeaseStart - 10L (cluster setting set to 10x the original)? These can obviously invert...

@tbg
Copy link
Member

tbg commented Apr 28, 2023

I'm probably missing something in that last bit because shouldn't changing the cluster setting trigger the assertion even if no lease changes hands then? Since the leaseholder's closed timestamp will regress against itself.

@pav-kv
Copy link
Collaborator Author

pav-kv commented Apr 28, 2023

@erikgrinaker Added a unit-test, and updated the PR desc.

The PROSCRIBED lease status, just like EXPIRED, puts a lease to a definitely
invalid state. The UNUSABLE state (when request timestamp is in stasis period)
is less of a clear cut: we still own the lease but callers may use or not use
it depending on context.

For example, the closed timestamp side-transport ignores the UNUSABLE state
(because we still own the lease), and takes it as usable for its purposes.
Because of the order in which the checks were made, this has lead to a bug: a
PROSCRIBED lease is reported as UNUSABLE during stasis periods, the closed
timestamp side-transport then considers it usable, and updates closed
timestamps when it shouldn't.

This commit fixes the bug by swapping the order of checks in the leaseStatus
method. The order now goes from "hard" checks like EXPIRED and PROSCRIBED, to
"softer" UNUSABLE, and (when the softness is put to the limit) VALID.

Fixes cockroachdb#98698
Fixes cockroachdb#99931
Fixes cockroachdb#100101
Epic: none

Release note (bug fix): a bug is fixed in closed timestamp updates within its
side-transport. Previously, during asymmetric partitions, a node that transfers
a lease away, and misses a liveness heartbeat, could then erroneously update
the closed timestamp during the stasis period of its liveness. This could lead
to closed timestamp invariant violation, and node crashes; in extreme cases
this could lead to inconsistencies in read-only queries.
Epic: none
Release note: none
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A couple of test nits, but we can address those in a follow-up PR. Thanks for adding this test!

Comment on lines +47 to +55
ts := []hlc.ClockTimestamp{
{WallTime: 500, Logical: 0}, // before lease start
{WallTime: 1000, Logical: 0}, // lease start
{WallTime: 2000, Logical: 0}, // within lease
{WallTime: 2950, Logical: 0}, // within stasis
{WallTime: 3000, Logical: 0}, // lease expiration
{WallTime: 3500, Logical: 0}, // expired
{WallTime: 5000, Logical: 0}, // next expiration
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Would be good to name these. It's hard to read the integer indexes below.

{lease: epoLease, now: ts[2], liveness: oldLiveness, want: kvserverpb.LeaseState_ERROR,
wantErr: "node liveness info for n1 is stale"},
} {
t.Run("", func(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Instead of comments above the test cases, consider adding a desc field and use that as the test name too. It's annoying to debug failures when you have to count to find the correct case.

@pav-kv
Copy link
Collaborator Author

pav-kv commented Apr 28, 2023

TFTRs!

bors r=erikgrinaker,tbg

@craig
Copy link
Contributor

craig bot commented Apr 28, 2023

Build succeeded:

@craig craig bot merged commit 8203394 into cockroachdb:master Apr 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
Projects
None yet
4 participants