Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: introduce LockSpanSet to live alongside SpanSet and use it to declare lock spans #102008

Closed
arulajmani opened this issue Apr 21, 2023 · 0 comments
Assignees
Labels
A-kv-transactions Relating to MVCC and the transactional model. A-read-committed Related to the introduction of Read Committed C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

Comments

@arulajmani
Copy link
Collaborator

arulajmani commented Apr 21, 2023

Background

Today, we use spanset.SpanSet to declare both latch spans and lock spans when sequencing requests. The structure of SpanSet was designed for latching, and makes sense in that context, where we have read and write latches, segmented between the global and local keyspace, and latches can be MVCC or non-MVCC.

For context, the structure of SpanSet looks like:

type SpanSet struct {
	spans           [NumSpanAccess][NumSpanScope][]Span
	allowUndeclared bool
}

type Span struct {
	roachpb.Span
	Timestamp hlc.Timestamp
}

Proposal
Lock spans make limited use of SpanSets, and as we're adding support for more types of lock strengths (see Shared locks), there is an opportunity (and requirement) to reimagine how lock spans should be represented. Specifically, instead of defining and resolving conflicts based on "reads" and "writes", we should be doing so using lock.Strength instead.

The other thing to note is that we never declare MVCC lock spans -- so the Timestamp field is always empty. We instead use the associated Request's timestamp information wherever we need it. It might also not make sense to differentiate between local and global locks as there aren't that many local locks to speak of. Given all this, we should stop using SpanSets to store lock spans. Instead, we could do something simpler like:

type LockSpanSet struct {
	spans          [lock.Strength][]roachpb.Span
	allowUndeclared bool
}

Touchpoints

  • There's a few touchpoints in the concurrency_control package around tryActiveWait
  • DefaultDeclareIsolatedKeys will need to correctly infer locking strength when constructing lockSpans. This is easy for read-only commands (Scan, RevScan, and Get) which already pass this information. However, for all other (write) requests that call into this function, we should be able to declare their associated lock spans with Intent locking strength.

Jira issue: CRDB-27215

Epic CRDB-26544

@arulajmani arulajmani added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-transactions Relating to MVCC and the transactional model. T-kv KV Team labels Apr 21, 2023
@arulajmani arulajmani self-assigned this Apr 21, 2023
@nvanbenschoten nvanbenschoten added the A-read-committed Related to the introduction of Read Committed label Apr 24, 2023
@nvanbenschoten nvanbenschoten changed the title concurrency: introduce LockSpanSet to live alongside SpanSet kv: introduce LockSpanSet to live alongside SpanSet Apr 24, 2023
arulajmani added a commit to arulajmani/cockroach that referenced this issue Apr 27, 2023
Today we use `spanset.SpanSet` to declare both lock and latch spans.
However, `SpanSets` do not generalize to more lock strengths. This patch
introduces `LockSpanSets` in preperation for shared locks. We don't make
use of them in the `concurrency` package yet; that will happen in an
upcoming commit.

The main motivations for the new `LockSpanSets` structure are:
- A desire to store/declare lock spans using lock strengths, not span
access.
- There isn't a need to store MVCC spans -- lock strengths do not have
associated timestamps.
- There is no need to store global/local lock keys separately, like we
do for latches.

Furthermore, we only port over methods on `SpanSet` that lock spans made
use of in this patch.

Informs: cockroachdb#102008

Release note: None
arulajmani added a commit to arulajmani/cockroach that referenced this issue Apr 27, 2023
Today we use `spanset.SpanSet` to declare both lock and latch spans.
However, `SpanSets` do not generalize to more lock strengths. This patch
introduces `LockSpanSets` in preperation for shared locks. We don't make
use of them in the `concurrency` package yet; that will happen in an
upcoming commit.

The main motivations for the new `LockSpanSets` structure are:
- A desire to store/declare lock spans using lock strengths, not span
access.
- There isn't a need to store MVCC spans -- lock strengths do not have
associated timestamps.
- There is no need to store global/local lock keys separately, like we
do for latches.

Furthermore, we only port over methods on `SpanSet` that lock spans made
use of in this patch.

Informs: cockroachdb#102008

Release note: None
craig bot pushed a commit that referenced this issue Apr 27, 2023
102396: lockspanset: introduce LockSpanSets r=nvanbenschoten a=arulajmani

Today we use `spanset.SpanSet` to declare both lock and latch spans. However, `SpanSets` do not generalize to more lock strengths. This patch introduces `LockSpanSets` in preperation for shared locks. We don't make use of them in the `concurrency` package yet; that will happen in an upcoming commit.

The main motivations for the new `LockSpanSets` structure are:
- A desire to store/declare lock spans using lock strengths, not span access.
- There isn't a need to store MVCC spans -- lock strengths do not have associated timestamps.
- There is no need to store global/local lock keys separately, like we do for latches.

Furthermore, we only port over methods on `SpanSet` that lock spans made use of in this patch.

Informs: #102008

Release note: None

Co-authored-by: Arul Ajmani <arulajmani@gmail.com>
arulajmani added a commit to arulajmani/cockroach that referenced this issue Apr 27, 2023
This patch is entirely a refactor and does not change any
functionality. This is done in preparation for introducing
`LockSpanSets` to track lock spans, which do not make a distinction
between global and local keys (unlike `SpanSets`, which do).

The main changes here are in `lockTableImpl`, which actually stores
locks, and `lockTableGuardImpl` which snapshots the lock table. We
no longer make a distinction between locks on Local and Global keys
when storing them.  The majority of this diff is composed of test file
churn caused because of the printing changes to the lock table.

Informs cockroachdb#102008

Release note: None
arulajmani added a commit to arulajmani/cockroach that referenced this issue Apr 28, 2023
This patch is entirely a refactor and does not change any
functionality. This is done in preparation for introducing
`LockSpanSets` to track lock spans, which do not make a distinction
between global and local keys (unlike `SpanSets`, which do).

The main changes here are in `lockTableImpl`, which actually stores
locks, and `lockTableGuardImpl` which snapshots the lock table. We
no longer make a distinction between locks on Local and Global keys
when storing them.  The majority of this diff is composed of test file
churn caused because of the printing changes to the lock table.

Informs cockroachdb#102008

Release note: None
craig bot pushed a commit that referenced this issue Apr 28, 2023
100181: kv: Use strict types for common fields r=erikgrinaker a=andrewbaptist

This PR introduces 3 new typed fields in mvcc.go:
RaftTerm, RaftIndex and LeaseSequence. These fields were previously just unit64 throughout the code and this made the code harder to read and risked incorrect conversions.

Epic: none

Release note: None

102407: kvserver: check PROSCRIBED lease status over UNUSABLE r=erikgrinaker,tbg a=pavelkalinnikov

The PROSCRIBED lease status, just like EXPIRED, puts a lease to a definitely invalid state. The UNUSABLE state (when request timestamp is in stasis period) is less of a clear cut: we still own the lease but callers may use or not use it depending on context.

For example, the closed timestamp side-transport ignores the UNUSABLE state (because we still own the lease), and takes it as usable for its purposes. Because of the order in which the checks were made, this has lead to a bug: a PROSCRIBED lease is reported as UNUSABLE during stasis periods, the closed timestamp side-transport then considers it usable, and updates closed timestamps when it shouldn't.

This commit fixes the bug by swapping the order of checks in the leaseStatus method. The order now goes from "hard" checks like EXPIRED and PROSCRIBED, to "softer" UNUSABLE, and (when the softness is put to the limit) VALID.

Fixes #98698
Fixes #99931
Fixes #100101
Epic: none

Release note (bug fix): a bug is fixed in closed timestamp updates within its side-transport. Previously, during asymmetric partitions, a node that transfers a lease away, and misses a liveness heartbeat, could then erroneously update the closed timestamp during the stasis period of its liveness. This could lead to closed timestamp invariant violation, and node crashes; in extreme cases this could lead to inconsistencies in read-only queries.

102503: concurrency: do not partition locks in the lock table by span scope r=nvanbenschoten a=arulajmani

This patch is entirely a refactor and does not change any functionality. This is done in preparation for introducing `LockSpanSets` to track lock spans, which do not make a distinction between global and local keys (unlike `SpanSets`, which do).

The main changes here are in `lockTableImpl`, which actually stores locks, and `lockTableGuardImpl` which snapshots the lock table. We no longer make a distinction between locks on Local and Global keys when storing them.  The majority of this diff is composed of test file churn caused because of the printing changes to the lock table.

Informs #102008

Release note: None

102590: sql,persistedsqlstats: prevent a deadlock during shutdown r=j82w a=knz

Fixes #102574.

Prior to this change, the coordination between the stats flusher
task (an async stopper task) and the activity flusher job was
performed using a two-step process:

- the stats persistence task offered to call a callback _function_
  every time a flush would complete.
- the job would _reconfigure the callback function_ on each iteration.
- the function was writing to a channel that was subsequently
  read by the job iteration body.

This approach was defective in 3 ways:

1. if the job iteration body would exit (e.g. due to a server drain)
   *after* it installed the callback fn, but *before* the stats flusher
   would read and call the callback fn, a window of time existed
   where a deadlock could occur:
   - the stats flusher retrieves the pointer to the caller fn
     but doesn't call it yet.
   - the job loop exits. From then on it will not read from the
     channel any more.
   - the stats flusher attempts to write to the channel.
     A deadlock occurs.

   (This was seen during testing. See #102574)

   The fix here is to always jointly `select` the write to the channel
   and also a read from the drain/stopper signals, to abort the
   channel operation if a shutdown is requested.

2. the stats flusher task was holding the mutex locked while
   performing the channel write. This is generally bad code hygiene
   as it forces the code maintainer to double-check whether the lock
   and channel operations don't mutually interlock.

   The fix is to use the mutex to retrieve the channel reference, and
   then write to the channel while the mutex is not held any more.

3. the API between the two was defining a *callback function* where
   really just a notification channel was needed.

   The fix here is to simplify the API.

Release note: None

Co-authored-by: Andrew Baptist <baptist@cockroachlabs.com>
Co-authored-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
Co-authored-by: Arul Ajmani <arulajmani@gmail.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
arulajmani added a commit to arulajmani/cockroach that referenced this issue Apr 28, 2023
Prior to this patch, requests would use the same mechanism
(`spanset.SpanSets`) to declare both latch and lock spans.
This patch changes the latter to use `lockspanset.LockSpanSets`
instead. This allows us to declare lock spans for a request with its
intended lock strengths in the future. This will in-turn enable us to
support multiple locking strengths in the concurrency package.

This patch does not change any functionality. To that end, requests
that would have previously declared access for `spanset.SpanReadOnly`
use `lock.None`; requests that would have previously declared
`spanset.SpanReadWrite` use `lock.Intent`. The `concurrency` package
validates lock spans with no other lock strength are supplied to it.
It does so both in the lock table waiter and when scanning the lock
table. We will relax these assertions as we build towards shared locks.

One thing to note is that I decided not to change the lock table tests
in this patch, to reduce test churn. We do so by translating lock
strength back to SpanAccess when printing out the guard state. This
will get fixed in a followup, along with the datadriven input.

Informs cockroachdb#102008

Release note: None
@arulajmani arulajmani changed the title kv: introduce LockSpanSet to live alongside SpanSet kv: introduce LockSpanSet to live alongside SpanSet and use it to declare lock spans Apr 29, 2023
arulajmani added a commit to arulajmani/cockroach that referenced this issue May 1, 2023
Prior to this patch, requests would use the same mechanism
(`spanset.SpanSets`) to declare both latch and lock spans.
This patch changes the latter to use `lockspanset.LockSpanSets`
instead. This allows us to declare lock spans for a request with its
intended lock strengths in the future. This will in-turn enable us to
support multiple locking strengths in the concurrency package.

This patch does not change any functionality. To that end, requests
that would have previously declared access for `spanset.SpanReadOnly`
use `lock.None`; requests that would have previously declared
`spanset.SpanReadWrite` use `lock.Intent`. The `concurrency` package
validates lock spans with no other lock strength are supplied to it.
It does so both in the lock table waiter and when scanning the lock
table. We will relax these assertions as we build towards shared locks.

One thing to note is that I decided not to change the lock table tests
in this patch, to reduce test churn. We do so by translating lock
strength back to SpanAccess when printing out the guard state. This
will get fixed in a followup, along with the datadriven input.

Informs cockroachdb#102008

Release note: None
cameronnunez pushed a commit to cameronnunez/cockroach that referenced this issue May 2, 2023
This patch is entirely a refactor and does not change any
functionality. This is done in preparation for introducing
`LockSpanSets` to track lock spans, which do not make a distinction
between global and local keys (unlike `SpanSets`, which do).

The main changes here are in `lockTableImpl`, which actually stores
locks, and `lockTableGuardImpl` which snapshots the lock table. We
no longer make a distinction between locks on Local and Global keys
when storing them.  The majority of this diff is composed of test file
churn caused because of the printing changes to the lock table.

Informs cockroachdb#102008

Release note: None
craig bot pushed a commit that referenced this issue May 3, 2023
102647: kv: use LockSpanSets to declare lock spans  r=nvanbenschoten a=arulajmani

Prior to this patch, requests would use the same mechanism
(`spanset.SpanSets`) to declare both latch and lock spans.
This patch changes the latter to use `lockspanset.LockSpanSets`
instead. This allows us to declare lock spans for a request with its
intended lock strengths in the future. This will in-turn enable us to
support multiple locking strengths in the concurrency package.

This patch does not change any functionality. To that end, requests
that would have previously declared access for `spanset.SpanReadOnly`
use `lock.None`; requests that would have previously declared
`spanset.SpanReadWrite` use `lock.Intent`. The `concurrency` package
validates lock spans with no other lock strength are supplied to it.
It does so both in the lock table waiter and when scanning the lock
table. We will relax these assertions as we build towards shared locks.

One thing to note is that I decided not to change the lock table tests
in this patch, to reduce test churn. We do so by translating lock
strength back to SpanAccess when printing out the guard state. This
will get fixed in a followup, along with the datadriven input.

Informs #102008

Release note: None

Co-authored-by: Arul Ajmani <arulajmani@gmail.com>
arulajmani added a commit to arulajmani/cockroach that referenced this issue May 4, 2023
Followup from cockroachdb#102647. In that patch, we decided to translate the
strength field on a lock table guard back to a `SpanAccess` to
defer some test churn. This patch addresses that TODO; it's entirely
mechanical.

Informs: cockroachdb#102008

Release note: None
arulajmani added a commit to arulajmani/cockroach that referenced this issue May 4, 2023
In 5418acd we switched to declaring
lock spans using lock strength instead of span access; this patch
changes datadriven test input for lock table tests to do the same.

Closes cockroachdb#102008

Release note: None
craig bot pushed a commit that referenced this issue May 5, 2023
102775: concurrency: cleanup lock table datadriven tests r=nvanbenschoten a=arulajmani

See individual commits for details. 

Last bits of cleanup in support of #102008; the linked issue can be considered completed once this PR lands. 

Co-authored-by: Arul Ajmani <arulajmani@gmail.com>
@craig craig bot closed this as completed in 342f39b May 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-transactions Relating to MVCC and the transactional model. A-read-committed Related to the introduction of Read Committed C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Projects
None yet
Development

No branches or pull requests

2 participants