kv: add ResolveTimestampRequest #73399

Closed
ajwerner opened this issue Dec 2, 2021 · 4 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) no-issue-activity T-kv KV Team X-stale

Comments


ajwerner commented Dec 2, 2021

Is your feature request related to a problem? Please describe.

We have these RangeFeeds which send a stream of events and periodically send checkpoints telling the client that they've seen all events over a span up to some timestamp. The checkpoints trail the present by some amount, a good while (3s), for reasons that arguably relate to CockroachDB's lack of buffered writes (#72614) and read pessimism (#52768, though we could do it in-memory with broadcasted verification and 2PC like Spanner).
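
For illustration, here is a minimal sketch of the stream shape being described, with made-up Go types standing in for the real rangefeed protos:

```go
// Illustrative stand-ins for the rangefeed stream described above; these are
// not the real CockroachDB rangefeed types.
package rangefeedsketch

// Timestamp stands in for an HLC timestamp.
type Timestamp int64

// WriteEvent reports a single committed write at an MVCC timestamp.
type WriteEvent struct {
	Key string
	TS  Timestamp
}

// Checkpoint promises that all write events over Span with TS at or below
// ResolvedTS have already been delivered, i.e. from the watcher's point of
// view the span's history at or below ResolvedTS is now immutable.
// ResolvedTS trails the present, typically by a few seconds.
type Checkpoint struct {
	Span       string
	ResolvedTS Timestamp
}
```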

Given that the closed timestamp doesn't track the present, it's easy to have a scenario where a write at timestamp t2 commits and the event is sent to the watching RangeFeed, and then a separate transaction commits several seconds later at an earlier timestamp t1. Until the timestamp is resolved, history is mutable. However, if a transactional Scan occurs over that keyspan, then, for all intents and purposes other than the RangeFeed, the timestamp is effectively resolved up to the timestamp of the Scan. That all happens via the TimestampCache.

Sometimes it'd be really nice if the committing of a transaction were immediately followed by the resolving of some key spans. In the multi-tenant zone config design (RFC #66348, which will hopefully merge soon) we have an asynchronous task which reconciles changes to descriptors and zone configs into the system tenant. In the current implementation, reconciliation occurs only after all of the watched data has been checkpointed (this lets us simplify a bunch of stuff).

It's not hard to imagine why the SQL statements which change these configurations would want to know when the reconciliation has actually happened. Here are a couple of reasons:

  1. In a serverless setting, there's a risk that the pod will scale down before reconciliation happens.
  2. If we're trying to protect data using protected timestamps, the invariants all depend on when the protected timestamp actually makes it to the host cluster. If the operation issued by the client has no direct relationship to the reconciliation, it's hard to say much of anything about protected timestamps actually working.

Describe the solution you'd like

This issue proposes that we add a new non-transactional KV request, ResolveTimestampRequest, which takes a span and a timestamp. The operation is a writing request which scans the entire span of any range it overlaps (the reason for resolving the whole range is that we don't have a finer-grained notion of a closed timestamp, and that seems fine for my purposes). Semantically, the request would operate mostly like an MVCC scan at its request timestamp that throws away all of the data it reads, followed by a replicated command which moves the closed timestamp up to the request timestamp. The systems we have in place for concurrency control should take care of the rest of the semantics.
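
Very roughly, the request shape being proposed might look like the sketch below. The field and type names here are hypothetical stand-ins; the real definition would presumably be a protobuf message alongside the other KV requests in roachpb.

```go
// Hypothetical sketch of the proposed request; not actual CockroachDB code.
package kvproposal

// Timestamp stands in for hlc.Timestamp.
type Timestamp struct {
	WallTime int64
	Logical  int32
}

// Span stands in for roachpb.Span.
type Span struct {
	Key, EndKey []byte
}

// ResolveTimestampRequest asks each overlapping range to behave roughly like
// an MVCC scan at Timestamp that discards what it reads (so that concurrency
// control treats it like a scan over the span), and then to replicate a
// command that bumps the range's closed timestamp up to Timestamp.
type ResolveTimestampRequest struct {
	Span      Span      // keyspan whose history should become immutable
	Timestamp Timestamp // resolve (close) the span up to this timestamp
}
```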

The downstream effect of such a request is that all listening RangeFeeds will end up getting a checkpoint.
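
Continuing the illustrative types from the sketch above, a watcher that wants to act only once a span is resolved up to some timestamp could simply block on the checkpoint stream (again, made-up names, not the real rangefeed client API):

```go
package rangefeedsketch

// WaitForResolved returns once a checkpoint at or above target arrives,
// meaning no further events at or below target can show up for the span.
// Checkpoint and Timestamp are the illustrative types sketched earlier.
func WaitForResolved(checkpoints <-chan Checkpoint, target Timestamp) {
	for cp := range checkpoints {
		if cp.ResolvedTS >= target {
			return
		}
	}
}
```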

Describe alternatives you've considered

We could alternatively express the desire to resolve timestamps as a zone config on ranges, such that all writes to some of these ranges auto-resolve up to the present. That seems worse and too tightly coupled.

Additional context

The proposal, then, is that we'd have the schema changes which are intended to lead to zone config changes issue such a request immediately upon committing (this can be parallelized with the wait-for-version checks if you want) and then expect reconciliation to occur promptly. We'll probably need to build a mechanism to determine the implied zone config changes of a SQL statement to make any limits work happen anyway.
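
A sketch of how that sequencing might look from the schema-change side, with entirely hypothetical helper names standing in for the real transaction, KV, and reconciliation plumbing:

```go
// Hypothetical sequencing sketch; none of these helpers exist in the tree.
package reconcilesketch

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// Timestamp stands in for hlc.Timestamp.
type Timestamp int64

// commitAndResolve commits the schema change, then issues the proposed
// ResolveTimestamp request over the affected config spans in parallel with
// the usual wait-for-version checks, and finally waits for the reconciler to
// report that it has caught up to the commit timestamp.
func commitAndResolve(
	ctx context.Context,
	commit func(context.Context) (Timestamp, error),
	resolveTimestamp func(context.Context, Timestamp) error, // the proposed KV request
	waitForVersion func(context.Context) error,
	waitForReconciliation func(context.Context, Timestamp) error,
) error {
	commitTS, err := commit(ctx)
	if err != nil {
		return err
	}
	g, gCtx := errgroup.WithContext(ctx)
	g.Go(func() error { return resolveTimestamp(gCtx, commitTS) })
	g.Go(func() error { return waitForVersion(gCtx) })
	if err := g.Wait(); err != nil {
		return err
	}
	return waitForReconciliation(ctx, commitTS)
}
```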

Jira issue: CRDB-11574

@ajwerner ajwerner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Dec 2, 2021
@blathers-crl blathers-crl bot added the T-kv KV Team label Dec 2, 2021

ajwerner commented Feb 2, 2022

The request approach is nice because, if the data spans multiple ranges, the client knows that. The downside is that if there are multiple clients, they might race; that could be fine, especially if they all use the same timestamp. Another approach not mentioned above is to have the rangefeed itself ask for the range to be closed immediately after an event is published. This might well be the best approach, because we can make sure there's at most one such request per replica (there's at most one rangefeed processor per replica) and it's the most dynamic. Internally it'd probably end up using exactly this request though, so we probably need something like this regardless.


ajwerner commented Feb 2, 2022

The above was a little silly. The problem with the rangefeed issuing it is that it doesn't know whether the client is watching more ranges in its span than just the current range. In order to have an event on one range lead to a checkpoint over the whole span, we'd either need to have the client send the request or have the server side know about the whole span (or spans). The latter seems bad.

@ajwerner

This proposal is particularly fitting for cluster settings where we don't support writing to the table in a transaction. Though, I suppose, you could just as well set the closed timestamp interval on that table to 0.

@github-actions

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!
