kv: add ResolveTimestampRequest #73399
Comments
The request is nice because if the data spans multiple ranges, the client knows that. The downside of the request is that if there are multiple clients, then they might race. That racing could be fine, especially if they all use the same timestamp. Another approach not mentioned is to have the rangefeed itself ask for the range to get closed immediately after an event is published. This might well be the best approach because we can make sure there's only one such request per replica (there's at most one rangefeed processor per replica) and it's the most dynamic. Internally it'd probably end up using exactly this request though, so we likely need something like this regardless.
The above was a little silly. The problem with the rangefeed issuing it is that it doesn't know whether the client is watching more ranges in its span than just the current range. In order to have the event on one range lead to a checkpoint over the whole span, we'd either need to have the client send it or have the server side know about the whole span (or spans). The latter seems bad.
This proposal is particularly fitting for cluster settings, where we don't support writing to the table in a transaction. Though, I suppose, you could just as well set the closed timestamp interval on that table to 0.
Is your feature request related to a problem? Please describe.
We have these `RangeFeed`s which send a stream of events and then periodically send checkpoints telling the client when they've seen all events over a span up to some timestamp. The checkpoints trail the present by some amount. They trail by a while (3s) for reasons that arguably relate to cockroach's lack of buffered writes (#72614) and read pessimism (#52768, though we could do in-memory with broadcast verification and 2PC like Spanner).

Given that the closed timestamp doesn't track the present, it's easy to have a scenario where a write at timestamp `t2` commits and the event is sent to the watching `RangeFeed`, and then a separate transaction commits several seconds later at the earlier timestamp `t1`. Until the timestamp is resolved, history is mutable. However, if a transactional `Scan` operation occurs over that keyspan then, for all intents and purposes other than the `RangeFeed`, the timestamp is now resolved up to the timestamp of the `Scan`. That all happens via the `TimestampCache`.

Sometimes it'd be really nice if the committing of a transaction were immediately followed by the resolving of some key spans. In the multi-tenant zone config design (RFC #66348, which hopefully will merge soon) we have an asynchronous task which reconciles changes to descriptors and zone configs into the system tenant. The reconciliation in the current implementation occurs only after all of the watched data has been checkpointed (this allows us to simplify a bunch of stuff).
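To make the problem concrete, here is a minimal sketch (using hypothetical event and checkpoint types, not the real rangefeed API) of why a consumer such as the reconciliation task has to buffer events until a checkpoint covers them: a commit that arrives later in wall-clock time can carry an earlier MVCC timestamp, so nothing above the resolved timestamp is final.

```go
package main

import "fmt"

// Hypothetical event shapes; the real rangefeed protocol differs.
type Event struct {
	Key       string
	Timestamp int64
}

type Checkpoint struct {
	// All events at or below this timestamp have been emitted;
	// history at or below it can no longer change.
	ResolvedTimestamp int64
}

type Consumer struct {
	pending  []Event
	resolved int64
}

func (c *Consumer) OnEvent(e Event) {
	// Buffer: a later-arriving transaction may still commit at a
	// lower MVCC timestamp, so we cannot act on this event yet.
	c.pending = append(c.pending, e)
}

func (c *Consumer) OnCheckpoint(cp Checkpoint) {
	c.resolved = cp.ResolvedTimestamp
	remaining := c.pending[:0]
	for _, e := range c.pending {
		if e.Timestamp <= c.resolved {
			fmt.Printf("apply %s@%d (<= resolved %d, now immutable)\n",
				e.Key, e.Timestamp, c.resolved)
		} else {
			remaining = append(remaining, e)
		}
	}
	c.pending = remaining
}

func main() {
	c := &Consumer{}
	c.OnEvent(Event{Key: "a", Timestamp: 2}) // write at t2 arrives first
	c.OnEvent(Event{Key: "b", Timestamp: 1}) // later wall-clock commit at earlier t1
	c.OnCheckpoint(Checkpoint{ResolvedTimestamp: 2})
}
```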
It's not hard to imagine reasons why the SQL statements which change these configurations would want to know when the reconciliation has actually happened. Here are some:
Describe the solution you'd like
This issue proposes that we add a new non-transactional KV request `ResolveTimestampRequest` which takes a span and a timestamp. The operation is a writing request which scans the entire span of any range it overlaps with (the reason for resolving the whole range is that we don't have a finer-grained notion of a closed timestamp, and that seems fine for my purposes). Semantically, the request would operate mostly like an MVCC scan at its request timestamp that throws away all of the data it reads, followed by a replicated command which moves the closed timestamp up to the request timestamp. The systems we have in place for concurrency control should take care of the rest of the semantics. The downstream effect of such a request is that all listening `RangeFeed`s will end up getting a checkpoint.
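As a rough illustration only (the type and field names below are guesses, not an actual CockroachDB proto; real KV requests live in pkg/roachpb and use hlc.Timestamp), the request might carry nothing more than a span and the timestamp to resolve:

```go
package kvsketch

// Hypothetical stand-ins for roachpb.Span and hlc.Timestamp.
type Span struct {
	Key, EndKey []byte
}

type Timestamp struct {
	WallTime int64
	Logical  int32
}

// ResolveTimestampRequest (sketch): a non-transactional, writing request.
// Evaluating it would behave like an MVCC scan of Span at Timestamp that
// discards what it reads (bumping the timestamp cache so no conflicting
// write can commit at or below Timestamp), followed by a replicated
// command that moves the closed timestamp of each overlapping range up to
// Timestamp, which in turn lets rangefeeds emit a checkpoint at Timestamp.
type ResolveTimestampRequest struct {
	Span      Span
	Timestamp Timestamp
}
```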
Describe alternatives you've considered
We could alternatively move the desire to resolve timestamps into a zone config for ranges, such that all writes to some of these ranges auto-resolve to the present. That seems worse and too tightly coupled.
Additional context
The proposal, then, is that we'd have the schema changes which intend to lead to changes to zone configs issue such a request immediately upon committing (this can be parallelized with the wait-for-version checks if you want), and then they can expect reconciliation to occur promptly. We'll probably need to build a mechanism to determine the implied zone config changes of a SQL statement to make any work on limits happen anyway.
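Continuing the hypothetical sketch above (the Sender interface here is illustrative, not a real client API), the proposed flow might look like this:

```go
// Sender stands in for whatever non-transactional KV client interface
// would carry the request; it is illustrative only.
type Sender interface {
	Send(req ResolveTimestampRequest) error
}

// onZoneConfigChangeCommitted sketches the proposed flow: once a schema
// change that implies zone config changes commits at commitTS, eagerly
// resolve the timestamp over the watched span. The reconciliation task's
// rangefeed then checkpoints at commitTS promptly instead of waiting for
// the ~3s closed timestamp interval to advance on its own.
func onZoneConfigChangeCommitted(s Sender, watched Span, commitTS Timestamp) error {
	return s.Send(ResolveTimestampRequest{Span: watched, Timestamp: commitTS})
}
```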
Jira issue: CRDB-11574