mvcc: support idempotent transactional writes #26915
Comments
Sounds reasonable to me, although I'm concerned that CPut is used so often that it may not be an edge case. If we have to abort replayed CPuts anyway, is there enough benefit from making this change?
Well, we wouldn't need to abort all replayed CPuts, just those that reference keys that have already been overwritten by larger sequence numbers. I think almost all uses of idempotence, to start with, will be in cases where the success of a single request is ambiguous, so we're just replaying a single write without any chance that a larger sequence number has already clobbered the old value (barring multiple writes to the same key in the same batch). I also don't think multiple writes to the same key in the same transaction are very common in general, although I may be missing a pattern that makes heavy use of doing just that.
Oh, I was thinking that we wouldn't be able to retry CPuts at all, since even one CPut has masked the original value, but I guess as long as the sequence number in the request matches what's on disk you can tell that the previous iteration succeeded. And yes, this should be pretty rare.
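To make the sequence-number check above concrete, here is a minimal sketch (all names hypothetical, not CockroachDB's actual MVCC code) of how a replayed write can be recognized as a no-op when its sequence number and value match the intent on disk, and rejected when a larger sequence number has already clobbered the key:

```go
package main

import (
	"errors"
	"fmt"
)

// intent models the single provisional value a transaction keeps per key,
// tagged with the sequence number of the write that produced it.
type intent struct {
	seq   int
	value string
}

var errPossibleReplay = errors.New("RETRY_POSSIBLE_REPLAY: sequence number already overwritten")

// put applies a transactional write idempotently. Names and shapes are
// illustrative only.
func put(intents map[string]intent, key string, seq int, value string) error {
	cur, ok := intents[key]
	switch {
	case !ok || seq > cur.seq:
		// First write at this sequence number: apply it.
		intents[key] = intent{seq: seq, value: value}
		return nil
	case seq == cur.seq && value == cur.value:
		// Exact replay of a write we already applied: no-op success.
		return nil
	default:
		// A later write already clobbered the value on this key, so we can
		// no longer prove idempotence; fall back to a retry error.
		return errPossibleReplay
	}
}

func main() {
	intents := map[string]intent{}
	fmt.Println(put(intents, "k", 1, "a")) // first write: <nil>
	fmt.Println(put(intents, "k", 1, "a")) // replay: <nil>
	fmt.Println(put(intents, "k", 2, "b")) // later write: <nil>
	fmt.Println(put(intents, "k", 1, "a")) // stale replay: error
}
```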
You can't recover the old value, but is that necessary? Say you're an operation that needs to replay two CPuts on the same key from the same transaction. What could you need the value of the first one for? Either the second CPut was fed the first result earlier (in which case the request has it), or the second CPut was "blind", and so all that matters is that you make sure that the first one definitely was applied. This calls for a somewhat better API for replaying commands (where you wouldn't get a result back, just a confirmation that it is definitely applied). This is all getting a bit too complicated for what we need, though.
Depends on the command. For Put and CPut, we know that they either apply or return an error (and you can't currently continue after an error, so we know they applied if there is a higher sequence), so we could construct the requisite empty response. But other commands like Increment need to return the value that was written at the time.
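A rough sketch of that distinction (again with hypothetical names): an `Increment` response must carry the value written at the time, so it can only be reconstructed while the intent for that sequence number is still the latest one on the key:

```go
package main

import (
	"errors"
	"fmt"
)

// intent keeps only the latest provisional write per key, as the
// discussion assumes; nothing older is retained.
type intent struct {
	seq int
	val int64
}

var errPossibleReplay = errors.New("RETRY_POSSIBLE_REPLAY")

// increment applies or replays an Increment. Unlike Put/CPut, its response
// must return the value that was written at the time, so a replay can only
// be answered while the intent at that sequence number is still current.
func increment(intents map[string]intent, key string, seq int, by int64) (int64, error) {
	cur, ok := intents[key]
	switch {
	case !ok || seq > cur.seq:
		next := cur.val + by // cur.val is 0 for a missing key
		intents[key] = intent{seq: seq, val: next}
		return next, nil
	case seq == cur.seq:
		// Replay: the written value is still on hand, return it again.
		return cur.val, nil
	default:
		// A later write masked the value this replay would have to return.
		return 0, errPossibleReplay
	}
}

func main() {
	intents := map[string]intent{}
	v, _ := increment(intents, "ctr", 1, 5)
	fmt.Println(v) // 5
	v, _ = increment(intents, "ctr", 1, 5) // replay returns the same value
	fmt.Println(v) // 5
	increment(intents, "ctr", 2, 1)
	_, err := increment(intents, "ctr", 1, 5)
	fmt.Println(err) // RETRY_POSSIBLE_REPLAY
}
```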
My point is that if you are applying a command and you see a higher sequence number, then it should always be a no-op (instead of an error), because the client must have made sure that all of its previous sequence numbers to that key applied before the higher one. I think this is a bit out in the weeds, since if a client is actually replaying a request that has a lower sequence number, it has failed to uphold that same obligation, and so when that happens there's a bug or nobody is listening on the other end. You're right that you can't always recover the response, but my point is that if you really have to, something is already wrong.
There's been an issue with parallel commits (#37866 (comment)) that's worth keeping an eye on, namely what happens when the final batch in the txn manages to lay down all of the required intents and the staging record, but then fails on one of the QueryIntents prepended to the EndTransaction. This can happen if a concurrent process resolves the txn, finds that everything is in place, commits the txn, and resolves some intents. The resolved kv pairs don't carry the txn ID any more, so QueryIntent will fail. Similar issues could occur if an intent actually got laid down but DistSender receives some network error and the intent is a full value by the time we retry. It'll be interesting to run into these cases.

It's already a bit scary that we'd be laying new provisional intents under the ID of a committed txn under a higher epoch. We ought to prevent that in the first place, perhaps by staging the txn record first and only then retrying the remainder of the batch in parallel. If staging works and the remainder works, great. If the remainder errors, we could try the remainder again, but eventually we'll need to return an ambiguous result instead of the last error we see. If the staging doesn't work, then it's ambiguous as well.
We have marked this issue as stale because it has been inactive for |
Touches #24194.
Touches #16026.
We are currently extremely conservative about the kinds of requests that we allow to replay within a transaction without throwing an error. Whenever we detect a case where a transaction may have performed the same operation twice, we throw a `TransactionRetryError` with a reason of `RETRY_POSSIBLE_REPLAY`. There are three cases where this happens in practice:

- a `BeginTransaction` request that doesn't advance a transaction record's epoch
- an `EndTransaction` that runs into the write timestamp cache

Now that #25007 has been addressed and we have sane transactional sequence numbers, I think we should consider at least lifting the second restriction. We should make writes within a transaction idempotent whenever possible. We may also consider lifting the first restriction, although that seems less important (maybe related to #25437).
As a concrete example of where we would benefit from this change, I think it would allow us to avoid the current messy handling of `MixedSuccessError` by `DistSender`, which prevents retries of a batch if any write in the batch succeeded. Instead, we could just retry the entire batch without concern.

Another concrete example is related to the transactional pipelining proposal (#16026). Idempotent transactional writes would allow writes that fail and are caught by `QueryIntent` requests to simply be replayed harmlessly. This would simplify logic and prevent unnecessary retries.

However, I do think there are some complications with making older writes at overwritten sequence numbers idempotent, because we only maintain a single intent on a key even if a transaction has written to it multiple times. The problem here is that the write may depend on the current value of the key (i.e. a `CPut`), and on a replay we won't have the proper intent to evaluate it against. Regardless, this is an edge case where we can still return a `RETRY_POSSIBLE_REPLAY` error. I don't think it needs to block progress here.

Introducing general transactional idempotency, where an entire transaction can be replayed to produce the same result, is trickier. It would require a persistent record of a transaction even after it has committed. This is something that the
`node-txn-ids.md` tech note originally in #24194 (but now removed, see Reviewable) discussed. There are some pretty large benefits to this approach, the biggest of which is probably that it allows other code to become much simpler as a result.

cc @tschottdorf @bdarnell
NB: on one hand I'd like to discuss the specific proposal of at least allowing some degree of idempotence within transactions. However, we surprisingly don't have a general tracking issue for transactional idempotency, so I figure this can probably serve as that as well.
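For illustration, the `CPut` complication described above could be sketched as follows (a hypothetical `cput` helper, with an empty expected value standing in for "key absent"; not the real evaluation code):

```go
package main

import (
	"errors"
	"fmt"
)

// intent keeps only the latest provisional value per key, as the issue
// notes; older values written by the same transaction are gone.
type intent struct {
	seq int
	val string
}

var (
	errPossibleReplay  = errors.New("RETRY_POSSIBLE_REPLAY")
	errConditionFailed = errors.New("condition failed")
)

// cput writes newVal only if the current value equals expVal. Because a
// later write from the same transaction replaces the intent, a replay at an
// older sequence number has nothing valid to evaluate its condition
// against, and must fall back to the RETRY_POSSIBLE_REPLAY error.
func cput(intents map[string]intent, key string, seq int, expVal, newVal string) error {
	cur, ok := intents[key]
	switch {
	case ok && seq < cur.seq:
		// The value this CPut's condition refers to is already masked.
		return errPossibleReplay
	case ok && seq == cur.seq:
		// Exact replay of a write we already applied: no-op success.
		return nil
	case cur.val != expVal:
		// For a missing key cur.val is "", matching the "absent" condition.
		return errConditionFailed
	default:
		intents[key] = intent{seq: seq, val: newVal}
		return nil
	}
}

func main() {
	intents := map[string]intent{}
	fmt.Println(cput(intents, "k", 1, "", "a"))  // initial write: <nil>
	fmt.Println(cput(intents, "k", 2, "a", "b")) // second write masks "a": <nil>
	fmt.Println(cput(intents, "k", 1, "", "a"))  // replay of seq 1: error
}
```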
Jira issue: CRDB-4980