
mvcc: support idempotent transactional writes #26915

Closed
nvanbenschoten opened this issue Jun 21, 2018 · 9 comments
Labels

  • A-kv-client: Relating to the KV client and the KV interface.
  • A-kv-transactions: Relating to MVCC and the transactional model.
  • C-performance: Perf of queries or internals. Solution not expected to change functional behavior.
  • no-issue-activity
  • T-kv: KV Team
  • X-stale

Comments

@nvanbenschoten
Member

nvanbenschoten commented Jun 21, 2018

Touches #24194.
Touches #16026.

We are currently extremely conservative about the kinds of requests that we allow to replay within a transaction without throwing an error. Whenever we detect a case where a transaction may have performed the same operation twice, we throw a TransactionRetryError with a reason of RETRY_POSSIBLE_REPLAY. There are three cases where this happens in practice:

  • When a replica sees a BeginTransaction request that doesn't advance a transaction record's epoch
  • When a replica sees a write to an existing intent where the write does not increase the intent's epoch or sequence number (batch indexes were deprecated in client/kv: add sequence numbers to individual Requests #25728)
  • When an EndTransaction runs into the write timestamp cache

Now that #25007 has been addressed and we have sane transactional sequence numbers, I think we should consider at least lifting the second restriction. We should make writes within a transaction idempotent whenever possible. We may also consider lifting the first restriction, although that seems less important (maybe related to #25437).
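
To make the second case concrete, here's a minimal sketch of the kind of replay classification a replica could perform at write time. Everything in it (the intentMeta type, the writeOutcome enum, classifyWrite) is hypothetical illustration, not the actual MVCC code: a write whose epoch, sequence number, and value exactly match the existing intent can be acknowledged as an idempotent no-op, while anything older or mismatched still falls back to RETRY_POSSIBLE_REPLAY.

```go
package sketch

import "bytes"

// intentMeta captures the fields of an existing intent that matter for
// replay detection: which epoch/sequence last wrote the key and what value
// that write left behind. Hypothetical type, not the real intent metadata.
type intentMeta struct {
	Epoch    int32
	Sequence int32
	Value    []byte
}

// writeOutcome describes how an incoming transactional write relates to the
// intent already present on the key.
type writeOutcome int

const (
	writeApply          writeOutcome = iota // newer epoch/sequence: perform the write
	writeIdempotent                         // exact replay of the latest write: succeed as a no-op
	writePossibleReplay                     // ambiguous replay: return RETRY_POSSIBLE_REPLAY
)

// classifyWrite decides whether a write at (epoch, seq) carrying value val is
// a fresh write, an idempotent replay, or an ambiguous replay of intent m.
func classifyWrite(m intentMeta, epoch, seq int32, val []byte) writeOutcome {
	if epoch > m.Epoch || (epoch == m.Epoch && seq > m.Sequence) {
		return writeApply
	}
	if epoch == m.Epoch && seq == m.Sequence && bytes.Equal(val, m.Value) {
		// The transaction already performed this exact write; a replay can be
		// acknowledged without touching the intent.
		return writeIdempotent
	}
	// A lower or mismatched sequence number means the key may have since been
	// overwritten within the transaction, so the original write's outcome
	// cannot be reconstructed here.
	return writePossibleReplay
}
```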

As a concrete example of where we would benefit from this change, I think it would allow us to avoid the current messy handling of MixedSuccessError by DistSender, which prevents retries of a batch if any write in the batch succeeded. Instead, we could just retry the entire batch without concern.

Another concrete example is related to the transactional pipelining proposal (#16026). Idempotent transactional writes would allow writes that fail and are caught by QueryIntent requests to simply be replayed harmlessly. This would simplify logic and prevent unnecessary retries.

However, I do think there are some complications with making older writes at overwritten sequence numbers idempotent because we only maintain a single intent on a key even if a transaction has written to it multiple times. The problem here is that the write may be dependent on the current value of the key (i.e. a CPut) and on a replay we won't have the proper intent to evaluate it against. Regardless, this is an edge case where we can still return a RETRY_POSSIBLE_REPLAY error. I don't think it needs to block progress here.
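
To illustrate that edge case with the hypothetical classifyWrite sketch above (again, illustration only, not real code):

```go
// A CPut at seq 1 wrote "b" after checking the expected value "a"; a later
// Put at seq 2 overwrote the intent with "c". Replaying the seq-1 CPut now
// finds neither its own value nor the value it originally evaluated against.
func exampleOverwrittenSequence() writeOutcome {
	m := intentMeta{Epoch: 1, Sequence: 2, Value: []byte("c")}
	return classifyWrite(m, 1 /* epoch */, 1 /* seq */, []byte("b")) // writePossibleReplay
}
```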

Introducing general transactional idempotency, where an entire transaction can be replayed to produce the same result, is trickier. It would require a persistent record of a transaction even after it has committed. This is something that the node-txn-ids.md tech note, originally included in #24194 (but since removed; see the Reviewable discussion), covered. There are some pretty large benefits to that approach, the biggest probably being that it allows a lot of other code to become much simpler.

cc. @tschottdorf @bdarnell

NB: on one hand, I'd like this issue to discuss the specific proposal of allowing at least some degree of idempotence within transactions. However, we surprisingly don't have a general tracking issue for transactional idempotency, so I figure this can serve as that as well.

Jira issue: CRDB-4980

@nvanbenschoten nvanbenschoten added the C-performance, A-kv-client, and A-kv-transactions labels Jun 21, 2018
@bdarnell
Contributor

Sounds reasonable to me, although I'm concerned that CPut is used so often that it may not be an edge case. If we have to abort replayed CPuts anyway, is there enough benefit from making this change?

@nvanbenschoten
Member Author

Well, we wouldn't need to abort all replayed CPuts, just those that reference keys that have already been overwritten at higher sequence numbers. I think almost all uses of idempotence, at least to start, will be in cases where the success of a single request is ambiguous, so we're just replaying a single write with no chance that a larger sequence number has already clobbered the old value (barring multiple writes to the same key in the same batch). I also don't think multiple writes to the same key in the same transaction are very common in general, although I may be missing a pattern that makes heavy use of doing just that.

@bdarnell
Contributor

Oh, I was thinking that we wouldn't be able to retry CPuts at all since even one CPut has masked the original value, but I guess as long as the sequence number in the request matches what's on disk you can tell that the previous iteration succeeded. And yes, this should be pretty rare.

@tbg
Member

tbg commented Jun 28, 2018

However, I do think there are some complications with making older writes at overwritten sequence numbers idempotent because we only maintain a single intent on a key even if a transaction has written to it multiple times. The problem here is that the write may be dependent on the current value of the key (i.e. a CPut) and on a replay we won't have the proper intent to evaluate it against. Regardless, this is an edge case where we can still return a RETRY_POSSIBLE_REPLAY error. I don't think it needs to block progress here.

You can't recover the old value, but is that necessary? Say you're an operation that needs to replay two CPuts on the same key from the same transaction. What could you need the value of the first one for? Either the second CPut was fed the first one's result earlier (in which case the request has it), or the second CPut was "blind", and so all that matters is making sure that the first one definitely was applied. This calls for a bit of a better API for replaying commands (where you wouldn't get a result back, just a confirmation that it is definitely applied). This is all getting a bit too complicated for what we need, though.
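
That API could be as small as a single confirmation method. Purely a hypothetical sketch of the shape, nothing that exists today:

```go
// ReplayConfirmer is a made-up interface name. A replayed write would not get
// its original response back; it would only learn whether the write at the
// given sequence number has definitely been applied.
type ReplayConfirmer interface {
	// ConfirmApplied reports whether the write identified by (txnID, seq) on
	// key is already reflected in the intent on that key.
	ConfirmApplied(key []byte, txnID []byte, seq int32) (applied bool, err error)
}
```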

@bdarnell
Contributor

You can't recover the old value, but is that necessary?

Depends on the command. For Put and CPut, we know that they either apply or have an error (and you can't currently continue after an error, so we know they applied if there is a higher sequence), so we could construct the requisite empty response. But other commands like Increment need to return the value that was written at the time.
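
As a sketch of what that would take, suppose (hypothetically; none of these names exist) the intent kept a per-sequence history of the values the transaction wrote to the key. Reconstructing a replayed Increment's response then becomes a lookup:

```go
// seqValue pairs a sequence number with the value the transaction wrote to
// the key at that sequence. Hypothetical type for illustration.
type seqValue struct {
	Sequence int32
	Value    int64
}

// incrementResponse reconstructs the response of a replayed Increment at seq
// from the key's per-sequence history, ordered by ascending sequence number.
// ok is false if the history does not cover that sequence, in which case the
// replica has to fall back to an error.
func incrementResponse(history []seqValue, seq int32) (newValue int64, ok bool) {
	for _, sv := range history {
		if sv.Sequence == seq {
			// The Increment at this sequence wrote sv.Value, so a replay must
			// return the same value to stay deterministic.
			return sv.Value, true
		}
	}
	return 0, false
}
```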

@tbg
Member

tbg commented Jun 28, 2018

My point is that if you are applying a command and you see a higher sequence number, then it should always be a noop (instead of an error) because the client must have made sure that all of its previous sequence numbers to that key applied before the higher one. I think this is a bit out in the weeds since if a client is actually replaying a request that has a lower sequence number, it has failed to uphold that same obligation, and so when it happens there's a bug or nobody is listening on the other end. You're right that you can't always recover the response, but my point is that if you really have to, something is already wrong.

@tbg tbg added this to the 2.2 milestone Jul 22, 2018
@tbg tbg added the A-coreperf label and removed the A-disaster-recovery, A-kv-transactions, A-kv-distribution, A-kv-client, A-storage, and A-kv-replication labels Jul 31, 2018
@petermattis petermattis removed this from the 2.2 milestone Oct 5, 2018
@nvanbenschoten nvanbenschoten added the A-kv-client and A-kv-transactions labels and removed the A-coreperf label Oct 12, 2018
@tbg
Member

tbg commented May 28, 2019

There's been an issue with parallel commits (#37866 (comment)) that's worth keeping an eye on, namely what happens when the final batch in the txn manages to lay down all of the required intents and the staging record but then fails on one of the QueryIntents prepended to the EndTransaction. This can happen if a concurrent process resolves the txn, finds that everything is in place, commits the txn, and resolves some intents. The resolved kv pairs don't carry the txn ID any more, so QueryIntent will fail. Similar issues could occur if an intent actually got laid down but DistSender receives some network error and the intent is a full value by the time we retry. It'll be interesting to run these cases.

It's already a bit scary that we'd be laying new provisional intents under the ID of a committed txn under a higher epoch. We ought to prevent that in the first place, perhaps by staging the txn record first and only retrying the remainder of the batch in parallel. If staging works and the remainder works, great. If the remainder errors, we could try the remainder again, but eventually we'll need to return an ambiguous result instead of the last error we see. If the staging doesn't work, then it's ambiguous as well.
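
Roughly, that staging-first retry policy could look like the sketch below; stage, remainder, and errAmbiguousResult are hypothetical placeholders, not DistSender internals:

```go
package sketch

import "errors"

// errAmbiguousResult stands in for the ambiguous-result error the client
// would surface when the commit outcome cannot be determined.
var errAmbiguousResult = errors.New("transaction commit result is ambiguous")

// commitStagingFirst stages the transaction record first and only then
// retries the rest of the batch. stage and remainder stand in for the real
// operations.
func commitStagingFirst(stage, remainder func() error, maxRetries int) error {
	if err := stage(); err != nil {
		// The staging write itself may or may not have applied.
		return errAmbiguousResult
	}
	for i := 0; i < maxRetries; i++ {
		if remainder() == nil {
			return nil // staging and the remainder both succeeded
		}
	}
	// Retries exhausted: report ambiguity rather than the last error seen,
	// since earlier attempts may have partially applied.
	return errAmbiguousResult
}
```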

@github-actions

github-actions bot commented Jun 6, 2021

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@github-actions

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Oct 3, 2023