mvcc: support idempotent transactional writes #26915
Comments
Sounds reasonable to me, although I'm concerned that CPut is used so often that it may not be an edge case. If we have to abort replayed CPuts anyway, is there enough benefit from making this change?
Well, we wouldn't need to abort all replayed CPuts, just those that reference keys that have already been overwritten by larger sequence numbers. I think almost all uses of idempotence, to start with, will be in cases where the success of a single request is ambiguous, so we're just replaying a single write without any chance that a larger sequence number has already clobbered the old value (barring multiple writes to the same key in the same batch). I also don't think multiple writes to the same key in the same transaction are very common in general, although I may be missing a pattern that makes heavy use of doing just that.
Oh, I was thinking that we wouldn't be able to retry CPuts at all, since even one CPut has masked the original value, but I guess as long as the sequence number in the request matches what's on disk you can tell that the previous iteration succeeded. And yes, this should be pretty rare.
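To make the sequence-number check above concrete, here is a minimal sketch (all names hypothetical, not CockroachDB's actual MVCC code) of how a replayed write can be recognized as a no-op when its sequence number and value match the intent on disk, and rejected when a larger sequence number has already clobbered the key:

```go
package main

import (
	"errors"
	"fmt"
)

// intent models the single provisional value a transaction keeps per key,
// tagged with the sequence number of the write that produced it.
type intent struct {
	seq   int
	value string
}

var errPossibleReplay = errors.New("RETRY_POSSIBLE_REPLAY: sequence number already overwritten")

// put applies a transactional write idempotently. Names and shapes are
// illustrative only.
func put(intents map[string]intent, key string, seq int, value string) error {
	cur, ok := intents[key]
	switch {
	case !ok || seq > cur.seq:
		// First write at this sequence number: apply it.
		intents[key] = intent{seq: seq, value: value}
		return nil
	case seq == cur.seq && value == cur.value:
		// Exact replay of a write we already applied: no-op success.
		return nil
	default:
		// A later write already clobbered the value on this key, so we can
		// no longer prove idempotence; fall back to a retry error.
		return errPossibleReplay
	}
}

func main() {
	intents := map[string]intent{}
	fmt.Println(put(intents, "k", 1, "a")) // first write: <nil>
	fmt.Println(put(intents, "k", 1, "a")) // replay: <nil>
	fmt.Println(put(intents, "k", 2, "b")) // later write: <nil>
	fmt.Println(put(intents, "k", 1, "a")) // stale replay: error
}
```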
You can't recover the old value, but is that necessary? Say you're an operation that needs to replay two CPuts on the same key from the same transaction. What could you need the value of the first one for? Either the second CPut was fed the first result earlier (in which case the request has it), or the second CPut was "blind", and so all that matters is that you make sure that the first one definitely was applied. This calls for a somewhat better API for replaying commands (where you wouldn't get a result back, just a confirmation that it is definitely applied). This is all getting a bit too complicated for what we need, though.
Depends on the command. For Put and CPut, we know that they either apply or return an error (and you can't currently continue after an error, so we know they applied if there is a higher sequence), so we could construct the requisite empty response. But other commands like Increment need to return the value that was written at the time.
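A rough sketch of that distinction (again with hypothetical names): an `Increment` response must carry the value written at the time, so it can only be reconstructed while the intent for that sequence number is still the latest one on the key:

```go
package main

import (
	"errors"
	"fmt"
)

// intent keeps only the latest provisional write per key, as the
// discussion assumes; nothing older is retained.
type intent struct {
	seq int
	val int64
}

var errPossibleReplay = errors.New("RETRY_POSSIBLE_REPLAY")

// increment applies or replays an Increment. Unlike Put/CPut, its response
// must return the value that was written at the time, so a replay can only
// be answered while the intent at that sequence number is still current.
func increment(intents map[string]intent, key string, seq int, by int64) (int64, error) {
	cur, ok := intents[key]
	switch {
	case !ok || seq > cur.seq:
		next := cur.val + by // cur.val is 0 for a missing key
		intents[key] = intent{seq: seq, val: next}
		return next, nil
	case seq == cur.seq:
		// Replay: the written value is still on hand, return it again.
		return cur.val, nil
	default:
		// A later write masked the value this replay would have to return.
		return 0, errPossibleReplay
	}
}

func main() {
	intents := map[string]intent{}
	v, _ := increment(intents, "ctr", 1, 5)
	fmt.Println(v) // 5
	v, _ = increment(intents, "ctr", 1, 5) // replay returns the same value
	fmt.Println(v) // 5
	increment(intents, "ctr", 2, 1)
	_, err := increment(intents, "ctr", 1, 5)
	fmt.Println(err) // RETRY_POSSIBLE_REPLAY
}
```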
My point is that if you are applying a command and you see a higher sequence number, then it should always be a no-op (instead of an error), because the client must have made sure that all of its previous sequence numbers to that key applied before the higher one. I think this is a bit out in the weeds, since if a client is actually replaying a request that has a lower sequence number, it has failed to uphold that same obligation, and so when that happens there's a bug or nobody is listening on the other end. You're right that you can't always recover the response, but my point is that if you really have to, something is already wrong.
There's been an issue with parallel commits (#37866 (comment)) that's worth keeping an eye on, namely what happens when the final batch in the txn manages to lay down all of the required intents and the staging record, but then fails on one of the QueryIntents prepended to the EndTransaction. This can happen if a concurrent process resolves the txn, finds that everything is in place, commits the txn, and resolves some intents. The resolved kv pairs don't carry the txn ID any more, so QueryIntent will fail. Similar issues could occur if an intent actually got laid down but DistSender receives some network error and the intent is a full value by the time we retry. It'll be interesting to run into these cases.

It's already a bit scary that we'd be laying new provisional intents under the ID of a committed txn under a higher epoch. We ought to prevent that in the first place, perhaps by staging the txn record first and only then retrying the remainder of the batch in parallel. If staging works and the remainder works, great. If the remainder errors, we could try the remainder again, but eventually we'll need to return an ambiguous result instead of the last error we see. If the staging doesn't work, then it's ambiguous as well.
We have marked this issue as stale because it has been inactive for |
Touches #24194.
Touches #16026.
We are currently extremely conservative about the kinds of requests that we allow to replay within a transaction without throwing an error. Whenever we detect a case where a transaction may have performed the same operation twice, we throw a `TransactionRetryError` with a reason of `RETRY_POSSIBLE_REPLAY`. There are three cases where this happens in practice:

- a `BeginTransaction` request that doesn't advance a transaction record's epoch
- an `EndTransaction` that runs into the write timestamp cache

Now that #25007 has been addressed and we have sane transactional sequence numbers, I think we should consider at least lifting the second restriction. We should make writes within a transaction idempotent whenever possible. We may also consider lifting the first restriction, although that seems less important (maybe related to #25437).
As a concrete example of where we would benefit from this change, I think it would allow us to avoid the current messy handling of `MixedSuccessError` by `DistSender`, which prevents retries of a batch if any write in the batch succeeded. Instead, we could just retry the entire batch without concern.

Another concrete example is related to the transactional pipelining proposal (#16026). Idempotent transactional writes would allow writes that fail and are caught by `QueryIntent` requests to simply be replayed harmlessly. This would simplify logic and prevent unnecessary retries.

However, I do think there are some complications with making older writes at overwritten sequence numbers idempotent, because we only maintain a single intent on a key even if a transaction has written to it multiple times. The problem here is that the write may depend on the current value of the key (i.e. a `CPut`), and on a replay we won't have the proper intent to evaluate it against. Regardless, this is an edge case where we can still return a `RETRY_POSSIBLE_REPLAY` error. I don't think it needs to block progress here.

Introducing general transactional idempotency, where an entire transaction can be replayed to produce the same result, is trickier. It would require a persistent record of a transaction even after it has committed. This is something that the
`node-txn-ids.md` tech note originally in #24194 (but now removed, see Reviewable) discussed. There are some pretty large benefits to this approach, the biggest of which is probably that it allows other code to become much simpler as a result.

cc @tschottdorf @bdarnell
NB: on one hand I'd like to discuss the specific proposal of at least allowing some degree of idempotence within transactions. However, we surprisingly don't have a general tracking issue for transactional idempotency, so I figure this can probably serve as that as well.
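For illustration, the `CPut` complication described above could be sketched as follows (a hypothetical `cput` helper, with an empty expected value standing in for "key absent"; not the real evaluation code):

```go
package main

import (
	"errors"
	"fmt"
)

// intent keeps only the latest provisional value per key, as the issue
// notes; older values written by the same transaction are gone.
type intent struct {
	seq int
	val string
}

var (
	errPossibleReplay  = errors.New("RETRY_POSSIBLE_REPLAY")
	errConditionFailed = errors.New("condition failed")
)

// cput writes newVal only if the current value equals expVal. Because a
// later write from the same transaction replaces the intent, a replay at an
// older sequence number has nothing valid to evaluate its condition
// against, and must fall back to the RETRY_POSSIBLE_REPLAY error.
func cput(intents map[string]intent, key string, seq int, expVal, newVal string) error {
	cur, ok := intents[key]
	switch {
	case ok && seq < cur.seq:
		// The value this CPut's condition refers to is already masked.
		return errPossibleReplay
	case ok && seq == cur.seq:
		// Exact replay of a write we already applied: no-op success.
		return nil
	case cur.val != expVal:
		// For a missing key cur.val is "", matching the "absent" condition.
		return errConditionFailed
	default:
		intents[key] = intent{seq: seq, val: newVal}
		return nil
	}
}

func main() {
	intents := map[string]intent{}
	fmt.Println(cput(intents, "k", 1, "", "a"))  // initial write: <nil>
	fmt.Println(cput(intents, "k", 2, "a", "b")) // second write masks "a": <nil>
	fmt.Println(cput(intents, "k", 1, "", "a"))  // replay of seq 1: error
}
```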
Jira issue: CRDB-4980