Spanner overlapping transactions don't fail when they should? #2118

jgeewax · 2017-03-21T14:39:32Z

My understanding is that if you read a cell (say the name column from a specific row in an employees table) in two different transactions, then commit a change to that cell in one of them, then a subsequent commit to that same cell from the other transaction should fail. Is this correct?

(I'm pulling this info from our docs, and making sure that these are not 'blind writes' by doing a read first -- see https://cloud.google.com/spanner/docs/transactions#rw_transaction_performance)

If so, our client library doesn't appear to do that...?

var gcloud = require('google-cloud')({ projectId: 'jjg-cloud-research' });
var spanner = gcloud.spanner();
var instance = spanner.instance('test-instance');
var database = instance.database('test-database', {max: 5});
var table = database.table('employees');

var nameQuery = {keys: ['1'], columns: ['name']};

Promise.all([database.runTransaction(), database.runTransaction()]).then(
  function (txns) {
    var txn1 = txns[0][0], txn2 = txns[1][0];
    var txn = {1: txns[0][0], 2: txns[1][0]};

    function readNameFromTransaction(number) {
      return function () {
        return txn[number].read('employees', nameQuery).then(function (results) {
          console.log('txn' + number + ':', results[0][0]);
        });
      };
    }

    function changeNameFromTransaction(number) {
      return function () {
        txn[number].update('employees', {
          employee_id: '1',
          name: 'Updated by txn' + number
        });
        return txn[number].commit().then(function (results) {
          console.log('txn' + number + ':', results);
        });
      }
    }

    function readNameFromTable() {
      return table.read(nameQuery).then(function (results) {
        console.log('table:', results[0][0]);
      });
    }

    readNameFromTable()
      .then(readNameFromTransaction(1))
      .then(readNameFromTransaction(2))
      .then(changeNameFromTransaction(1))
      .then(readNameFromTable)
      .then(changeNameFromTransaction(2))
      .then(readNameFromTable)
      .catch(console.log);
  }
);

The output from this script looks totally successful:

table: [ { name: 'name', value: 'Updated by txn2' } ]
txn1: [ { name: 'name', value: 'Updated by txn2' } ]
txn2: [ { name: 'name', value: 'Updated by txn2' } ]
txn1: [ { commitTimestamp: { seconds: '1490106991', nanos: 173288000 } } ]
table: [ { name: 'name', value: 'Updated by txn1' } ]
txn2: [ { commitTimestamp: { seconds: '1490106991', nanos: 269254000 } } ]
table: [ { name: 'name', value: 'Updated by txn2' } ]

I expected that txn2 commit result to be a failure because the data changed out from under us... Is my understanding wrong? or the client library wrong? or a bug in my code?

/cc @callmehiphop

The text was updated successfully, but these errors were encountered:

jgeewax · 2017-03-21T15:10:55Z

OK. I've discovered that TransactionRequest.prototype.createReadStream requests do not include the transaction ID ... this is super duper bad :(

in transaction-request.js, starting line 215:

  if (self.id) {
    reqOpts.transaction = { id: self.id };
  }

  function makeRequest(resumeToken) {

Adding those 3 lines partly fixes the problem. I no longer have it blasting out writes (aka, Spanner can do the right thing), but I do not get an error in the way that I should. Things just crash... That could be my code though. New output below:

$ node test.js
table: [ { name: 'name', value: 'Updated by txn1' } ]
txn1: [ { name: 'name', value: 'Updated by txn1' } ]
txn2: [ { name: 'name', value: 'Updated by txn1' } ]
txn1: [ { commitTimestamp: { seconds: '1490109209', nanos: 785675000 } } ]
table: [ { name: 'name', value: 'Updated by txn1' } ]
$ echo $?
0

jgeewax · 2017-03-21T15:30:12Z

OK -- after further digging, our retry logic appears to be broken. If I change the commit() method to force no retries (shouldRetry = 0), I get the following output:

table: [ { name: 'name', value: 'Updated by txn1' } ]
txn1: [ { name: 'name', value: 'Updated by txn1' } ]
txn2: [ { name: 'name', value: 'Updated by txn1' } ]
txn1: [ { commitTimestamp: { seconds: '1490110158', nanos: 928096000 } } ]
table: [ { name: 'name', value: 'Updated by txn1' } ]
{ Error: Transaction was aborted. It was wounded by a higher priority transaction due to conflict on key [1], column name in table employees.
    at /usr/local/google/home/jjg/node_modules/google-cloud/node_modules/grpc/src/node/src/client.js:434:17
  code: 10,
  metadata: 
   Metadata {
     _internal_repr: 
      { 'google.rpc.debuginfo-bin': [Object],
        'google.rpc.retryinfo-bin': [Object] } },
  note: 'Exception occurred in retry method that was not classified as transient' }
table: [ { name: 'name', value: 'Updated by txn1' } ]

Did the spec say that we should retry conflicts like this? I have a hard time buying that the right thing is to just re-run the same mutations we had queued up. For example, what if the change was adding $100 to a bank balance? Wouldn't we want to re-run all of the logic (get balance, add $100, save new balance) and not just re-run the one mutation (set balance = $200) ?

To clarify, I think if we want retries to work as expected, we need to have all of the code that would need to be retried in a chunk -- not just the reads/writes that were called on the mutation (txn.read, txn.insert, etc). This would lead to something more like...

database.runInTransaction(function (txn) {
  txn.read(query).then(function (results) {
    txn.insert(....);
    return txn.read(otherQuery);
  }).then(function () {
    txn.insert(...);
    return txn.commit();
  });
});

And on retry, we take that entire callback and run it again. With the example that I have above with promises... I have no idea how to pick out only the parts that involve txn1 and retry only those....

And we may want to try to stop people from nesting transactions or ensure that we don't retry nested transactions. If we retry something that has two transactions that conflict, like my example, not only will the second to commit always fail, but we may end with the first one committing several times (which, if it "doubles a bank balance") could be a very bad thing.

stephenplusplus · 2017-03-21T17:29:54Z

cc @vkedia for any thoughts or prior art on the correct way to handle this scenario.

vkedia · 2017-03-21T17:40:35Z

@jgeewax Yes you are right that we should not just retry the buffered mutations. We need to retry all the operations that were done on the transactions. So essentially we should just reinvoke the function that was passed to runInTransaction.

vkedia · 2017-03-21T17:45:06Z

@stephenplusplus As JJ suggested we need to retry the whole callback. Its similar to what we do in other client libraries. Do you have any specific questions around that?

vkedia · 2017-03-21T17:45:40Z

@stephenplusplus Also to begin with lets first fix the bug about transaction id not being passed around and then we can tackle retries.

bjwatson · 2017-03-21T18:40:15Z

FYI @lukesneeringer @jmuk

vkedia · 2017-03-22T17:00:01Z

@jgeewax @stephenplusplus
So I thought a bit more about this and here is my understanding.
There are two variants of database.runTransaction. One that takes in a callback and another that returns a promise.
Is it correct that the one that takes in a callback behaves properly both with respect to retries and with respect to transaction id being passed around?
Do we need to have both these methods since it is not clear how one would retry the variant that returns a Promise?

stephenplusplus · 2017-03-22T17:01:14Z

(cc @callmehiphop for these q's as he is the original implementer)

callmehiphop · 2017-03-22T17:23:11Z

I think there was a fumble on our end, but my intention was never for this method to behave how the docs describe promise usage. Ideally the user would still pass in a callback but the callback itself would return a promise, making the original method chainable

var promise = database.runTransaction(function(transaction) {
  return transaction.run('SELECT * FROM Singers').then(function() {
    // do stuff
    return transaction.commit();
  });
});

promise.then(function() {
  // all good!
}, function(err) {
  // oh noes!
});

We had originally talked about trying to intercept the promise chain and retrying all the callbacks up until commit is called (hence why I think the doc examples are written this way), but ultimately I decided against it due to that behavior being unpredictable.

vkedia · 2017-03-22T17:25:56Z

Ah I see. So how do we tackly this now? Do we change the method to always take in a callback?
Also what about the transaction id issue. That needs to be fixed irrespective?
@jgeewax Do you think making the change as @callmehiphop suggested works?

callmehiphop · 2017-03-22T17:30:46Z

@vkedia yeah, I think just ensuring the transaction ID is being passed in properly and updating the docs/promise handling of runTransaction would be the fix here.

vkedia · 2017-03-22T23:04:55Z

That seems reasonable to me.

vkedia · 2017-03-24T16:55:37Z

@callmehiphop @danoscarmike Let us proceed with this for now. I don't like having a client out there which can possible corrupt user data for too long.

Let us do the following:

Fix the issue with transaction id
Fix the method that returns a Promise to also take a callback.
Verify that the other method does retries correctly and if not fix that as well
Write a test to verify this behavior

vkedia · 2017-03-27T18:27:41Z

@callmehiphop @danoscarmike Any updates on this?

vkedia · 2017-03-27T18:31:39Z

/cc @lukesneeringer

lukesneeringer · 2017-03-27T21:27:29Z

I spoke with @callmehiphop elsewhere and he said he will have this done by the end of the day.

callmehiphop · 2017-03-28T21:28:35Z

@vkedia @stephenplusplus @lukesneeringer @jgeewax so our typical callback vs. promise convention is pretty simple - omit the callback and you shall receive a promise. I think this is the first API that really complicates this approach.

Consider the following example

database.runTransaction()
  .then(transaction => {
    return Promise.all([
      transaction,
      transaction.getRows('Singers')
    ]);
  })
  .then([transaction, rows] => {
    // do some kind of mutation based on the rows
    return transaction.commit();
  })
  .then(() => {
    // do something async/unrelated to the transaction
  });

My main issue is that in order to properly retry our transaction here, we would need to capture all functions passed to the promise before transaction.commit() and retry them in the correct order before allowing any chaining that occurs after to continue. Technically I don't think this is even feasible, but even if it were I think the overall behavior itself is totally unpredictable for our users. Based on this opinion, I think that our traditional approach will not fly here.

That being said, if we were to continue to try and let both callbacks and promises live together within a single method, we need a different way of letting the user tell us that they want promises.

My current suggestion is to let the user return a promise within the callback (like so)

database.runTransaction((err, transaction) => {
  if (err) return Promise.reject(err);

  // do some transactional based stuff
  return transaction.commit();
})
.then(() => {
  // transaction is committed, so some other stuff now
});

However after some discussion in my PR, this also may be confusing for the users. So I would like to revisit this specific API since we don't have a precedent for this specific issue.

An idea that we've been kicking around would basically be to not directly support promises in and out of this method, but only in.. This means the code would resemble the following

database.runTransaction((err, transaction) => {
  if (err) {
    // an error occurred while trying to get/create a session + transaction
  }

  // do reads/writes/etc. here
  
  // now the user has a choice between using a promise or a callback
  // and while this function will be ran multiple times in the event of an aborted error
  // commit will not resolve until either a success or final error occurs
  transaction.commit()
    .then(() => console.log('committed successfully!'))
    .catch((err) => console.error(err));
});

At which point if a user doesn't feel that this is the way they want to use Promises, it would be fairly easy to wrap that into something like

function runTransactions() {
  return new Promise((resolve, reject) => {
    database.runTransaction((err, transaction) => {
      if (err) return reject(err);
      // do transaction stuff
      transaction.commit().then(resolve, reject);
    });
  });
}

stephenplusplus · 2017-03-28T21:33:54Z

A quick "A or B" is:

// A)
datastore
  .runTransaction(function(err, transaction) {
    if (err) {
      // Could not start transaction 
      return Promise.reject(err)
    }

    // ...transaction stuff...
    return transaction.commit()
  })
  .then(success)
  .then(doSomethingElse)
  .catch(handleErrors)

// B)
datastore.runTransaction(function(err, tranaction) {
  if (err) {
    // Could not start transaction  
  }
  
  // ...transaction stuff...
  transaction.commit()
    .then(success)
    .then(doSomethingElse)
    .catch(handleErrors)
})

Gist: https://gist.github.com/stephenplusplus/3cb49529ee537c640a4670edf900e294

vkedia · 2017-03-28T21:41:07Z

I think option B is confusing. IIUC what you are saying is that even though user may end up calling transaction.commit multiple times, only one of the returned Promise would ever complete? Do you return the same Promise object each time transaction.commit is called?

callmehiphop · 2017-03-28T21:54:07Z

I believe what would happen is if we received an aborted error and had to restart the transaction, we would never resolve the commit() and then we would create a new Promise on the retry attempt.

vkedia · 2017-03-28T21:56:24Z

I am not sure its a good idea having all these unresolved Promises lying around. It is also very unintuitive for the user.

stephenplusplus · 2017-03-28T21:58:03Z

That's the case for both A and B though. We have to re-initialize and run the entire function, which can create an unknown amount of resources and memory.

vkedia · 2017-03-28T22:04:56Z

But that memory will get freed up at some point. But if I understand correctly in approach B, those Promises will linger around forever holding whatever memory they are.
Also the semantics seem clearer in A. What is your argument against A?

vkedia · 2017-03-28T22:06:39Z

@lukesneeringer
Another meta comment is that while we are debating how to implement the Promise api, can we just go ahead with the non promise API, fix the bug about transaction id, entirely remove the Promise api and release that. I am really uncomfortable with having this bug around. We can add the Promise api in subsequent release.

stephenplusplus · 2017-03-28T22:11:00Z

I did take it for granted that we would be able to free those up, though I'm not sure how-- Dave could hopefully know if a solution exists. If one doesn't, then that absolutely shuts down option B.

I think A is error-prone in two ways:

error handling: I have to handle an error inside of a callback, and inside of the catch block
returning a promise from a callback in order to activate the promise. If I forget, there is no promise

I think it's more clear to have the commit's resolution handled after the commit request, as opposed to using another chain of promises.

vkedia · 2017-03-28T22:14:19Z

As far as error handling concerned, why not omit the error handling in the callback. Even for errors like session not available, why not just bubble them up by rejecting the Promise returned by runTransaction.

stephenplusplus · 2017-03-28T22:15:15Z

I believe that's because we don't know if the user is using promises until they return one from the callback.

vkedia · 2017-03-28T22:17:29Z

How about using a different method for Promise rather than overloading runTransaction? Or have runTransaction take in two callables, one for handling transaction and another for handling error? If users want to use Promise then they should just pass the first callable and not the second one?

stephenplusplus · 2017-03-28T22:19:26Z

Assuming we worked out the memory control, can you elaborate on the disadvantages of option B?

vkedia · 2017-03-28T22:27:03Z

Actually thinking about this, the issue is not just with option B. Even in option A, transaction.commit() does return a Promise which the user is expected to return right back. What do we do with all those promises right now in case of retries. Do we resolve all of them?

stephenplusplus · 2017-03-29T12:56:02Z

Release @google-cloud/spanner@0.2.0, moved our discussion to #2152

https://github.com/GoogleCloudPlatform/google-cloud-node/releases/tag/spanner-0.2.0

jgeewax added api: spanner Issues related to the Spanner API. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. type: question Request for information or clarification. Not an issue. labels Mar 21, 2017

danoscarmike added the priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. label Mar 21, 2017

danoscarmike assigned callmehiphop Mar 21, 2017

danoscarmike added priority: p0 Highest priority. Critical issue. P0 implies highest priority. and removed priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Mar 21, 2017

callmehiphop mentioned this issue Mar 28, 2017

spanner: refactor transaction retry logic #2144

Closed

callmehiphop mentioned this issue Mar 28, 2017

spanner: add txn id to reads and remove promise support in runTransaction #2148

Merged

stephenplusplus closed this as completed in #2148 Mar 29, 2017

stephenplusplus mentioned this issue Mar 29, 2017

spanner: how should transactional promises look? #2152

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spanner overlapping transactions don't fail when they should? #2118

Spanner overlapping transactions don't fail when they should? #2118

jgeewax commented Mar 21, 2017 •

edited

Loading

jgeewax commented Mar 21, 2017 •

edited

Loading

jgeewax commented Mar 21, 2017 •

edited

Loading

stephenplusplus commented Mar 21, 2017

vkedia commented Mar 21, 2017

vkedia commented Mar 21, 2017

vkedia commented Mar 21, 2017

bjwatson commented Mar 21, 2017

vkedia commented Mar 22, 2017

stephenplusplus commented Mar 22, 2017

callmehiphop commented Mar 22, 2017

vkedia commented Mar 22, 2017

callmehiphop commented Mar 22, 2017

vkedia commented Mar 22, 2017

vkedia commented Mar 24, 2017

vkedia commented Mar 27, 2017

vkedia commented Mar 27, 2017

lukesneeringer commented Mar 27, 2017

callmehiphop commented Mar 28, 2017

stephenplusplus commented Mar 28, 2017 •

edited

Loading

vkedia commented Mar 28, 2017

callmehiphop commented Mar 28, 2017 •

edited

Loading

vkedia commented Mar 28, 2017

stephenplusplus commented Mar 28, 2017

vkedia commented Mar 28, 2017

vkedia commented Mar 28, 2017

stephenplusplus commented Mar 28, 2017

vkedia commented Mar 28, 2017

stephenplusplus commented Mar 28, 2017

vkedia commented Mar 28, 2017

stephenplusplus commented Mar 28, 2017

vkedia commented Mar 28, 2017

stephenplusplus commented Mar 29, 2017

Spanner overlapping transactions don't fail when they should? #2118

Spanner overlapping transactions don't fail when they should? #2118

Comments

jgeewax commented Mar 21, 2017 • edited Loading

jgeewax commented Mar 21, 2017 • edited Loading

jgeewax commented Mar 21, 2017 • edited Loading

stephenplusplus commented Mar 21, 2017

vkedia commented Mar 21, 2017

vkedia commented Mar 21, 2017

vkedia commented Mar 21, 2017

bjwatson commented Mar 21, 2017

vkedia commented Mar 22, 2017

stephenplusplus commented Mar 22, 2017

callmehiphop commented Mar 22, 2017

vkedia commented Mar 22, 2017

callmehiphop commented Mar 22, 2017

vkedia commented Mar 22, 2017

vkedia commented Mar 24, 2017

vkedia commented Mar 27, 2017

vkedia commented Mar 27, 2017

lukesneeringer commented Mar 27, 2017

callmehiphop commented Mar 28, 2017

stephenplusplus commented Mar 28, 2017 • edited Loading

vkedia commented Mar 28, 2017

callmehiphop commented Mar 28, 2017 • edited Loading

vkedia commented Mar 28, 2017

stephenplusplus commented Mar 28, 2017

vkedia commented Mar 28, 2017

vkedia commented Mar 28, 2017

stephenplusplus commented Mar 28, 2017

vkedia commented Mar 28, 2017

stephenplusplus commented Mar 28, 2017

vkedia commented Mar 28, 2017

stephenplusplus commented Mar 28, 2017

vkedia commented Mar 28, 2017

stephenplusplus commented Mar 29, 2017

jgeewax commented Mar 21, 2017 •

edited

Loading

jgeewax commented Mar 21, 2017 •

edited

Loading

jgeewax commented Mar 21, 2017 •

edited

Loading

stephenplusplus commented Mar 28, 2017 •

edited

Loading

callmehiphop commented Mar 28, 2017 •

edited

Loading