Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spanner overlapping transactions don't fail when they should? #2118

Closed
jgeewax opened this issue Mar 21, 2017 · 32 comments
Closed

Spanner overlapping transactions don't fail when they should? #2118

jgeewax opened this issue Mar 21, 2017 · 32 comments
Assignees
Labels
api: spanner Issues related to the Spanner API. priority: p0 Highest priority. Critical issue. P0 implies highest priority. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. type: question Request for information or clarification. Not an issue.

Comments

@jgeewax
Copy link
Contributor

jgeewax commented Mar 21, 2017

My understanding is that if you read a cell (say the name column from a specific row in an employees table) in two different transactions, then commit a change to that cell in one of them, then a subsequent commit to that same cell from the other transaction should fail. Is this correct?

(I'm pulling this info from our docs, and making sure that these are not 'blind writes' by doing a read first -- see https://cloud.google.com/spanner/docs/transactions#rw_transaction_performance)

If so, our client library doesn't appear to do that...?

var gcloud = require('google-cloud')({ projectId: 'jjg-cloud-research' });
var spanner = gcloud.spanner();
var instance = spanner.instance('test-instance');
var database = instance.database('test-database', {max: 5});
var table = database.table('employees');

var nameQuery = {keys: ['1'], columns: ['name']};

Promise.all([database.runTransaction(), database.runTransaction()]).then(
  function (txns) {
    var txn1 = txns[0][0], txn2 = txns[1][0];
    var txn = {1: txns[0][0], 2: txns[1][0]};

    function readNameFromTransaction(number) {
      return function () {
        return txn[number].read('employees', nameQuery).then(function (results) {
          console.log('txn' + number + ':', results[0][0]);
        });
      };
    }

    function changeNameFromTransaction(number) {
      return function () {
        txn[number].update('employees', {
          employee_id: '1',
          name: 'Updated by txn' + number
        });
        return txn[number].commit().then(function (results) {
          console.log('txn' + number + ':', results);
        });
      }
    }

    function readNameFromTable() {
      return table.read(nameQuery).then(function (results) {
        console.log('table:', results[0][0]);
      });
    }

    readNameFromTable()
      .then(readNameFromTransaction(1))
      .then(readNameFromTransaction(2))
      .then(changeNameFromTransaction(1))
      .then(readNameFromTable)
      .then(changeNameFromTransaction(2))
      .then(readNameFromTable)
      .catch(console.log);
  }
);

The output from this script looks totally successful:

table: [ { name: 'name', value: 'Updated by txn2' } ]
txn1: [ { name: 'name', value: 'Updated by txn2' } ]
txn2: [ { name: 'name', value: 'Updated by txn2' } ]
txn1: [ { commitTimestamp: { seconds: '1490106991', nanos: 173288000 } } ]
table: [ { name: 'name', value: 'Updated by txn1' } ]
txn2: [ { commitTimestamp: { seconds: '1490106991', nanos: 269254000 } } ]
table: [ { name: 'name', value: 'Updated by txn2' } ]

I expected that txn2 commit result to be a failure because the data changed out from under us... Is my understanding wrong? or the client library wrong? or a bug in my code?

/cc @callmehiphop

@jgeewax jgeewax added api: spanner Issues related to the Spanner API. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. type: question Request for information or clarification. Not an issue. labels Mar 21, 2017
@jgeewax
Copy link
Contributor Author

jgeewax commented Mar 21, 2017

OK. I've discovered that TransactionRequest.prototype.createReadStream requests do not include the transaction ID ... this is super duper bad :(

in transaction-request.js, starting line 215:

  if (self.id) {
    reqOpts.transaction = { id: self.id };
  }

  function makeRequest(resumeToken) {

Adding those 3 lines partly fixes the problem. I no longer have it blasting out writes (aka, Spanner can do the right thing), but I do not get an error in the way that I should. Things just crash... That could be my code though. New output below:

$ node test.js
table: [ { name: 'name', value: 'Updated by txn1' } ]
txn1: [ { name: 'name', value: 'Updated by txn1' } ]
txn2: [ { name: 'name', value: 'Updated by txn1' } ]
txn1: [ { commitTimestamp: { seconds: '1490109209', nanos: 785675000 } } ]
table: [ { name: 'name', value: 'Updated by txn1' } ]
$ echo $?
0

@jgeewax
Copy link
Contributor Author

jgeewax commented Mar 21, 2017

OK -- after further digging, our retry logic appears to be broken. If I change the commit() method to force no retries (shouldRetry = 0), I get the following output:

table: [ { name: 'name', value: 'Updated by txn1' } ]
txn1: [ { name: 'name', value: 'Updated by txn1' } ]
txn2: [ { name: 'name', value: 'Updated by txn1' } ]
txn1: [ { commitTimestamp: { seconds: '1490110158', nanos: 928096000 } } ]
table: [ { name: 'name', value: 'Updated by txn1' } ]
{ Error: Transaction was aborted. It was wounded by a higher priority transaction due to conflict on key [1], column name in table employees.
    at /usr/local/google/home/jjg/node_modules/google-cloud/node_modules/grpc/src/node/src/client.js:434:17
  code: 10,
  metadata: 
   Metadata {
     _internal_repr: 
      { 'google.rpc.debuginfo-bin': [Object],
        'google.rpc.retryinfo-bin': [Object] } },
  note: 'Exception occurred in retry method that was not classified as transient' }
table: [ { name: 'name', value: 'Updated by txn1' } ]

Did the spec say that we should retry conflicts like this? I have a hard time buying that the right thing is to just re-run the same mutations we had queued up. For example, what if the change was adding $100 to a bank balance? Wouldn't we want to re-run all of the logic (get balance, add $100, save new balance) and not just re-run the one mutation (set balance = $200) ?

To clarify, I think if we want retries to work as expected, we need to have all of the code that would need to be retried in a chunk -- not just the reads/writes that were called on the mutation (txn.read, txn.insert, etc). This would lead to something more like...

database.runInTransaction(function (txn) {
  txn.read(query).then(function (results) {
    txn.insert(....);
    return txn.read(otherQuery);
  }).then(function () {
    txn.insert(...);
    return txn.commit();
  });
});

And on retry, we take that entire callback and run it again. With the example that I have above with promises... I have no idea how to pick out only the parts that involve txn1 and retry only those....

And we may want to try to stop people from nesting transactions or ensure that we don't retry nested transactions. If we retry something that has two transactions that conflict, like my example, not only will the second to commit always fail, but we may end with the first one committing several times (which, if it "doubles a bank balance") could be a very bad thing.

@danoscarmike danoscarmike added the priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. label Mar 21, 2017
@stephenplusplus
Copy link
Contributor

cc @vkedia for any thoughts or prior art on the correct way to handle this scenario.

@vkedia
Copy link
Contributor

vkedia commented Mar 21, 2017

@jgeewax Yes you are right that we should not just retry the buffered mutations. We need to retry all the operations that were done on the transactions. So essentially we should just reinvoke the function that was passed to runInTransaction.

@vkedia
Copy link
Contributor

vkedia commented Mar 21, 2017

@stephenplusplus As JJ suggested we need to retry the whole callback. Its similar to what we do in other client libraries. Do you have any specific questions around that?

@vkedia
Copy link
Contributor

vkedia commented Mar 21, 2017

@stephenplusplus Also to begin with lets first fix the bug about transaction id not being passed around and then we can tackle retries.

@danoscarmike danoscarmike added priority: p0 Highest priority. Critical issue. P0 implies highest priority. and removed priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Mar 21, 2017
@bjwatson
Copy link

FYI @lukesneeringer @jmuk

@vkedia
Copy link
Contributor

vkedia commented Mar 22, 2017

@jgeewax @stephenplusplus
So I thought a bit more about this and here is my understanding.
There are two variants of database.runTransaction. One that takes in a callback and another that returns a promise.
Is it correct that the one that takes in a callback behaves properly both with respect to retries and with respect to transaction id being passed around?
Do we need to have both these methods since it is not clear how one would retry the variant that returns a Promise?

@stephenplusplus
Copy link
Contributor

(cc @callmehiphop for these q's as he is the original implementer)

@callmehiphop
Copy link
Contributor

I think there was a fumble on our end, but my intention was never for this method to behave how the docs describe promise usage. Ideally the user would still pass in a callback but the callback itself would return a promise, making the original method chainable

var promise = database.runTransaction(function(transaction) {
  return transaction.run('SELECT * FROM Singers').then(function() {
    // do stuff
    return transaction.commit();
  });
});

promise.then(function() {
  // all good!
}, function(err) {
  // oh noes!
});

We had originally talked about trying to intercept the promise chain and retrying all the callbacks up until commit is called (hence why I think the doc examples are written this way), but ultimately I decided against it due to that behavior being unpredictable.

@vkedia
Copy link
Contributor

vkedia commented Mar 22, 2017

Ah I see. So how do we tackly this now? Do we change the method to always take in a callback?
Also what about the transaction id issue. That needs to be fixed irrespective?
@jgeewax Do you think making the change as @callmehiphop suggested works?

@callmehiphop
Copy link
Contributor

@vkedia yeah, I think just ensuring the transaction ID is being passed in properly and updating the docs/promise handling of runTransaction would be the fix here.

@vkedia
Copy link
Contributor

vkedia commented Mar 22, 2017

That seems reasonable to me.

@vkedia
Copy link
Contributor

vkedia commented Mar 24, 2017

@callmehiphop @danoscarmike Let us proceed with this for now. I don't like having a client out there which can possible corrupt user data for too long.

Let us do the following:

  • Fix the issue with transaction id
  • Fix the method that returns a Promise to also take a callback.
  • Verify that the other method does retries correctly and if not fix that as well
  • Write a test to verify this behavior

@vkedia
Copy link
Contributor

vkedia commented Mar 27, 2017

@callmehiphop @danoscarmike Any updates on this?

@vkedia
Copy link
Contributor

vkedia commented Mar 27, 2017

/cc @lukesneeringer

@lukesneeringer
Copy link
Contributor

I spoke with @callmehiphop elsewhere and he said he will have this done by the end of the day.

@callmehiphop
Copy link
Contributor

@vkedia @stephenplusplus @lukesneeringer @jgeewax so our typical callback vs. promise convention is pretty simple - omit the callback and you shall receive a promise. I think this is the first API that really complicates this approach.

Consider the following example

database.runTransaction()
  .then(transaction => {
    return Promise.all([
      transaction,
      transaction.getRows('Singers')
    ]);
  })
  .then([transaction, rows] => {
    // do some kind of mutation based on the rows
    return transaction.commit();
  })
  .then(() => {
    // do something async/unrelated to the transaction
  });

My main issue is that in order to properly retry our transaction here, we would need to capture all functions passed to the promise before transaction.commit() and retry them in the correct order before allowing any chaining that occurs after to continue. Technically I don't think this is even feasible, but even if it were I think the overall behavior itself is totally unpredictable for our users. Based on this opinion, I think that our traditional approach will not fly here.

That being said, if we were to continue to try and let both callbacks and promises live together within a single method, we need a different way of letting the user tell us that they want promises.

My current suggestion is to let the user return a promise within the callback (like so)

database.runTransaction((err, transaction) => {
  if (err) return Promise.reject(err);

  // do some transactional based stuff
  return transaction.commit();
})
.then(() => {
  // transaction is committed, so some other stuff now
});

However after some discussion in my PR, this also may be confusing for the users. So I would like to revisit this specific API since we don't have a precedent for this specific issue.

An idea that we've been kicking around would basically be to not directly support promises in and out of this method, but only in.. This means the code would resemble the following

database.runTransaction((err, transaction) => {
  if (err) {
    // an error occurred while trying to get/create a session + transaction
  }

  // do reads/writes/etc. here
  
  // now the user has a choice between using a promise or a callback
  // and while this function will be ran multiple times in the event of an aborted error
  // commit will not resolve until either a success or final error occurs
  transaction.commit()
    .then(() => console.log('committed successfully!'))
    .catch((err) => console.error(err));
});

At which point if a user doesn't feel that this is the way they want to use Promises, it would be fairly easy to wrap that into something like

function runTransactions() {
  return new Promise((resolve, reject) => {
    database.runTransaction((err, transaction) => {
      if (err) return reject(err);
      // do transaction stuff
      transaction.commit().then(resolve, reject);
    });
  });
}

@stephenplusplus
Copy link
Contributor

stephenplusplus commented Mar 28, 2017

A quick "A or B" is:

// A)
datastore
  .runTransaction(function(err, transaction) {
    if (err) {
      // Could not start transaction 
      return Promise.reject(err)
    }

    // ...transaction stuff...
    return transaction.commit()
  })
  .then(success)
  .then(doSomethingElse)
  .catch(handleErrors)

// B)
datastore.runTransaction(function(err, tranaction) {
  if (err) {
    // Could not start transaction  
  }
  
  // ...transaction stuff...
  transaction.commit()
    .then(success)
    .then(doSomethingElse)
    .catch(handleErrors)
})

Gist: https://gist.github.com/stephenplusplus/3cb49529ee537c640a4670edf900e294

@vkedia
Copy link
Contributor

vkedia commented Mar 28, 2017

I think option B is confusing. IIUC what you are saying is that even though user may end up calling transaction.commit multiple times, only one of the returned Promise would ever complete? Do you return the same Promise object each time transaction.commit is called?

@callmehiphop
Copy link
Contributor

callmehiphop commented Mar 28, 2017

I believe what would happen is if we received an aborted error and had to restart the transaction, we would never resolve the commit() and then we would create a new Promise on the retry attempt.

@vkedia
Copy link
Contributor

vkedia commented Mar 28, 2017

I am not sure its a good idea having all these unresolved Promises lying around. It is also very unintuitive for the user.

@stephenplusplus
Copy link
Contributor

That's the case for both A and B though. We have to re-initialize and run the entire function, which can create an unknown amount of resources and memory.

@vkedia
Copy link
Contributor

vkedia commented Mar 28, 2017

But that memory will get freed up at some point. But if I understand correctly in approach B, those Promises will linger around forever holding whatever memory they are.
Also the semantics seem clearer in A. What is your argument against A?

@vkedia
Copy link
Contributor

vkedia commented Mar 28, 2017

@lukesneeringer
Another meta comment is that while we are debating how to implement the Promise api, can we just go ahead with the non promise API, fix the bug about transaction id, entirely remove the Promise api and release that. I am really uncomfortable with having this bug around. We can add the Promise api in subsequent release.

@stephenplusplus
Copy link
Contributor

I did take it for granted that we would be able to free those up, though I'm not sure how-- Dave could hopefully know if a solution exists. If one doesn't, then that absolutely shuts down option B.

I think A is error-prone in two ways:

  • error handling: I have to handle an error inside of a callback, and inside of the catch block
  • returning a promise from a callback in order to activate the promise. If I forget, there is no promise

I think it's more clear to have the commit's resolution handled after the commit request, as opposed to using another chain of promises.

@vkedia
Copy link
Contributor

vkedia commented Mar 28, 2017

As far as error handling concerned, why not omit the error handling in the callback. Even for errors like session not available, why not just bubble them up by rejecting the Promise returned by runTransaction.

@stephenplusplus
Copy link
Contributor

I believe that's because we don't know if the user is using promises until they return one from the callback.

@vkedia
Copy link
Contributor

vkedia commented Mar 28, 2017

How about using a different method for Promise rather than overloading runTransaction? Or have runTransaction take in two callables, one for handling transaction and another for handling error? If users want to use Promise then they should just pass the first callable and not the second one?

@stephenplusplus
Copy link
Contributor

Assuming we worked out the memory control, can you elaborate on the disadvantages of option B?

@vkedia
Copy link
Contributor

vkedia commented Mar 28, 2017

Actually thinking about this, the issue is not just with option B. Even in option A, transaction.commit() does return a Promise which the user is expected to return right back. What do we do with all those promises right now in case of retries. Do we resolve all of them?

@stephenplusplus
Copy link
Contributor

Release @google-cloud/spanner@0.2.0, moved our discussion to #2152

https://github.com/GoogleCloudPlatform/google-cloud-node/releases/tag/spanner-0.2.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the Spanner API. priority: p0 Highest priority. Critical issue. P0 implies highest priority. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. type: question Request for information or clarification. Not an issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants