Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Support locking options in SELECT statement (SELECT FOR UPDATE etc.) #1199

Closed
ndeodhar opened this issue Apr 11, 2019 · 21 comments
Closed
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) priority/high High Priority
Milestone

Comments

@ndeodhar
Copy link
Contributor

ndeodhar commented Apr 11, 2019

Support FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE or FOR KEY SHARE locking options in SELECT statement.

This also blocks FOREIGN KEY constraints in snapshot isolation -- i.e. non-serializable.

@mbautin
Copy link
Contributor

mbautin commented Jun 11, 2019

A relevant blog post by Michael Paquier: https://paquier.xyz/postgresql-2/postgres-9-3-feature-highlight-for-key-share-and-for-no-key-update/

The relevant PostgreSQL documentation page: https://www.postgresql.org/docs/11/explicit-locking.html#LOCKING-ROWS

In addition to table-level locks, there are row-level locks, which are listed as below with the contexts in which they are used automatically by PostgreSQL. See Table 13.3 for a complete table of row-level lock conflicts. Note that a transaction can hold conflicting locks on the same row, even in different subtransactions; but other than that, two transactions can never hold conflicting locks on the same row. Row-level locks do not affect data querying; they block only writers and lockers to the same row.

Row-level Lock Modes

FOR UPDATE
FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends; conversely, SELECT FOR UPDATE will wait for a concurrent transaction that has run any of those commands on the same row, and will then lock and return the updated row (or no row, if the row was deleted). Within a REPEATABLE READ or SERIALIZABLE transaction, however, an error will be thrown if a row to be locked has changed since the transaction started. For further discussion see Section 13.4.

The FOR UPDATE lock mode is also acquired by any DELETE on a row, and also by an UPDATE that modifies the values on certain columns. Currently, the set of columns considered for the UPDATE case are those that have a unique index on them that can be used in a foreign key (so partial indexes and expressional indexes are not considered), but this may change in the future.

FOR NO KEY UPDATE
Behaves similarly to FOR UPDATE, except that the lock acquired is weaker: this lock will not block SELECT FOR KEY SHARE commands that attempt to acquire a lock on the same rows. This lock mode is also acquired by any UPDATE that does not acquire a FOR UPDATE lock.

FOR SHARE
Behaves similarly to FOR NO KEY UPDATE, except that it acquires a shared lock rather than exclusive lock on each retrieved row. A shared lock blocks other transactions from performing UPDATE, DELETE, SELECT FOR UPDATE or SELECT FOR NO KEY UPDATE on these rows, but it does not prevent them from performing SELECT FOR SHARE or SELECT FOR KEY SHARE.

FOR KEY SHARE
Behaves similarly to FOR SHARE, except that the lock is weaker: SELECT FOR UPDATE is blocked, but not SELECT FOR NO KEY UPDATE. A key-shared lock blocks other transactions from performing DELETE or any UPDATE that changes the key values, but not other UPDATE, and neither does it prevent SELECT FOR NO KEY UPDATE, SELECT FOR SHARE, or SELECT FOR KEY SHARE.

PostgreSQL doesn't remember any information about modified rows in memory, so there is no limit on the number of rows locked at one time. However, locking a row might cause a disk write, e.g., SELECT FOR UPDATE modifies selected rows to mark them locked, and so will result in disk writes.

@mbautin
Copy link
Contributor

mbautin commented Jun 11, 2019

Here is how some of these could be mapped to YugaByte DB locking / intent modes:

FOR UPDATE: strong read + strong write lock on the DocKey, as if we're replacing or deleting the entire row in DocDB.
FOR NO KEY UPDATE: strong read + weak write lock on the DocKey, as if we're reading the entire row and then writing only a subset of columns in DocDB
FOR SHARE: strong read on the DocKey, as if we're reading the entire row in DocDB
FOR KEY SHARE: weak read lock on the DocKey, preventing the entire row from being replaced / deleted, as if we're simply reading some of the column. This is the type of locking that is used by foreign keys, so this will prevent the referenced row from disappearing. The reason it does not conflict with the FOR NO KEY UPDATE above is conceptually the following: an operation that reads the entire row and then writes a subset of columns (FOR NO KEY UPDATE) does not have to conflict with an operation that could be reading a different subset of columns (FOR KEY SHARE).

Conflict matrix:

FOR UPDATE vs FOR UPDATE = conflict (detected as strong read vs. strong write conflict)
FOR UPDATE vs FOR NO KEY UPDATE = conflict (detected as strong read vs. weak write conflict)
FOR UPDATE vs FOR SHARE = conflict (detected as strong write vs. strong read conflict)
FOR UPDATE vs FOR KEY SHARE = conflict (detected as strong write vs. weak read conflict)

FOR NO KEY UPDATE vs. FOR NO KEY UPDATE = conflict (detected as strong read vs. weak write, because each operation pretends to be reading the entire row)
FOR NO KEY UPDATE vs. FOR SHARE = conflict (detected as weak write vs. strong read, as the FOR SHARE operation pretends to be reading the entire row)
FOR NO KEY UPDATE vs. FOR KEY SHARE = OK (no conflict -- weak locks don't conflict with each other, and strong read does not conflict with weak read)

FOR SHARE vs. FOR SHARE = OK (reads locks don't conflict)
FOR SHARE vs . FOR KEY SHARE = OK (reads locks don't conflict)

FOR KEY SHARE vs. FOR KEY SHARE = OK (weak read locks obviously don't conflict)

So the above locking scheme matches the conflict matrix expected between PostgreSQL locking levels.

Open question: should the above mapping be any different for our implementation of serializable vs. snapshot isolation? What else need to be implemented at snapshot isolation to support this?

@m-iancu m-iancu added this to the v2.0 milestone Jun 13, 2019
@mbautin mbautin changed the title [YSQL] Support locking options in SELECT statement [YSQL] Support locking options in SELECT statement (SELECT FOR UPDATE etc.) Jun 24, 2019
@mbautin
Copy link
Contributor

mbautin commented Jun 24, 2019

We'll need to modify TabletServiceImpl::Read to turn on the provisional record writing on reads in this case as well. Currently it is only turned on for serializable isolation.

Here is how we write the provisional records (the "read intents") on the read paths:

  if (serializable_isolation) {
    WriteRequestPB write_req;
    *write_req.mutable_write_batch()->mutable_transaction() = req->transaction();
    write_req.set_tablet_id(req->tablet_id());
    read_time.AddToPB(&write_req);
    write_req.mutable_write_batch()->set_may_have_metadata(req->may_have_metadata());
    // TODO(dtxn) write request id

    auto* write_batch = write_req.mutable_write_batch();
    auto status = leader_peer.peer->tablet()->CreateReadIntents(
        req->transaction(), req->ql_batch(), req->pgsql_batch(), write_batch);
    if (!status.ok()) {
      SetupErrorAndRespond(
          resp->mutable_error(), status, TabletServerErrorPB::UNKNOWN_ERROR, &context);
      return;
    }

    auto operation_state = std::make_unique<WriteOperationState>(
        leader_peer.peer->tablet(), &write_req, nullptr /* response */,
        docdb::OperationKind::kRead);

    auto context_ptr = std::make_shared<RpcContext>(std::move(context));
    read_context.context = context_ptr.get();
    operation_state->set_completion_callback(std::make_unique<ReadOperationCompletionCallback>(
        this, leader_peer.peer, read_context, context_ptr));
    leader_peer.peer->WriteAsync(
        std::move(operation_state), leader_peer.leader_term, context_ptr->GetClientDeadline());
    return;
  }

@mbautin
Copy link
Contributor

mbautin commented Jun 24, 2019

One additional thing that needs to happen for snapshot isolation + select for ... kind of locking (and not for serializable isolation) is checking for conflicts with transactions committed between the transaction start time and the current read time. E.g. if the row was deleted between the transaction start time and current read time, we should not be able to acquire the FOR KEY SHARE lock. We should be able to reuse some of the existing logic that we use to detect these kinds of conflicts in snapshot isolation.

Look for this function in conflict_resolution.cc:

  CHECKED_STATUS CheckConflictWithCommitted(
      const TransactionId& id, HybridTime commit_time) override {
    DSCHECK(commit_time.is_valid(), Corruption, "Invalid transaction commit time");

    VLOG(4) << "Committed: " << id << ", " << commit_time;

    // commit_time equals to HybridTime::kMax means that transaction is not actually committed,
    // but is being committed. I.e. status tablet is trying to replicate COMMITTED state.
    // So we should always conflict with such transaction, because we are not able to read its
    // results.
    //
    // read_time equals to HybridTime::kMax in case of serializable isolation or when
    // read time was not yet picked for snapshot isolation.
    // So it should conflict only with transactions that are being committed.
    //
    // In all other cases we have concrete read time and should conflict with transactions
    // that were committed after this point.
    if (commit_time >= read_time_) {
      return MakeConflictStatus(id, "committed", conflicts_metric_);
    }

    return Status::OK();
  }

@mbautin
Copy link
Contributor

mbautin commented Jun 24, 2019

This is the important part:

    // read_time equals to HybridTime::kMax in case of serializable isolation or when
    // read time was not yet picked for snapshot isolation.
    // So it should conflict only with transactions that are being committed.

We need to make sure read_time_ is set correctly for writing provisional records on the read path in snapshot isolation, then I think the existing conflict-with-committed detection logic should work.

@mbautin
Copy link
Contributor

mbautin commented Jun 25, 2019

In tablet_service.cc we call CreateReadIntents:

    auto status = leader_peer.peer->tablet()->CreateReadIntents(
        req->transaction(), req->ql_batch(), req->pgsql_batch(), write_batch);

This is what CreateReadIntents does in tablet.cc:

Status Tablet::CreateReadIntents(
    const TransactionMetadataPB& transaction_metadata,
    const google::protobuf::RepeatedPtrField<QLReadRequestPB>& ql_batch,
    const google::protobuf::RepeatedPtrField<PgsqlReadRequestPB>& pgsql_batch,
    docdb::KeyValueWriteBatchPB* write_batch) {
  auto txn_op_ctx = VERIFY_RESULT(CreateTransactionOperationContext(transaction_metadata));

  for (const auto& ql_read : ql_batch) {
    docdb::QLReadOperation doc_op(ql_read, txn_op_ctx);
    RETURN_NOT_OK(doc_op.GetIntents(SchemaRef(), write_batch));
  }

  for (const auto& pgsql_read : pgsql_batch) {
    docdb::PgsqlReadOperation doc_op(pgsql_read, txn_op_ctx);
    RETURN_NOT_OK(doc_op.GetIntents(SchemaRef(), write_batch));
  }

  return Status::OK();
}

and it ends up calling PgsqlReadOperation::GetIntents:

Status PgsqlReadOperation::GetIntents(const Schema& schema, KeyValueWriteBatchPB* out) {
  auto pair = out->mutable_read_pairs()->Add();

  if (request_.partition_column_values().empty()) {
    // Empty components mean that we don't have primary key at all, but request
    // could still contain hash_code as part of tablet routing.
    // So we should ignore it.
    pair->set_key(std::string(1, ValueTypeAsChar::kGroupEnd));
  } else {
    std::vector<PrimitiveValue> hashed_components;
    RETURN_NOT_OK(InitKeyColumnPrimitiveValues(
        request_.partition_column_values(), schema, 0 /* start_idx */, &hashed_components));

    DocKey doc_key(request_.hash_code(), hashed_components);
    pair->set_key(doc_key.Encode().data());
  }

  pair->set_value(std::string(1, ValueTypeAsChar::kNull));
  return Status::OK();
}

The PgsqlReadOperation::GetIntents function generates top-level read intents (currently only used for serializable isolation), and then they get turned into an entire sequence of weak intents on ancestors + strong intent on the key itself later (will point out where exactly that happens in a further comment).

@mbautin
Copy link
Contributor

mbautin commented Jun 25, 2019

The entry point to resolving operation conflicts is in the call from Tablet::StartDocWriteOperation to docdb::ResolveOperationConflicts.

@mbautin
Copy link
Contributor

mbautin commented Jun 25, 2019

Here's where WriteOperationState gets wrapped into a WriteOperation:

auto operation = std::make_unique<WriteOperation>(std::move(state), term, deadline, this);

(in tablet_peer.cc, TabletPeer::WriteAsync).

@mbautin
Copy link
Contributor

mbautin commented Jun 25, 2019

We decided that we will focus on implementing the following two lock types, but initially it would be identical to what serializable isolation read intents do today, so both of the following intent types would be the same for now:

FOR SHARE: strong read on the DocKey, as if we're reading the entire row in DocDB
FOR KEY SHARE: weak read lock on the DocKey, preventing the entire row from being replaced / deleted, as if we're simply reading some of the column. This is the type of locking that is used by the foreign key logic in PostgreSQL when we add a row that references another table through a foreign key. A read is done on the appropriate key of the referenced table with FOR KEY SHARE, preventing that row from disappearing.

We will relax the locking mode in FOR KEY SHARE later.

WesleyW added a commit that referenced this issue Jul 31, 2019
Summary:
Added evolutions file and changed var types to be compatible with ysql
Added workaround to #1199 to `yugabyte-db`

Test Plan: `yugabyte-db start --enable_ui`

Reviewers: bogdan, ram

Reviewed By: ram

Subscribers: yugaware

Differential Revision: https://phabricator.dev.yugabyte.com/D6976
@lhotari
Copy link

lhotari commented Sep 16, 2019

Is there a plan to also support the "NOWAIT" and "SKIP LOCKED" options for "SELECT ... FOR UPDATE" that PostgreSQL supports?

@parthibd
Copy link

Any deadline for supporting this feature?

@kmuthukk kmuthukk added the priority/high High Priority label Sep 26, 2019
@ben-pr-p
Copy link

ben-pr-p commented Sep 27, 2019

It looks like lots of migrations runners (http://knexjs.org/, for example) use select * from migrations_table for update in order to prevent concurrent migrations from running, so this is a bigger barrier to Postgres compatibility than it might initially appear unfortunately

@parthibd
Copy link

The same with liquibase and hibernate. That's the part I'm really stuck on.

hectorgcr added a commit that referenced this issue Oct 16, 2019
…SELECT statements

Summary:
Support row-level explicit locking in `SELECT` statements for YSQL tables.
The new supported cases are:
- `SELECT .. FOR SHARE`
- `SELECT .. FOR KEY SHARE`

Additionally, since the `FOR KEY SHARE` case is also used internally for referential integrity
checking (i.e. foreign keys), this diff also enables DMLs on tables with foreign key references
regardless of isolation level (previously only worked in `SERIALIZABLE` isolation).

Implementation is done by passing the row mark as flag in the DocDB read request and taking
appropriate lock for those requests that have that corresponding row mark.

Limitations (that will be addressed in follow-up commits):
- The `FOR UPDATE` and `FOR NO KEY UPDATE` are not yet supported
- The locking for `FOR KEY SHARE` is stronger than strictly required (strong read lock instead of
   weak read lock). This is still correct but less efficient (will conflict more often than required).
- We don't support having different row mark settings within one DocDB (read) batch. This is fine
  for now because we do not batch such requests together, but later we might.

The current implementation makes data visible between two transactions when it shouldn't. This is because we are currently not detecting read conflicts. #2523 tracks this issue.

Test Plan:
Java test testForeignKeyConflicts has been converted to two tests: testForeignKeyConflictsWithSerializableIsolation and testForeignKeyConflictsWithSnapshotIsolation/

```postgres=# create table t(k int primary key, s text);
CREATE TABLE
postgres=# select * from t where k = 1 for key share;
 k | s
---+---
(0 rows)

postgres=# insert into t(k, s) values (1, 'a');
INSERT 0 1
postgres=# select * from t where k = 1 for key share;
 k | s
---+---
 1 | a
(1 row)

postgres=# select * from t for key share;
 k | s
---+---
 1 | a
(1 row)
```

```postgres=# create table t(k int primary key, s text);
CREATE TABLE
postgres=#  insert into t(k, s) values (1, 'a');
INSERT 0 1
postgres=# begin;
BEGIN
postgres=# insert into t(k,s) values (2, 'b');
INSERT 0 1
postgres=# select * from t for key share;
 k | s
---+---
 1 | a
 2 | b
(2 rows)

postgres=# select * from t where k = 1 for key share;
 k | s
---+---
 1 | a
(1 row)

postgres=# abort;
ROLLBACK
postgres=# select * from t where k = 1 for key share;
 k | s
---+---
 1 | a
(1 row)

postgres=# select * from t for key share;
 k | s
---+---
 1 | a
(1 row)
```

```
postgres=# create table t(id int primary key, s text);
CREATE TABLE
postgres=# create table s(id int primary key , t_id int references t(id));
CREATE TABLE
postgres=# insert into t(id, s) values (1, 'a');
INSERT 0 1
postgres=# insert into t(id, s) values (2, 'b');
INSERT 0 1
postgres=# insert into s(id, t_id) values (100, 1);
INSERT 0 1
postgres=# select s.id, t.s from s, t where s.t_id = t.id;
 id  | s
-----+---
 100 | a
(1 row)

postgres=# begin;
BEGIN
postgres=# delete from t where id = 2;
DELETE 1
postgres=# select * from t;
 id | s
----+---
  1 | a
(1 row)

postgres=# commit;
COMMIT
postgres=# select * from t;
 id | s
----+---
  1 | a
(1 row)
```

Reviewers: mihnea, mikhail

Reviewed By: mikhail

Subscribers: bogdan, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D7007
@ndeodhar ndeodhar assigned jaki and unassigned hectorgcr Oct 23, 2019
@jaki
Copy link
Contributor

jaki commented Oct 25, 2019

@lhotari, supporting NOWAIT and SKIP LOCKED options is low priority at the moment. You can file an issue for that, and if more people express interest, we'll raise its priority.

@parthibd, I am currently working on this. There is no hard deadline, but I suspect that it won't take more than a month.

@Bessonov
Copy link

@ben-pr-p especially knex was refactored by cockroachdb developer to avoid for update.

Thanks to @hectorgcr and @jj-kim for progress on this issue. This is the first issue I ran with keycloak (and liquibase):

14:24:31,406 WARN  [org.keycloak.connections.jpa.updater.liquibase.lock.CustomLockService] (ServerService Thread Pool -- 70) Lock didn't yet acquired. Will possibly retry to acquire lock. Details: ERROR: SELECT locking option only supported for temporary tables
  Hint: See https://github.com/YugaByte/yugabyte-db/issues/1199. Click '+' on the description to raise its priority [Failed SQL: SELECT ID FROM public.databasechangeloglock WHERE ID=1000 FOR UPDATE]

@parthibd
Copy link

@jj-kim Thank you for the update. Eagerly waiting for this fix. 😊

@lhotari
Copy link

lhotari commented Oct 28, 2019

@lhotari, supporting NOWAIT and SKIP LOCKED options is low priority at the moment. You can file an issue for that, and if more people express interest, we'll raise its priority.

Hi @jj-kim , I added #2742 for "SKIP LOCKED". The main use case for "SKIP LOCKED" is implementing work queues within the database. Another use case is porting existing applications relying on that feature to use YB.

@jaki
Copy link
Contributor

jaki commented Nov 8, 2019

Here is a progress update on the issue.

Here is some documentation on intents in case anyone is interested: https://docs.yugabyte.com/latest/architecture/transactions/distributed-txns/.

@jaki
Copy link
Contributor

jaki commented Nov 11, 2019

Here is another progress update on the issue.

  • Read times are still good.
  • Conflict detection seems to work under the current simplified interpretation of NO KEY UPDATE = UPDATE and KEY SHARE = SHARE.
  • The design has been discussed regarding the issue with strong write intents in serializable isolation transactions, and it has been implemented such that these issues now seem resolved.

Here's what's left:

  • Adding more tests, especially for conflict detection.
  • Making NO KEY UPDATE and KEY SHARE weaker (a separate issue may be filed for this).

@jaki
Copy link
Contributor

jaki commented Nov 14, 2019

Here's another update.

jaki pushed a commit that referenced this issue Nov 20, 2019
Summary:
Enable `FOR UPDATE` and `FOR NO KEY UPDATE` row locking statements with
the caveat that `FOR NO KEY UPDATE` behaves like `FOR UPDATE` (see
follow-up issue #2922).  Also, fix conflict detection such that `FOR
SHARE` locks do not conflict with each other.

Implement the fix as follows:

1. Pass down `FOR UPDATE` and `FOR NO KEY UPDATE` row locks through
   Protobufs.

   1. Pass them through `PgsqlReadRequestPB.row_mark_type`.  (Also,
      change the field from `repeated` to `optional`.)
   1. Pass them through `KeyValueWriteBatchPB.row_mark_type`.  (Also,
      change the field from `repeated` to `optional`.)

1. Add a row lock parameter to `docdb::GetStrongIntentTypeSet`, and
   permeate row lock information throughout the affected areas (fix
   issue #2842).  Remove the isolation level hack in
   `tablet::Tablet::PrepareTransactionWriteBatch` in favor of using row
   lock information (fix issue #2496).
1. Create a new value type `kRowLock` to be placed in the value of
   intents for row locking.  Handle this value type in
   `docdb::(anonymous namespace)::DecodeStrongWriteIntent` (for
   in-transaction reads) and `docdb::IntentToWriteRequest` (for
   transaction commits).
1. Create tests, specified in the test plan.

Also, do the following:

* Create new files defining helper functions related to row locks (row
  marks).

  * `src/yb/common/row_mark.cc`
  * `src/yb/common/row_mark.h`

* Prevent `ROW_MARK_REFERENCE` and `ROW_MARK_COPY` from getting passed
  down as the `rowmark` execution parameter.
* Update regress test `yb_pg_privileges` (java test
  `TestPgRegressAuthorization`) to uncomment `FOR UPDATE` row locking
  statements.

Test Plan:
* Jenkins
* `./yb_build.sh`

  * `--cxx-test pgwrapper_pg_mini-test --gtest_filter
    PgMiniTest.RowLockConflictMatrix`
  * `--cxx-test pgwrapper_pg_mini-test --gtest_filter
    PgMiniTest.SerializableDeleteForNoKeyUpdate`
  * `--cxx-test pgwrapper_pg_mini-test --gtest_filter
    PgMiniTest.SerializableDeleteForUpdate`
  * `--cxx-test pgwrapper_pg_mini-test --gtest_filter
    PgMiniTest.SerializableInsertForNoKeyUpdate`
  * `--cxx-test pgwrapper_pg_mini-test --gtest_filter
    PgMiniTest.SerializableInsertForUpdate`
  * `--cxx-test pgwrapper_pg_mini-test --gtest_filter
    PgMiniTest.SnapshotDeleteForNoKeyUpdate`
  * `--cxx-test pgwrapper_pg_mini-test --gtest_filter
    PgMiniTest.SnapshotDeleteForUpdate`
  * `--cxx-test pgwrapper_pg_mini-test --gtest_filter
    PgMiniTest.SnapshotInsertForNoKeyUpdate`
  * `--cxx-test pgwrapper_pg_mini-test --gtest_filter
    PgMiniTest.SnapshotInsertForUpdate`

Reviewers: hector, sergei, mikhail

Reviewed By: mikhail

Subscribers: yql, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D7453
@jaki
Copy link
Contributor

jaki commented Nov 20, 2019

Thanks for the patience, everyone. This is now landed. Here are some followup issues:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) priority/high High Priority
Projects
None yet
Development

No branches or pull requests

10 participants