[YSQL] Support locking options in SELECT statement (SELECT FOR UPDATE etc.) #1199
A relevant blog post by Michael Paquier: https://paquier.xyz/postgresql-2/postgres-9-3-feature-highlight-for-key-share-and-for-no-key-update/
The relevant PostgreSQL documentation page: https://www.postgresql.org/docs/11/explicit-locking.html#LOCKING-ROWS
Here is how some of these could be mapped to YugaByte DB locking / intent modes:

- FOR UPDATE: strong read + strong write lock on the DocKey, as if we're replacing or deleting the entire row in DocDB.

Conflict matrix:

- FOR UPDATE vs. FOR UPDATE = conflict (detected as strong read vs. strong write conflict)
- FOR NO KEY UPDATE vs. FOR NO KEY UPDATE = conflict (detected as strong read vs. weak write, because each operation pretends to be reading the entire row)
- FOR SHARE vs. FOR SHARE = OK (read locks don't conflict)
- FOR KEY SHARE vs. FOR KEY SHARE = OK (weak read locks obviously don't conflict)

So the above locking scheme matches the conflict matrix expected between PostgreSQL locking levels.

Open questions: should the above mapping be any different for our implementation of serializable vs. snapshot isolation? What else needs to be implemented at snapshot isolation to support this?
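To make the expected conflict matrix concrete, here is a minimal two-session sketch in plain PostgreSQL syntax (the table `t` with primary key `k` is hypothetical, used only for illustration):

```
-- Session 1
BEGIN;
SELECT * FROM t WHERE k = 1 FOR SHARE;

-- Session 2, concurrently
BEGIN;
SELECT * FROM t WHERE k = 1 FOR SHARE;   -- OK: read locks don't conflict
SELECT * FROM t WHERE k = 1 FOR UPDATE;  -- blocks/conflicts: FOR UPDATE needs a
                                         -- write lock, which session 1's read
                                         -- lock excludes
```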
We'll need to modify the code below. Here is how we write the provisional records (the "read intents") on the read path:

```
if (serializable_isolation) {
WriteRequestPB write_req;
*write_req.mutable_write_batch()->mutable_transaction() = req->transaction();
write_req.set_tablet_id(req->tablet_id());
read_time.AddToPB(&write_req);
write_req.mutable_write_batch()->set_may_have_metadata(req->may_have_metadata());
// TODO(dtxn) write request id
auto* write_batch = write_req.mutable_write_batch();
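// Generate provisional records (read intents) for the QL/PgSQL read
// batches and add them to the write batch.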
auto status = leader_peer.peer->tablet()->CreateReadIntents(
req->transaction(), req->ql_batch(), req->pgsql_batch(), write_batch);
if (!status.ok()) {
SetupErrorAndRespond(
resp->mutable_error(), status, TabletServerErrorPB::UNKNOWN_ERROR, &context);
return;
}
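// Wrap the intent write batch in an operation marked as a read
// (docdb::OperationKind::kRead) and submit it through the regular
// asynchronous write path.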
auto operation_state = std::make_unique<WriteOperationState>(
leader_peer.peer->tablet(), &write_req, nullptr /* response */,
docdb::OperationKind::kRead);
auto context_ptr = std::make_shared<RpcContext>(std::move(context));
read_context.context = context_ptr.get();
operation_state->set_completion_callback(std::make_unique<ReadOperationCompletionCallback>(
this, leader_peer.peer, read_context, context_ptr));
leader_peer.peer->WriteAsync(
std::move(operation_state), leader_peer.leader_term, context_ptr->GetClientDeadline());
return;
}
```
One additional thing that needs to happen for snapshot isolation + SELECT FOR ... kind of locking (and not for serializable isolation) is checking for conflicts with transactions committed between the transaction start time and the current read time. E.g., if the row was deleted between the transaction start time and the current read time, we should not be able to acquire the FOR KEY SHARE lock. We should be able to reuse some of the existing logic that we use to detect these kinds of conflicts in snapshot isolation; look for the relevant function in the existing conflict detection code.
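A sketch of the scenario described above, again with a hypothetical table `t` keyed by `k` (PostgreSQL reports a serialization failure in this situation, and the expectation is equivalent conflict detection here):

```
-- Session 1: fix the snapshot before the concurrent delete.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM t;

-- Session 2: delete the row and commit.
DELETE FROM t WHERE k = 1;

-- Session 1: must not acquire the lock on a row that a committed
-- transaction has deleted; this should fail with a conflict.
SELECT * FROM t WHERE k = 1 FOR KEY SHARE;
```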
This is the important part:
We need to make sure `read_time_` is set correctly for writing provisional records on the read path in snapshot isolation, then I think the existing conflict-with-committed detection logic should work.
In the read path shown above, we call:

```
auto status = leader_peer.peer->tablet()->CreateReadIntents(
    req->transaction(), req->ql_batch(), req->pgsql_batch(), write_batch);
```

This is what CreateReadIntents does:

```
Status Tablet::CreateReadIntents(
const TransactionMetadataPB& transaction_metadata,
const google::protobuf::RepeatedPtrField<QLReadRequestPB>& ql_batch,
const google::protobuf::RepeatedPtrField<PgsqlReadRequestPB>& pgsql_batch,
docdb::KeyValueWriteBatchPB* write_batch) {
auto txn_op_ctx = VERIFY_RESULT(CreateTransactionOperationContext(transaction_metadata));
for (const auto& ql_read : ql_batch) {
docdb::QLReadOperation doc_op(ql_read, txn_op_ctx);
RETURN_NOT_OK(doc_op.GetIntents(SchemaRef(), write_batch));
}
for (const auto& pgsql_read : pgsql_batch) {
docdb::PgsqlReadOperation doc_op(pgsql_read, txn_op_ctx);
RETURN_NOT_OK(doc_op.GetIntents(SchemaRef(), write_batch));
}
return Status::OK();
}
```

and it ends up calling PgsqlReadOperation::GetIntents:

```
Status PgsqlReadOperation::GetIntents(const Schema& schema, KeyValueWriteBatchPB* out) {
auto pair = out->mutable_read_pairs()->Add();
if (request_.partition_column_values().empty()) {
// Empty components mean that we don't have primary key at all, but request
// could still contain hash_code as part of tablet routing.
// So we should ignore it.
pair->set_key(std::string(1, ValueTypeAsChar::kGroupEnd));
} else {
std::vector<PrimitiveValue> hashed_components;
RETURN_NOT_OK(InitKeyColumnPrimitiveValues(
request_.partition_column_values(), schema, 0 /* start_idx */, &hashed_components));
DocKey doc_key(request_.hash_code(), hashed_components);
pair->set_key(doc_key.Encode().data());
}
pair->set_value(std::string(1, ValueTypeAsChar::kNull));
return Status::OK();
}
```

The PgsqlReadOperation::GetIntents function generates top-level read intents (currently only used for serializable isolation); later they get turned into an entire sequence of weak intents on the key's ancestors plus a strong intent on the key itself (I will point out exactly where that happens in a further comment).
The entry point to resolving operation conflicts is in the call from Tablet::StartDocWriteOperation to docdb::ResolveOperationConflicts.
Here's where WriteOperationState gets wrapped into a WriteOperation (in tablet_peer.cc, TabletPeer::WriteAsync):

```
auto operation = std::make_unique<WriteOperation>(std::move(state), term, deadline, this);
```
We decided that we will focus on implementing the following two lock types. Initially both would be identical to what serializable isolation read intents do today:

- FOR SHARE: strong read lock on the DocKey, as if we're reading the entire row in DocDB
- FOR KEY SHARE: the same strong read lock for now; we will relax this locking mode later.
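Since both lock types initially map to the same strong read lock, their observable behavior is identical for now. A sketch of the resulting semantics, using a hypothetical table `t(k, s)`:

```
-- Session 1
BEGIN;
SELECT * FROM t WHERE k = 1 FOR KEY SHARE;  -- takes a strong read lock for now

-- Session 2
BEGIN;
SELECT * FROM t WHERE k = 1 FOR SHARE;  -- OK: read locks don't conflict
UPDATE t SET s = 'x' WHERE k = 1;       -- conflicts under the initial mapping,
                                        -- even though true FOR KEY SHARE
                                        -- semantics would allow a non-key update
```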
Summary:
Added evolutions file and changed var types to be compatible with ysql
Added workaround to #1199 to `yugabyte-db`

Test Plan: `yugabyte-db start --enable_ui`

Reviewers: bogdan, ram
Reviewed By: ram
Subscribers: yugaware
Differential Revision: https://phabricator.dev.yugabyte.com/D6976
Is there a plan to also support the "NOWAIT" and "SKIP LOCKED" options for "SELECT ... FOR UPDATE" that PostgreSQL supports?
Any deadline for supporting this feature?
It looks like lots of migration runners (http://knexjs.org/, for example) use `SELECT ... FOR UPDATE`.
The same with liquibase and hibernate. That's the part I'm really stuck on.
…SELECT statements

Summary: Support row-level explicit locking in `SELECT` statements for YSQL tables. The new supported cases are:
- `SELECT .. FOR SHARE`
- `SELECT .. FOR KEY SHARE`

Additionally, since the `FOR KEY SHARE` case is also used internally for referential integrity checking (i.e. foreign keys), this diff also enables DMLs on tables with foreign key references regardless of isolation level (previously this only worked in `SERIALIZABLE` isolation).

Implementation is done by passing the row mark as a flag in the DocDB read request and taking the appropriate lock for those requests that have that corresponding row mark.

Limitations (that will be addressed in follow-up commits):
- `FOR UPDATE` and `FOR NO KEY UPDATE` are not yet supported.
- The locking for `FOR KEY SHARE` is stronger than strictly required (strong read lock instead of weak read lock). This is still correct but less efficient (will conflict more often than required).
- We don't support having different row mark settings within one DocDB (read) batch. This is fine for now because we do not batch such requests together, but later we might.

The current implementation makes data visible between two transactions when it shouldn't. This is because we are currently not detecting read conflicts. #2523 tracks this issue.

Test Plan: Java test testForeignKeyConflicts has been converted to two tests: testForeignKeyConflictsWithSerializableIsolation and testForeignKeyConflictsWithSnapshotIsolation.

```
postgres=# create table t(k int primary key, s text);
CREATE TABLE
postgres=# select * from t where k = 1 for key share;
 k | s
---+---
(0 rows)
postgres=# insert into t(k, s) values (1, 'a');
INSERT 0 1
postgres=# select * from t where k = 1 for key share;
 k | s
---+---
 1 | a
(1 row)
postgres=# select * from t for key share;
 k | s
---+---
 1 | a
(1 row)
```

```
postgres=# create table t(k int primary key, s text);
CREATE TABLE
postgres=# insert into t(k, s) values (1, 'a');
INSERT 0 1
postgres=# begin;
BEGIN
postgres=# insert into t(k,s) values (2, 'b');
INSERT 0 1
postgres=# select * from t for key share;
 k | s
---+---
 1 | a
 2 | b
(2 rows)
postgres=# select * from t where k = 1 for key share;
 k | s
---+---
 1 | a
(1 row)
postgres=# abort;
ROLLBACK
postgres=# select * from t where k = 1 for key share;
 k | s
---+---
 1 | a
(1 row)
postgres=# select * from t for key share;
 k | s
---+---
 1 | a
(1 row)
```

```
postgres=# create table t(id int primary key, s text);
CREATE TABLE
postgres=# create table s(id int primary key, t_id int references t(id));
CREATE TABLE
postgres=# insert into t(id, s) values (1, 'a');
INSERT 0 1
postgres=# insert into t(id, s) values (2, 'b');
INSERT 0 1
postgres=# insert into s(id, t_id) values (100, 1);
INSERT 0 1
postgres=# select s.id, t.s from s, t where s.t_id = t.id;
 id  | s
-----+---
 100 | a
(1 row)
postgres=# begin;
BEGIN
postgres=# delete from t where id = 2;
DELETE 1
postgres=# select * from t;
 id | s
----+---
  1 | a
(1 row)
postgres=# commit;
COMMIT
postgres=# select * from t;
 id | s
----+---
  1 | a
(1 row)
```

Reviewers: mihnea, mikhail
Reviewed By: mikhail
Subscribers: bogdan, yql
Differential Revision: https://phabricator.dev.yugabyte.com/D7007
@ben-pr-p notably, knex was refactored by a CockroachDB developer to avoid `SELECT ... FOR UPDATE`.

Thanks to @hectorgcr and @jj-kim for progress on this issue. This is the first issue I ran into with keycloak (and liquibase).
@jj-kim Thank you for the update. Eagerly waiting for this fix. 😊
Hi @jj-kim, I added #2742 for "SKIP LOCKED". The main use case for "SKIP LOCKED" is implementing work queues within the database (see the sketch below). Another use case is porting existing applications relying on that feature to use YB.
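The classic work-queue pattern that depends on `SKIP LOCKED`, sketched against a hypothetical `jobs(id, done)` table:

```
BEGIN;
-- Claim one job, skipping any rows other workers currently hold locks on,
-- so workers never block each other.
SELECT id FROM jobs
WHERE done = false
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;
-- ...process the returned job, then mark it done using the id selected above...
COMMIT;
```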
Here is a progress update on the issue.
Here is some documentation on intents in case anyone is interested: https://docs.yugabyte.com/latest/architecture/transactions/distributed-txns/.
Here is another progress update on the issue.
Here's what's left:
Here's another update.
Summary: Enable `FOR UPDATE` and `FOR NO KEY UPDATE` row locking statements with the caveat that `FOR NO KEY UPDATE` behaves like `FOR UPDATE` (see follow-up issue #2922). Also, fix conflict detection such that `FOR SHARE` locks do not conflict with each other.

Implement the fix as follows:
1. Pass down `FOR UPDATE` and `FOR NO KEY UPDATE` row locks through Protobufs.
   1. Pass them through `PgsqlReadRequestPB.row_mark_type`. (Also, change the field from `repeated` to `optional`.)
   1. Pass them through `KeyValueWriteBatchPB.row_mark_type`. (Also, change the field from `repeated` to `optional`.)
1. Add a row lock parameter to `docdb::GetStrongIntentTypeSet`, and permeate row lock information throughout the affected areas (fix issue #2842). Remove the isolation level hack in `tablet::Tablet::PrepareTransactionWriteBatch` in favor of using row lock information (fix issue #2496).
1. Create a new value type `kRowLock` to be placed in the value of intents for row locking. Handle this value type in `docdb::(anonymous namespace)::DecodeStrongWriteIntent` (for in-transaction reads) and `docdb::IntentToWriteRequest` (for transaction commits).
1. Create tests, specified in the test plan.

Also, do the following:
* Create new files defining helper functions related to row locks (row marks):
  * `src/yb/common/row_mark.cc`
  * `src/yb/common/row_mark.h`
* Prevent `ROW_MARK_REFERENCE` and `ROW_MARK_COPY` from getting passed down as the `rowmark` execution parameter.
* Update regress test `yb_pg_privileges` (java test `TestPgRegressAuthorization`) to uncomment `FOR UPDATE` row locking statements.

Test Plan:
* Jenkins
* `./yb_build.sh`
  * `--cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.RowLockConflictMatrix`
  * `--cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.SerializableDeleteForNoKeyUpdate`
  * `--cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.SerializableDeleteForUpdate`
  * `--cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.SerializableInsertForNoKeyUpdate`
  * `--cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.SerializableInsertForUpdate`
  * `--cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.SnapshotDeleteForNoKeyUpdate`
  * `--cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.SnapshotDeleteForUpdate`
  * `--cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.SnapshotInsertForNoKeyUpdate`
  * `--cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.SnapshotInsertForUpdate`

Reviewers: hector, sergei, mikhail
Reviewed By: mikhail
Subscribers: yql, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D7453
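With this change landed, the caveat that `FOR NO KEY UPDATE` behaves like `FOR UPDATE` is observable as extra conflicts. A sketch, using a hypothetical table `t` keyed by `k`:

```
-- Session 1
BEGIN;
SELECT * FROM t WHERE k = 1 FOR NO KEY UPDATE;

-- Session 2: true FOR NO KEY UPDATE semantics would allow this, since
-- FOR KEY SHARE only conflicts with FOR UPDATE; it conflicts here because
-- FOR NO KEY UPDATE currently behaves like FOR UPDATE (tracked in #2922).
BEGIN;
SELECT * FROM t WHERE k = 1 FOR KEY SHARE;
```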
Thanks for the patience, everyone. This is now landed. Here are some followup issues:
Support `FOR UPDATE`, `FOR NO KEY UPDATE`, `FOR SHARE` or `FOR KEY SHARE` locking options in `SELECT` statements.

This also blocks `FOREIGN KEY` constraints in snapshot isolation -- i.e. non-serializable.