[YSQL] Explicit row-level locking should only lock what is returned to the client #9929
Labels
area/ysql
Yugabyte SQL (YSQL)
Comments
pkj415 added a commit to pkj415/yugabyte-db that referenced this issue Nov 30, 2021:
…ing semantics + support SKIP LOCKED for REPEATABLE READ isolation level

Summary:

Part 1 (for first 2 issues):
----------------------------

Currently in YB, to lock rows explicitly, YSQL sends a marker with read requests to tserver processes. A lock is taken on the longest possible prefix of the DocKey that will contain the data matching the query.

Example: Consider a table with pk (h1, h2, r1, r2). If a read request sets h1, h2, and r1, a lock is taken on everything that matches that prefix (this is done via a single intent). Similarly, if the read request sets just h1, a lock is taken on everything that matches that value for h1. And if a read request doesn't specify h1, a lock is taken on the whole tablet. This is the case for both REPEATABLE READ and SERIALIZABLE.

This results in 2 issues for REPEATABLE READ isolation in YSQL:
1. Even data that is filtered out later by YSQL is locked.
2. New tuples that fall in the locked prefix range can't be inserted. So we are locking future non-existent tuples as well.

For any isolation level other than SERIALIZABLE, Pg takes locks only on the rows that are "returned" to the client or parent plan node by the LockRows plan node. Rows that are filtered out by child plan nodes of LockRows are not locked. In YSQL, LockRows is just a dummy placeholder; we instead pass down locking information to leaf nodes that scan docdb and hence lock larger chunks of the tablet. This mismatch exists because explicit locking was implemented (https://phabricator.dev.yugabyte.com/D7007) by returning early from ExecLockRows() in YSQL, so the logic in the LockRows node was never used.

This diff performs explicit locking tuple by tuple in the LockRows node, similar to Postgres. This has two benefits -
1. We match Postgres locking semantics in REPEATABLE READ (and will also match them in the READ COMMITTED isolation level that we plan to support in the near future).
2. It makes implementing other Pg locking features like SKIP LOCKED straightforward, since we now lock one tuple at a time and not whole chunks (which allows other txns to see which rows are still available for locking). Without this, such existing Pg features would require a lot of code changes.

Locking only the rows returned to the user, as compared to coarser chunks (i.e., locking a prefix of the pk), might seem inefficient due to the many locks we now have to take. But note that coarser locks can also hurt performance by locking what shouldn't be locked. So both methods have disadvantages depending on the specific workload; locking only the rows returned to the user, however, gives us the same behaviour as Postgres. This is the motivation for switching to locking only the rows that are returned to the user rather than taking coarser locks.

NOTE: For the SERIALIZABLE level, explicit locking is the same as before, since we need to lock whole predicates to block insertion of new rows matching the predicate (which the current prefix-locking achieves).

Part 2: Support for SKIP LOCKED
--------------------------------

After Part 1, SKIP LOCKED becomes straightforward (since we now lock one tuple at a time and not whole chunks, which allows other txns to see which rows are still available for locking). The implementation plumbs the wait policy for locking from postgres to the tserver process, which uses it in conflict_resolution.cc. If the policy for a certain intent is to skip, we return a SkipLocking error in case of conflict. Following the wait_policy in YBCLockTuple() will make the implementation clear.

Test Plan:
1. New test method TestRowLockInJoin: ensures we lock only those tuples which are returned to the user (in REPEATABLE READ isolation).
2. New test method TestSkipLocked: for SKIP LOCKED functionality.
3. TestRowKeyShareLock: This test already existed, and its assertions were based on the assumption that both SERIALIZABLE and REPEATABLE READ take blanket/predicate locks (e.g. the full table if no prefix of the pk is specified). The behaviour in the test is corrected to match Pg behaviour. Some more cases are also added to test predicate locking at the SERIALIZABLE level.
4. TestInsertSelectRowLock, TestDeleteSelectRowLock: Modified since yugabyte#2809 is fixed by this.

Reviewers: mbautin, sergei, neil, jason
Subscribers: yql, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D13352
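The prefix-locking behaviour described above can be sketched with a hypothetical table (the table name, DDL, and values here are illustrative, not taken from the diff):

```sql
-- Hypothetical YSQL table with pk (h1, h2, r1, r2), as in the example above.
CREATE TABLE t (h1 int, h2 int, r1 int, r2 int, v int,
                PRIMARY KEY ((h1, h2) HASH, r1, r2));

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Sets h1, h2, and r1: before this diff, a single intent locked the whole
-- DocKey prefix (h1=1, h2=2, r1=3), i.e. every r2 under that prefix.
SELECT * FROM t WHERE h1 = 1 AND h2 = 2 AND r1 = 3 FOR UPDATE;

-- Sets only h1: locked everything matching h1 = 1.
SELECT * FROM t WHERE h1 = 1 FOR UPDATE;

-- Doesn't set h1 at all: locked the whole tablet.
SELECT * FROM t WHERE r2 = 4 FOR UPDATE;

COMMIT;
```

After this diff, in REPEATABLE READ each of these statements locks only the rows the LockRows node actually returns, rather than the prefix ranges noted in the comments.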
pkj415 added a commit that referenced this issue Dec 3, 2021:
…ort SKIP LOCKED for REPEATABLE READ isolation level

Summary:

**Part 1 (for first 2 issues):**

Currently in YB, to lock rows explicitly, YSQL sends a marker with read requests to tserver processes. A lock is taken on the longest possible prefix of the DocKey that will contain the data matching the query.

Example: Consider a table with pk (h1, h2, r1, r2). If a read request sets h1, h2, and r1, a lock is taken on everything that matches that prefix (this is done via a single intent). Similarly, if the read request sets just h1, a lock is taken on everything that matches that value for h1. And if a read request doesn't specify h1, a lock is taken on the whole tablet. This is the case for both REPEATABLE READ and SERIALIZABLE.

**This results in 2 issues for REPEATABLE READ isolation in YSQL:**

1. Even data that is filtered out later by YSQL is locked.
2. New tuples that fall in the locked prefix range can't be inserted. With a "prefix" intent, we are locking future non-existent tuples as well.

For any isolation level other than SERIALIZABLE, Pg takes locks only on the rows that are "returned" to the client or parent plan node by the LockRows plan node. Rows that are filtered out by child plan nodes of LockRows are not locked. In YSQL, LockRows is just a dummy placeholder; we instead pass down locking information to leaf nodes that scan docdb and hence lock larger chunks of the tablet. This mismatch exists because explicit locking was implemented (https://phabricator.dev.yugabyte.com/D7007) by returning early from ExecLockRows() in YSQL, so the logic in the LockRows node was never used.

This diff performs explicit locking tuple by tuple in the LockRows node, similar to Postgres.

**This has two benefits -**

i) We match Postgres locking semantics in REPEATABLE READ (and will also match them in the READ COMMITTED isolation level that we plan to support in the near future).

ii) For REPEATABLE READ / READ COMMITTED, this makes implementing other Pg locking features easier:
a) SKIP LOCKED becomes straightforward, since we now lock one tuple at a time and not whole chunks (which allows other txns to see which rows are still available for locking). Without this, such existing Pg features would require a lot of code changes.
b) The special handling of conflicts that uses "EvalPlanQual" in Postgres will be readily available for YB use as we implement READ COMMITTED.

Locking only the rows returned to the user, as compared to coarser chunks (i.e., locking a prefix of the pk), might seem inefficient due to the many locks we now have to take. But note that coarser locks can also hurt performance by locking what shouldn't be locked. So both methods have disadvantages depending on the specific workload; locking only the rows returned to the user, however, gives us the same behaviour as Postgres. This is the motivation for switching to locking only the rows that are returned to the user rather than taking coarser locks.

To follow the flow of logic, start with //nodeLockRows.c//

This diff is the result of a conversation in this slack thread - https://yugabyte.slack.com/archives/CAR5BCH29/p1631036276226800

```
NOTE:
i) For the SERIALIZABLE level, explicit locking is the same as before, since we need
   to lock whole predicates to block insertion of new rows matching the predicate
   (which the current prefix-locking achieves).
ii) We can optimize the case of explicit locking on a SELECT with the full pk
    specified by using just 1 RPC call. That will follow after this diff lands (#10429).
```

**Part 2: Support for SKIP LOCKED**

After Part 1, SKIP LOCKED becomes straightforward (since we now lock one tuple at a time and not whole chunks, which allows other txns to see which rows are still available for locking). The implementation plumbs the wait policy for locking from postgres to the tserver process, which uses it in conflict_resolution.cc. If the policy for a certain intent is to skip, we return a SkipLocking error in case of conflict. Following the wait_policy in YBCLockTuple() will make the implementation clear.

Test Plan:

**New tests:**
1. TestRowLockInJoin: ensures we lock only those tuples which are returned to the user (in REPEATABLE READ isolation).
2. TestSkipLocked: for SKIP LOCKED functionality.

**Modified existing tests:**
1. TestInsertSelectRowLock, TestDeleteSelectRowLock: Modified since #2809 is fixed by this.
2. TestRowKeyShareLock: This test already existed, and its assertions were based on the assumption that both SERIALIZABLE and REPEATABLE READ take blanket/predicate locks (e.g. the full table if no prefix of the pk is specified). The behaviour in the test is corrected to match Pg behaviour. Some more cases are also added to test predicate locking at the SERIALIZABLE level.
3. TestPgExplicitLocks, TestPgForeignKeyOptimization and TestPgTransactions.testExplicitLocking

Jenkins: urgent

Reviewers: mbautin, neil, jason, smishra, mihnea, dmitry, sergei
Reviewed By: dmitry, sergei
Subscribers: yql, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D13352
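As a usage sketch of the SKIP LOCKED behaviour this lands (the table and column names below are made up for illustration), the classic job-queue consumer pattern from Postgres now works the same way in YSQL under REPEATABLE READ:

```sql
-- Hypothetical jobs table; names are illustrative.
CREATE TABLE jobs (id bigserial PRIMARY KEY,
                   payload text,
                   done boolean DEFAULT false);

-- Each worker claims one unclaimed, unlocked job. Because rows are now
-- locked one tuple at a time, a second worker running the same statement
-- concurrently skips the row locked by the first worker and claims the
-- next available one instead of blocking on a coarse prefix lock.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT id, payload
FROM jobs
WHERE done = false
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;
-- ... process the claimed job, then mark it done using the returned id ...
COMMIT;
```

If every matching row is already locked by another transaction, the SELECT returns zero rows rather than waiting, which is what the SkipLocking error path in conflict resolution surfaces to the query layer.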
From the Pg docs (https://www.postgresql.org/docs/12/sql-select.html): "When a locking clause appears at the top level of a SELECT query, the rows that are locked are exactly those that are returned by the query; in the case of a join query, the rows locked are those that contribute to returned join rows."
The example below shows this semantic in action. In YB, we lock more rows than what is returned to the user. Though this doesn't result in a correctness issue, to match Pg usability we should lock only what is returned.
The cause of this is that even at the REPEATABLE READ isolation level, we take coarser locks than locks on the specific rows. This should not be done at any isolation level except SERIALIZABLE (where coarser locks are required for predicate locking, to block insertion of new rows that match the predicate of an ongoing txn's read).
NOTE: Locking only the rows returned to the user, as compared to coarser chunks (i.e., locking a prefix of the pk), might seem inefficient due to the many locks we would then have to take. But note that coarser locks can also hurt performance by locking what shouldn't be locked. So both approaches have disadvantages depending on the specific workload; locking only the rows returned to the user, however, gives us the same behaviour as Postgres. This serves as the motivation to switch to locking only the rows that are returned to the user rather than taking coarser locks.
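A sketch of the quoted Pg semantics, using a hypothetical pair of tables (names and values are illustrative): in a join query, only the rows that contribute to the returned join rows should be locked.

```sql
-- Hypothetical tables; per the quoted Pg docs, only the rows that
-- contribute to returned join rows should be locked.
CREATE TABLE accounts (id int PRIMARY KEY, balance int);
CREATE TABLE orders   (id int PRIMARY KEY, account_id int REFERENCES accounts (id));

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- In Pg, this locks only the accounts rows that appear in the join result
-- (FOR UPDATE OF a restricts locking to table a). With the current YB
-- behaviour, coarser prefix locks also cover accounts rows that the join
-- condition or the balance filter eliminates.
SELECT a.id, o.id
FROM accounts a
JOIN orders o ON o.account_id = a.id
WHERE a.balance > 100
FOR UPDATE OF a;

COMMIT;
```

A concurrent transaction should be able to UPDATE any account with balance <= 100, or any account with no matching order, without blocking on the locks above.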