Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] List of aborted sub-txns is removed by asynchronous heartbeats #13222

Closed
pkj415 opened this issue Jul 8, 2022 · 0 comments
Closed

[YSQL] List of aborted sub-txns is removed by asynchronous heartbeats #13222

pkj415 opened this issue Jul 8, 2022 · 0 comments
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@pkj415
Copy link
Contributor

pkj415 commented Jul 8, 2022

Jira Link: DB-2886

Description

85ac8d8 ensures that YSQL synchronously updates the list of aborted sub-txns on the transaction status record of a status tablet when some sub-txn is rolled back.

But the commit introduces the below bug in PendingReplicationFinished() that causes the aborted sub-txns list to disappear in the next asynchronous heartbeat from YSQL to the status tablet:
aborted_ = data.state.aborted();

Asynchronous heartbeats send an empty aborted sub-txn list as of that commit. And due to the above bug, the empty list replaces the aborted sub-txn list on the status record in the next heartbeat.

@pkj415 pkj415 added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Jul 8, 2022
@pkj415 pkj415 self-assigned this Jul 8, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jul 8, 2022
@pkj415 pkj415 removed kind/bug This issue is a bug priority/medium Medium priority issue status/awaiting-triage Issue awaiting triage labels Jul 8, 2022
pkj415 added a commit to pkj415/yugabyte-db that referenced this issue Jul 22, 2022
…t spuriously removed by asynchronous heartbeats

Summary:
4fb1676 added logic that enabled transactions to
ignore conflicting intents of aborted sub-txns. This was done by asynchronously
sending the aborted sub-txn list to the transactions coordinator using the
existing heartbeats. The aborted sub-txns list is then used by transaction
participants during conflicting resolution.

85ac8d8, which came later, ensured that YSQL
synchronously updates the list of aborted sub-txns on the transaction status
record of a status tablet when some sub-txn is rolled back. This was to
eliminate any window where a transaction might face a conflict error when another
transaction's aborted intents.

But the second commit introduces the following buggy line in
PendingReplicationFinished() that causes the aborted sub-txns list to disappear
in the next asynchronous heartbeat from YSQL to the status tablet:
`aborted_ = data.state.aborted();`.
The second commit removed the aborted sub-txns list from the asynchronous
heartbeats. And due to the above buggy line, the empty list replaces the
aborted sub-txn list on the status record in the next heartbeat.

The implication of this is: other transactions are not able to ignore conflicting
intents that are part of aborted sub-txns.

The two tests introduced in 4fb1676 are moved
to the isolation_regress test suite:

(1) testIgnoresIntentOfRolledBackSavepointCrossTxn
(2) testIgnoresLockOnlyConflictOfCommittedTxn

The bug introduced by 85ac8d8 is tested now
by an extra case in testIgnoresIntentOfRolledBackSavepointCrossTxn: a sleep
is introduced before the second transaction attempts to insert the same row.
Without the fix, this case fails since the sleep is enough for an asynchronous
heartbeat to clear the aborted sub-txn list and cause the second transation
to face a conflict error.

Test Plan:
./yb_build.sh --java-test org.yb.pgsql.TestPgIsolationRegress#isolationRegress
./yb_build.sh --java-test org.yb.pgsql.TestPgIsolationRegress#withDelayedTxnApply

Reviewers: rsami

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D18226
pkj415 added a commit that referenced this issue Jul 26, 2022
…usly removed by asynchronous heartbeats

Summary:
4fb1676 added logic that enabled transactions to
ignore conflicting intents of aborted sub-txns. This was done by asynchronously
sending the aborted sub-txn list to the transactions coordinator using the
existing heartbeats. The aborted sub-txns list is then used by transaction
participants during conflicting resolution.

85ac8d8, which came later, ensured that YSQL
synchronously updates the list of aborted sub-txns on the transaction status
record of a status tablet when some sub-txn is rolled back. This was to
eliminate any window where a transaction might face a conflict error when another
transaction's aborted intents.

But the second commit introduces the following buggy line in
PendingReplicationFinished() that causes the aborted sub-txns list to disappear
in the next asynchronous heartbeat from YSQL to the status tablet:
`aborted_ = data.state.aborted();`.
The second commit removed the aborted sub-txns list from the asynchronous
heartbeats. And due to the above buggy line, the empty list replaces the
aborted sub-txn list on the status record in the next heartbeat.

The implication of this is: other transactions are not able to ignore conflicting
intents that are part of aborted sub-txns.

The fix is to update `aborted_` only when `data.state.aborted()` is not empty.
This will ensure that asynchronous heartbeats don't update the `aborted_`
state variable.

The two tests introduced in 4fb1676
are moved to the isolation_regress test suite:

(1) testIgnoresIntentOfRolledBackSavepointCrossTxn
(2) testIgnoresLockOnlyConflictOfCommittedTxn

The bug introduced by 85ac8d8 is tested now
by an extra case in testIgnoresIntentOfRolledBackSavepointCrossTxn: a sleep
is introduced before the second transaction attempts to insert the same row.
Without the fix, this case fails since the sleep is enough for an asynchronous
heartbeat to clear the aborted sub-txn list and cause the second transation
to face a conflict error.

Test Plan:
./yb_build.sh --java-test org.yb.pgsql.TestPgIsolationRegress#isolationRegress
./yb_build.sh --java-test org.yb.pgsql.TestPgIsolationRegress#withDelayedTxnApply

Reviewers: rsami

Reviewed By: rsami

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D18226
pkj415 added a commit that referenced this issue Jul 28, 2022
…s is not spuriously removed by asynchronous heartbeats

Summary:
4fb1676 added logic that enabled transactions to
ignore conflicting intents of aborted sub-txns. This was done by asynchronously
sending the aborted sub-txn list to the transactions coordinator using the
existing heartbeats. The aborted sub-txns list is then used by transaction
participants during conflicting resolution.

85ac8d8, which came later, ensured that YSQL
synchronously updates the list of aborted sub-txns on the transaction status
record of a status tablet when some sub-txn is rolled back. This was to
eliminate any window where a transaction might face a conflict error when another
transaction's aborted intents.

But the second commit introduces the following buggy line in
PendingReplicationFinished() that causes the aborted sub-txns list to disappear
in the next asynchronous heartbeat from YSQL to the status tablet:
`aborted_ = data.state.aborted();`.
The second commit removed the aborted sub-txns list from the asynchronous
heartbeats. And due to the above buggy line, the empty list replaces the
aborted sub-txn list on the status record in the next heartbeat.

The implication of this is: other transactions are not able to ignore conflicting
intents that are part of aborted sub-txns.

The fix is to update `aborted_` only when `data.state.aborted()` is not empty.
This will ensure that asynchronous heartbeats don't update the `aborted_`
state variable.

The two tests introduced in 4fb1676
are moved to the isolation_regress test suite:

(1) testIgnoresIntentOfRolledBackSavepointCrossTxn
(2) testIgnoresLockOnlyConflictOfCommittedTxn

The bug introduced by 85ac8d8 is tested now
by an extra case in testIgnoresIntentOfRolledBackSavepointCrossTxn: a sleep
is introduced before the second transaction attempts to insert the same row.
Without the fix, this case fails since the sleep is enough for an asynchronous
heartbeat to clear the aborted sub-txn list and cause the second transation
to face a conflict error.

Original commit: 82c301e / D18226

Test Plan:
./yb_build.sh --java-test org.yb.pgsql.TestPgIsolationRegress#isolationRegress
./yb_build.sh --java-test org.yb.pgsql.TestPgIsolationRegress#withDelayedTxnApply

Reviewers: rsami

Reviewed By: rsami

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D18531
@pkj415 pkj415 closed this as completed Jul 28, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Sep 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

2 participants