Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Rollback DocDB operations on an unsuccessful ADD COLUMN/ALTER TABLE rewrite operation #22802

Closed
1 task done
fizaaluthra opened this issue Jun 10, 2024 · 0 comments
Closed
1 task done
Assignees
Labels
2.20 Backport Required 2024.1 Backport Required area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue

Comments

@fizaaluthra
Copy link
Member

fizaaluthra commented Jun 10, 2024

Jira Link: DB-11703

Description

On older branches (2.20 and beyond), when ADD COLUMN / ALTER TABLE rewrite operations fail, make an effort to rollback the DocDB changes.

Issue Type

kind/enhancement

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@fizaaluthra fizaaluthra added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Jun 10, 2024
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue labels Jun 10, 2024
@sushantrmishra sushantrmishra removed the status/awaiting-triage Issue awaiting triage label Jun 11, 2024
@fizaaluthra fizaaluthra self-assigned this Jun 18, 2024
fizaaluthra added a commit that referenced this issue Jun 27, 2024
Summary:
As part of legacy table rewrite operations that clone the table, the old table is
renamed to <table_name>_temp_old. Without DDL atomicity, if the operation fails at
any point after the rename, the PG metadata is rolled back (relname is rolled back to
<table_name>), but the DocDB table name is still <table_name>_temp_old.
Note: the same rename is also done for the table's indexes.

These renamed DocDB tables cause issues with backup/restore as the restore
operation relies on table names.

We can avoid these issues by avoiding the DocDB level renames altogether,
as the old tables are going to be dropped anyway.

Now, after a failed ADD/DROP pkey, ALTER TYPE operation (using the legacy rewrite),
the DocDB table will no longer be left with the name <table_name>_temp_old.
However, there will now be two DocDB tables with the same table name after a failed
legacy rewrite operation. The async transaction cleanup task on master will lookup
the entry for this orphaned table in pg_class (using the pg oid in the DocDB table id),
and will drop the table upon finding that it doesn't exist. If, for some reason,
the orphaned table is not cleaned up, or if a back up is taken before the cleanup happens,
the backup code will simply ignore this orphaned table. Specifically, RepackSnapshotsForBackup
will not find the pg schema name metadata for this table, and will simply ignore it.
Therefore, there will be no ambiguity on restore side (which relies on the table names).
Jira: DB-11703

Test Plan: ./yb_build.sh --cxx-test yb-backup-cross-feature-test --gtest-filter YBBackupTest.TestBackupWithFailedLegacyRewrite

Reviewers: myang

Reviewed By: myang

Subscribers: yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D35943
jasonyb pushed a commit that referenced this issue Jun 28, 2024
Summary:
 9c637e2 [PLAT-14429]: Modify Troubleshooting Platform registration workflow in YBA
 0a1406d [PLAT-14098]: Updating yb.runtime_conf_ui.tag_filter with appropriate tags value does not display the flags accordingly
 70a87f9 [PLAT-13605][PLAT-13609]: Edit Volume controls and storage type in FULL_MOVE but not in case of UPDATE
 26fbfe0 [PLAT-14515][UI] Clicking preview doesn't show the correct info and clears up the data provided while setting up the ysql_ident or ysql_hba multiline flags.- [PLAT-14514] [PLAT-14513]
 a07946b [#18233] Initial commit for yugabyted unit test framework.
 b2e8ee7 [#22842] docdb: Improve usability of stack trace tracking endpoints
 508f26e [docs] Added RN 2.20.2.3-b2 (#23042)
 214d44a [#22935] YSQL: Use db oid in the tserver's sequence cache entry key
 c47b2d9 [#22802] YSQL: Avoid renaming DocDb tables during legacy rewrite
 Excluded: 7c8343d [#22874] YSQL: Fix cascaded drops on columns
 58c8d4e [#23046] xCluster: Remove ns_replication from the code
 a70681d [#22816] YSQL: Bug fixes for replication connections in ysql connection manager
 b239e07 Doc upgrade versions (#22988)
 Excluded: ec76062 [#16408] YSQL: Split TestPgRegressIndex.testPgRegressIndex

Test Plan: Jenkins: rebase: pg15-cherrypicks

Reviewers: jason, tfoucher

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36247
fizaaluthra added a commit that referenced this issue Jul 1, 2024
…gacy rewrite

Summary:
As part of legacy table rewrite operations that clone the table, the old table is
renamed to <table_name>_temp_old. Without DDL atomicity, if the operation fails at
any point after the rename, the PG metadata is rolled back (relname is rolled back to
<table_name>), but the DocDB table name is still <table_name>_temp_old.
Note: the same rename is also done for the table's indexes.

These renamed DocDB tables cause issues with backup/restore as the restore
operation relies on table names.

We can avoid these issues by avoiding the DocDB level renames altogether,
as the old tables are going to be dropped anyway.

Now, after a failed ADD/DROP pkey, ALTER TYPE operation (using the legacy rewrite),
the DocDB table will no longer be left with the name <table_name>_temp_old.
However, there will now be two DocDB tables with the same table name after a failed
legacy rewrite operation. The async transaction cleanup task on master will lookup
the entry for this orphaned table in pg_class (using the pg oid in the DocDB table id),
and will drop the table upon finding that it doesn't exist. If, for some reason,
the orphaned table is not cleaned up, or if a back up is taken before the cleanup happens,
the backup code will simply ignore this orphaned table. Specifically, RepackSnapshotsForBackup
will not find the pg schema name metadata for this table, and will simply ignore it.
Therefore, there will be no ambiguity on restore side (which relies on the table names).
Jira: DB-11703
Original commit: c47b2d9 / D35943

Test Plan:
On an alma8 release build:

./yb_build.sh release --cxx-test tools_yb-backup-cross-feature-test --gtest_filter YBBackupTest.TestBackupWithFailedLegacyRewrite -n 100

Reviewers: myang

Reviewed By: myang

Subscribers: ybase, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36219
fizaaluthra added a commit that referenced this issue Jul 8, 2024
…cy rewrite

Summary:
Note: this backport differs from the original commit -- the test
TestBackupWithFailedLegacyRewrite is renamed to TestBackupWithFailedRewrite.
Additionally, commit d5d7363
is only partially backported -- specifically, only changes related to some of the
DDL atomicity tests that rely on the `_temp_old` naming are backported

As part of legacy table rewrite operations that clone the table, the old table is
renamed to <table_name>_temp_old. Without DDL atomicity, if the operation fails at
any point after the rename, the PG metadata is rolled back (relname is rolled back to
<table_name>), but the DocDB table name is still <table_name>_temp_old.
Note: the same rename is also done for the table's indexes.

These renamed DocDB tables cause issues with backup/restore as the restore
operation relies on table names.

We can avoid these issues by avoiding the DocDB level renames altogether,
as the old tables are going to be dropped anyway.

Now, after a failed ADD/DROP pkey, ALTER TYPE operation (using the legacy rewrite),
the DocDB table will no longer be left with the name <table_name>_temp_old.
However, there will now be two DocDB tables with the same table name after a failed
legacy rewrite operation. The async transaction cleanup task on master will lookup
the entry for this orphaned table in pg_class (using the pg oid in the DocDB table id),
and will drop the table upon finding that it doesn't exist. If, for some reason,
the orphaned table is not cleaned up, or if a back up is taken before the cleanup happens,
the backup code will simply ignore this orphaned table. Specifically, RepackSnapshotsForBackup
will not find the pg schema name metadata for this table, and will simply ignore it.
Therefore, there will be no ambiguity on restore side (which relies on the table names).
Jira: DB-11703

Original commits:
c47b2d9 / D35943
d5d7363 / D29786

Test Plan: ./yb_build.sh --cxx-test yb-backup-cross-feature-test --gtest-filter YBBackupTest.TestBackupWithFailedLegacyRewrite

Reviewers: myang

Reviewed By: myang

Subscribers: ybase, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36220
fizaaluthra added a commit that referenced this issue Jul 8, 2024
…cy rewrite

Summary:
Note: this backport differs from the original commit -- the test
TestBackupWithFailedLegacyRewrite is renamed to TestBackupWithFailedRewrite.

As part of legacy table rewrite operations that clone the table, the old table is
renamed to <table_name>_temp_old. Without DDL atomicity, if the operation fails at
any point after the rename, the PG metadata is rolled back (relname is rolled back to
<table_name>), but the DocDB table name is still <table_name>_temp_old.
Note: the same rename is also done for the table's indexes.

These renamed DocDB tables cause issues with backup/restore as the restore
operation relies on table names.

We can avoid these issues by avoiding the DocDB level renames altogether,
as the old tables are going to be dropped anyway.

Now, after a failed ADD/DROP pkey, ALTER TYPE operation (using the legacy rewrite),
the DocDB table will no longer be left with the name <table_name>_temp_old.
However, there will now be two DocDB tables with the same table name after a failed
legacy rewrite operation. The async transaction cleanup task on master will lookup
the entry for this orphaned table in pg_class (using the pg oid in the DocDB table id),
and will drop the table upon finding that it doesn't exist. If, for some reason,
the orphaned table is not cleaned up, or if a back up is taken before the cleanup happens,
the backup code will simply ignore this orphaned table. Specifically, RepackSnapshotsForBackup
will not find the pg schema name metadata for this table, and will simply ignore it.
Therefore, there will be no ambiguity on restore side (which relies on the table names).
Jira: DB-11703

Original commit: c47b2d9 / D35943

Test Plan: ./yb_build.sh --cxx-test yb-backup-cross-feature-test --gtest-filter YBBackupTest.TestBackupWithFailedLegacyRewrite

Reviewers: myang

Reviewed By: myang

Subscribers: ybase, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36222
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.20 Backport Required 2024.1 Backport Required area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

4 participants