Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schemachanger: Fix concurrent DML failure when dropping a column with depended on by indexes/constraints #119254

Conversation

Xiang-Gu
Copy link
Contributor

@Xiang-Gu Xiang-Gu commented Feb 15, 2024

This PR is a partial work to address a few concurrent DML failures with ongoing schema changes. The main theme of those failures involves dropping a column with dependent hash-sharded index, or expression index, or constraint, as reported in #111608, #111619, and #118314.

Our proposed solution is to introduce a new status for column element in the DSC so that the transition becomes

ABSENT <--> DELETE_ONLY <--> WRITE_ONLY <--> PUBLIC_BUT_INACCESSIBLE <--> PUBLIC

where the column stays in that newly introduced PUBLIC_BUT_INACCESSIBLE state until we hit NonRevertible phase where no thing could fail, at which time we can transition the column to non-public and eventually absent. It gives us two desirable effects with this approach:
1. the column is still public for the optimizer so that the current DML failure, which mostly revolves around a column is no longer public, can be solved;
2. the column is inaccessible for the user so it cannot be referenced in user queries, creating an effect of a dropped (or a dropping) column.

…laceColumnVars

TODO (xiang): This commit might need to be modified so that we can
conditionally disable/remove those checks for schema changes, but still
enforce those checks for other purposes (e.g. we should enforce these
checks when user tried to
`CREATE TABLE err (a INT, INDEX ((a + 10)), CHECK (crdb_internal_idx_expr > 0))`).

Previously, if we have a table with a column `col` and a constraint
referencing `col` and during the schema change of dropping `col`, concurrent
DML (e.g. INSERT) will use the default value (or NULL if no default) for
`col` when it still needs to write to the old primary index. If the
default value failed the constraint (when it's still enforced during
the schema change), it gives out a somewhat confusing error message:

```
ERROR: failed to satisfy CHECK constraint (crdb_internal_column_2_name_placeholder < 0:::INT8): column "crdb_internal_column_2_name_placeholder" does not exist, referenced in "crdb_internal_column_2_name_placeholder < 0:::INT8"
```

where the outmost error message is right ("failed to satisfy CHECK constraint")
but the inner error message is somewhat confusing -- it complains the
dropping column cannot be found. The root cause of this is because we
have logic such that if a check constraint is violated, we are going to
format that check constraint expression via `FormatExprForDisplay` but
it throws an error if any column in the expression is non-public or
inaccessible. This commit argues that, for schema changes, it is possible
to encounter column-in-mutation (i.e. non-public column) whenever we
need to format an expression so this commit removes the requirement, so
that the error message for the same concurrent DML will only have the
outmost error message

```
ERROR: failed to satisfy CHECK constraint (crdb_internal_column_2_name_placeholder < 0:::INT8)
```

It also adds a test to ensure that whenever the default value satisfies
the check constraint, concurrent DML stmts completes successfully.

Release note: None
This commit introduces a new status for Column element in its
transitioning between ABSENT and PUBLIC. The updated transitioning is

ABSENT <--> DELETE_ONLY <--> WRITE_ONLY <--> PUBLIC_BUT_INACCESSIBLE <--> PUBLIC

The motivation for this new status is issue revolving concurrent DMLs
when dropping a column. Namely, if a column has a hash index on it, and
we are dropping the column, during the schema change, a concurrent DML
would fail because the column is no longer public and optimizer requires
a column to be public to resolve its type when it's referenced in the
computed expression in the shard column. The idea of the fix is, instead
of transitioning the column to WRITE_ONLY (which means removing the column
from the public column slice and enqueue a column mutation), we keep the
column as a public column but mark it as inaccessible. This way, the
optimizer code would be happy since it's now dealing with a public column
and the user cannot "see" the column either (bc it's inaccessible).

Release note: None
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@Xiang-Gu Xiang-Gu changed the title schemaexpr: Remove checks for dropped or inaccessible column when rep… schemachanger: Fix concurrent DML failure when dropping a column with depended on by indexes/constraint Feb 15, 2024
@Xiang-Gu Xiang-Gu changed the title schemachanger: Fix concurrent DML failure when dropping a column with depended on by indexes/constraint schemachanger: Fix concurrent DML failure when dropping a column with depended on by indexes/constraints Feb 15, 2024
@rafiss
Copy link
Collaborator

rafiss commented Jun 5, 2024

replaced by #120792

@rafiss rafiss closed this Jun 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants