schema: drop column with index using hash cascade DML injection failure #111619
Comments
A simpler repro
The concurrent DML failed at the very beginning of PostCommitPhase (i.e. before the first stage of PostCommitPhase executed). At that point in time, the relevant schema objects are in the following states:
From a schema point of view, I didn't see anything particularly wrong with those transitions, so I went one layer up, to the optimizer, to see exactly where the error is thrown. That concurrent DELETE, under this descriptor state, runs into the error with the following stacktrace
My understanding is that @mgartner Can I ask you to chime in for your thoughts? In particular, I'd like to hear your thoughts on a few questions:
|
This looks similar to the issue in #79613. See also #79691, #83593, and #97368. These states are invalid, at least according to the optimizer's interpretation of
The optimizer must plan to delete entries in the I'll note that, based on the same reasoning, a non-sharded index on a column with the following state should also be invalid:
AFAICT this won't cause an internal error, but it might be the source of correctness problems. However, when this issue has come up before (see the issues I linked above), there was skepticism about whether or not reads can be made internally by the system on |
#59233 is highly relevant and seems to have figured out most of these questions. TL;DR: columns that have been fully backfilled and are safe for internal operations to read, but that should not be accessible to the user, should be "ordinary" "inaccessible" columns. |
The solution outlined in #59233 is not applicable here. That solution talks of situations where certain columns need to be inaccessible to user queries but can be internally considered readable. The most notable point here though is that such columns don't have a WRITE_ONLY mutation on them. |
This particular issue seems to have started occurring after the changes in #106782. In that PR, we allow computation of virtual columns that are under mutation (i.e. in the process of being added/dropped) within

CASE 1: The referenced column is public. As long as the computed column is DELETE_ONLY, WRITE_ONLY or public, we can read from the referenced column.

CASE 2: The referenced column is WRITE_ONLY. In this case, as long as the computed column is WRITE_ONLY or DELETE_ONLY, we can safely read the referenced column for the purpose of column computation within addComputedColsToTable.

CASE 3: The referenced column is DELETE_ONLY. In this case, as long as the computed column is also DELETE_ONLY, we can read the referenced column. As in Case 2, this will only occur on a column being dropped. This indicates that the data contained within the column is committed and complete, and can be read for the purpose of deletion operations, which are the only operations allowed within the DELETE_ONLY phase.

This should also work as expected for rollbacks: we will go through similar steps and, by enforcing the above rules, only read from the referenced column under mutation when it contains actual data. cc @fqazi to confirm my above assumptions.

This involves making a change to a fairly deep-seated assumption within the queries code. I would like someone from the @cockroachdb/sql-queries team to opine on the above. If we are in agreement, we can look at making this change in a safe manner. We can start by limiting this change to allow reads for under-mutation virtual computed columns only. I have intentionally avoided going into more cases where we may want such internal reads to be permitted. I believe there must be more cases, but for now, we can start off here. |
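To make the three cases above easier to check at a glance, here is a minimal Go sketch of the proposed rule as a single predicate. The `ColumnState` type, its constants, and `canReadReferenced` are hypothetical names made up for illustration; they are not CockroachDB's descriptor types or optimizer code, and the sketch only encodes the cases as written in the comment above.

```go
package main

import "fmt"

// ColumnState is a hypothetical stand-in for a column's descriptor state.
// It is NOT the type CockroachDB uses; it exists only for this sketch.
type ColumnState int

const (
	Public ColumnState = iota
	WriteOnly
	DeleteOnly
)

func (s ColumnState) String() string {
	switch s {
	case Public:
		return "PUBLIC"
	case WriteOnly:
		return "WRITE_ONLY"
	case DeleteOnly:
		return "DELETE_ONLY"
	}
	return "UNKNOWN"
}

// canReadReferenced reports whether, under the three cases proposed above,
// a virtual computed column in state `computed` may read a referenced
// column in state `ref`. The function name is an assumption of this sketch.
func canReadReferenced(ref, computed ColumnState) bool {
	switch ref {
	case Public:
		// Case 1: computed column may be DELETE_ONLY, WRITE_ONLY or public.
		return computed == Public || computed == WriteOnly || computed == DeleteOnly
	case WriteOnly:
		// Case 2: computed column must be WRITE_ONLY or DELETE_ONLY.
		return computed == WriteOnly || computed == DeleteOnly
	case DeleteOnly:
		// Case 3: computed column must also be DELETE_ONLY.
		return computed == DeleteOnly
	}
	return false
}

func main() {
	states := []ColumnState{Public, WriteOnly, DeleteOnly}
	// Print the full table of (referenced, computed) combinations.
	for _, ref := range states {
		for _, comp := range states {
			fmt.Printf("referenced=%s computed=%s readable=%t\n",
				ref, comp, canReadReferenced(ref, comp))
		}
	}
}
```

Read as a table, the rule says the computed column must never be "more public" than the column it references, which is the lock-step property discussed in the replies.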
@rimadeodhar The assumptions above all make sense. I think as long as we keep both in lock step in terms of their states, we should be safe. |
@rimadeodhar and I discussed this a bit more just now. Some notes on the new things we discussed.
|
Documenting here that we'll need to update the optimizer to allow inaccessible columns to be read for internal operations for this. |
I dug into this further and it looks like we can resolve this by setting up the schema dependency rules without any optimizer updates. We can leverage the same principle to fix #111608 and #118314. |
This PR fixes the new schema changer deprules for dropping virtual computed columns which are also used for hash and expression indexes.

Currently, the optimizer allows for virtual, computed columns to be evaluated even when under mutation. However, this causes concurrent DML issues when the schemachanger job is running as the column that the virtual computed column depends on moves into WRITE_ONLY stage prior to the computed column being dropped. As a result, the optimizer is unable to access the column for evaluating the compute expression.

This PR updates the dep rules to ensure the virtual, computed column is dropped before the dependent column moves to WRITE_ONLY ensuring that the compute expression can be enforced correctly for concurrent DML during all stages of the schema change.

Epic: none
Fixes: cockroachdb#111608
Fixes: cockroachdb#111619
Release note: None
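The ordering constraint described in this PR text can be illustrated with a small Go sketch that checks an ordered list of state transitions. This is only an illustration under stated assumptions: the `Transition` type, the `dropOrderOK` function, the column names `shard_col` and `j`, and the use of an `ABSENT` state to mean "dropped" are all hypothetical, and the sketch does not reflect how the schema changer's dep rules are actually expressed.

```go
package main

import "fmt"

// Transition is a hypothetical record of one column moving to a new state.
// It is not a type from the schema changer; it exists only for this sketch.
type Transition struct {
	Column string
	State  string
}

// dropOrderOK reports whether, in the ordered plan, the computed column
// reaches "ABSENT" (assumed here to mean "dropped") before the referenced
// column first moves to "WRITE_ONLY".
func dropOrderOK(plan []Transition, computed, referenced string) bool {
	computedGone := false
	for _, t := range plan {
		if t.Column == computed && t.State == "ABSENT" {
			computedGone = true
		}
		if t.Column == referenced && t.State == "WRITE_ONLY" && !computedGone {
			// The referenced column left PUBLIC while the computed column
			// still exists, so concurrent DML could not evaluate it.
			return false
		}
	}
	return true
}

func main() {
	// Ordering that matches the failure described in this issue.
	bad := []Transition{
		{Column: "j", State: "WRITE_ONLY"},
		{Column: "shard_col", State: "ABSENT"},
	}
	// Ordering that the updated rules are meant to enforce.
	good := []Transition{
		{Column: "shard_col", State: "ABSENT"},
		{Column: "j", State: "WRITE_ONLY"},
	}
	fmt.Println("bad plan ok: ", dropOrderOK(bad, "shard_col", "j"))
	fmt.Println("good plan ok:", dropOrderOK(good, "shard_col", "j"))
}
```

The check is deliberately one-directional: it only rejects plans in which the referenced column becomes WRITE_ONLY while the computed column is still present, which is the situation the PR description says leaves the optimizer unable to evaluate the compute expression.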
120794: scplan: Fix deprules for dropping computed columns r=rimadeodhar a=rimadeodhar

This PR fixes the new schema changer deprules for dropping virtual computed columns which are also used for hash and expression indexes.

Currently, the optimizer allows for virtual, computed columns to be evaluated even when under mutation. However, this causes concurrent DML issues when the schemachanger job is running as the column that the virtual computed column depends on moves into WRITE_ONLY stage prior to the computed column being dropped. As a result, the optimizer is unable to access the column for evaluating the compute expression.

This PR updates the dep rules to ensure the virtual, computed column is dropped before the dependent column moves to WRITE_ONLY ensuring that the compute expression can be enforced correctly for concurrent DML during all stages of the schema change.

Epic: none
Fixes: #111608
Fixes: #111619
Release note: None

**Note for reviewers:** This PR is stacked on top of #120792.

Co-authored-by: rimadeodhar <rima@cockroachlabs.com>
Re-opening to track backport to 24.1 |
Closing now that the backport is merged. |
Setup
Schema change
Schema change DML injection failure
Same underlying bug as #111608
Jira issue: CRDB-31989
Epic CRDB-37763