
importccl,backupccl: use crdb_internal.invalid_objects to check for invalid descriptors #84757

Closed
adityamaru opened this issue Jul 20, 2022 · 2 comments · Fixed by #92636
Labels
A-disaster-recovery C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) stability-period Intended to be worked on during a stability period. Use with the Milestone field to specify version. T-disaster-recovery

Comments


adityamaru commented Jul 20, 2022

SQL Schema has a virtual table, `crdb_internal.invalid_objects`, for checking for invalid descriptors in a cluster. Restore and import are in the business of creating and ingesting descriptors, so we should check the virtual table post restore/import for any invalid descriptor states they might have generated. This check should be added to as many restore/import tests as possible, and potentially also run as a sanity check during a restore/import once the descriptors have been published to an online state.

Jira issue: CRDB-17847

@adityamaru adityamaru added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-disaster-recovery labels Jul 20, 2022

blathers-crl bot commented Jul 20, 2022

cc @cockroachdb/bulk-io

msbutler added a commit to msbutler/cockroach that referenced this issue Jul 28, 2022
…anup

This patch adds a test helper function to easily check for invalid descriptors
during the cleanup of a test cluster created by any of the
backupRestoreTestSetup* helper funcs. This setup is used by many backup unit
tests (including all data-driven tests), but a future PR should add this
cleanup helper func to any backup unit test that doesn't use these helper
funcs.

Informs cockroachdb#84757

Release note: none
@msbutler msbutler added the stability-period Intended to be worked on during a stability period. Use with the Milestone field to specify version. label Aug 25, 2022
craig bot pushed a commit that referenced this issue Sep 26, 2022
85282: backupccl: check crdb_internal.invalid_objects during backup test cleanup r=adityamaru a=msbutler

This patch adds a test helper function to easily check for invalid descriptors
during the cleanup of a test cluster created by any of the
backupRestoreTestSetup* helper funcs. This setup is used by many backup unit
tests (including all data-driven tests), but a future PR should add this
cleanup helper func to any backup unit test that doesn't use these helper
funcs.

It's also worth noting this check would have caught #76764.

Informs #84757

Release note: none

87820: sql/schemachanger/*: optimize performance, primarily by adding support for containment r=ajwerner a=ajwerner

#### screl,scplan/rule: adopt containment

#### sql/schemachanger/rel: support inverted indexes and slice containment

This commit introduces some new concepts to rel:
* Slice attributes
* Inverted indexing of slice attributes
* Containment queries over slice attributes

One note is that the only support for containment queries is via an inverted
index. In the future, we could add direct containment evaluation such that if
we have a slice we could go iterate its members.

The basic idea is to introduce a slice membership type for the slice column
and to use the attribute in question to refer to the slice member value.

#### sql/schemachanger/rel: fix embedding support

#### sql/schemachanger/rel: optimize error checking
This change ends up making a huge difference. Otherwise, each subquery
invocation would lead to allocations which showed up in the profiles.

Fixes #86042

Release note: None

88712: sql: enable `IdempotentTombstone` for schema GC r=ajwerner a=erikgrinaker

`@ajwerner` Looks like we forgot to enable this. Let's get it in before the backport freeze.

---

This will avoid writing MVCC range tombstones across ranges if they don't contain any live data. This is particularly useful to avoid writing additional tombstones on retries.

Release note: None

Co-authored-by: Michael Butler <butler@cockroachlabs.com>
Co-authored-by: Andrew Werner <awerner32@gmail.com>
Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
msbutler added a commit to msbutler/cockroach that referenced this issue Nov 4, 2022
This patch checks that the restored descriptors are valid in two
places during the job. First, as soon as the descriptors are written in an
offline state before the restoration flow begins. This primarily ensures that
the descriptors in the backup were in a valid state, allowing the restore to
fail fast. Second, right after the descriptors return online and before the job
begins, which ensures the restore created and modified the descriptors
correctly.

Fixes cockroachdb#84757
Release note: None
msbutler added a commit to msbutler/cockroach that referenced this issue Nov 14, 2022
This patch checks that the restored descriptors are valid in three
places during the job. First, during planning to ensure no existing descriptor
in the cluster is invalid. Second, as soon as the descriptors are written in an
offline state before the restoration flow begins. This primarily ensures that
the descriptors in the backup were in a valid state, allowing the restore to
fail fast. Third, right after the descriptors return online and before the job
begins, which ensures the restore created and modified the descriptors
correctly.

Note that this validation step does not check job references in the
descriptors for 2 reasons: 1) it would require loading the whole jobs table
into memory, 2) during the second check, the offline written descriptors will
have invalid job references during a cluster restore, as the jobs table isn't
properly populated yet.

Fixes cockroachdb#84757
Release note: None
msbutler added a commit to msbutler/cockroach that referenced this issue Nov 15, 2022
This patch checks that the restored descriptors are valid in three
places during the job. First, during planning to ensure no existing descriptor
in the cluster is invalid. Second, as soon as the descriptors are written in an
offline state before the restoration flow begins. This primarily ensures that
the descriptors in the backup were in a valid state, allowing the restore to
fail fast. Third, right after the descriptors return online and before the job
begins, which ensures the restore created and modified the descriptors
correctly.

Note that this validation step does not check job references in the
descriptors for 2 reasons: 1) it would require loading the whole jobs table
into memory, 2) during the second check, the offline written descriptors will
have invalid job references during a cluster restore, as the jobs table isn't
properly populated yet.

If the customer needs to proceed with the restore, they can prevent the restore
from failing by passing the skip_validation_check flag.

Fixes cockroachdb#84757

Release note: this patch introduces checks during restore which cause the
restore to fail if there exist invalid descriptors in the restoring cluster.
The user can prevent the restore from failing by passing the
skip_validation_check flag.

msbutler commented Nov 29, 2022

As discussed in detail here, there's no need for restore to conduct manual descriptor validation, as the SQL descriptor collection read/write machinery does it for us. Will close this issue out once I add a test that verifies this fact.

craig bot pushed a commit that referenced this issue Nov 30, 2022
89358: memo: better selectivity estimates on single-table ORed predicates r=rytaft,michae2 a=msirek

Fixes #75090

Currently, the optimizer only does a good job of estimating the       
selectivity of single-table ORed predicates when all disjuncts are tight  
constraints. Otherwise we give up and go with the default selectivity of    
1/3 for the entire filter. If any of the disjuncts has a selectivity 
above 1/3, then the total selectivity should be at least as high as the 
selectivity of that disjunct. Any failure to find a tight constraint for
a given disjunct should only cause a selectivity of 1/3 to be used for
that disjunct, not the entire filter.  

This is addressed by using the method of combining ORed selectivities
introduced by #74303:
```
   Given predicates A and B and probability function P:

      P(A or B) = P(A) + P(B) - P(A and B)
```
The above formula is applied n-1 times, given n disjuncts.
For each disjunct which is not a tight constraint, 1/3 is used for that
predicate's selectivity in the formula.
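The folding step can be sketched as a small function. This is an illustrative reimplementation, not the optimizer's actual code, and it assumes independence of the disjuncts so that P(A and B) ≈ P(A)·P(B); a negative entry stands in for a disjunct with no tight constraint, which falls back to the 1/3 default:

```go
package main

import "fmt"

const defaultSelectivity = 1.0 / 3.0

// combineOredSelectivities folds P(A or B) = P(A) + P(B) - P(A and B)
// over the disjuncts' selectivities, applying the formula n-1 times for
// n disjuncts and assuming independence, i.e. P(A and B) ≈ P(A)*P(B).
// A negative entry marks a disjunct without a tight constraint and is
// replaced by the default selectivity of 1/3.
func combineOredSelectivities(sels []float64) float64 {
	combined := 0.0
	for i, s := range sels {
		if s < 0 {
			s = defaultSelectivity
		}
		if i == 0 {
			combined = s
			continue
		}
		combined = combined + s - combined*s
	}
	return combined
}

func main() {
	// Two tight disjuncts: 0.5 + 0.2 - 0.5*0.2 = 0.6 (up to float rounding).
	fmt.Println(combineOredSelectivities([]float64{0.5, 0.2}))
	// One tight disjunct above 1/3 keeps the total at least that high,
	// even when the other disjunct falls back to the default.
	fmt.Println(combineOredSelectivities([]float64{0.9, -1}) >= 0.9)
}
```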

Release note (bug fix): This patch fixes incorrect selectivity
estimation for queries with ORed predicates all referencing a
common single table.


91040: cloud/amazon: support external ID in AWS assume role r=rhu713 a=rhu713

Support passing in the optional external ID when assuming a role. This is done by extending the values of the comma-separated string value of the ASSUME_ROLE parameter to the format `<role>;external_id=<id>`. Users can still use the previous format of just `<role>` to specify a role without any external ID.

When using role chaining, each role in the chain can be associated with a different external ID. For example:
`ASSUME_ROLE=<roleA>;external_id=<idA>,<roleB>;external_id=<idB>,<roleC>` will use external ID `<idA>` to assume delegate `<roleA>`, then use external ID `<idB>` to assume delegate `<roleB>`, and finally no external ID to assume the final role `<roleC>`.

Additional documentation about external IDs can be found here: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html
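Parsing the chained format can be sketched roughly as follows; this is an illustrative parser for the `<role>;external_id=<id>` syntax described above, not the actual cloud/amazon code:

```go
package main

import (
	"fmt"
	"strings"
)

// delegate is one hop in the assume-role chain.
type delegate struct {
	Role       string
	ExternalID string // empty when no external_id was supplied
}

// parseAssumeRoleChain splits an ASSUME_ROLE value of the form
// "<roleA>;external_id=<idA>,<roleB>;external_id=<idB>,<roleC>"
// into its delegates. A bare "<role>" yields an empty external ID.
func parseAssumeRoleChain(v string) []delegate {
	var chain []delegate
	for _, item := range strings.Split(v, ",") {
		parts := strings.Split(item, ";")
		d := delegate{Role: parts[0]}
		for _, p := range parts[1:] {
			if strings.HasPrefix(p, "external_id=") {
				d.ExternalID = strings.TrimPrefix(p, "external_id=")
			}
		}
		chain = append(chain, d)
	}
	return chain
}

func main() {
	chain := parseAssumeRoleChain("roleA;external_id=idA,roleB;external_id=idB,roleC")
	for _, d := range chain {
		fmt.Printf("%s -> %q\n", d.Role, d.ExternalID)
	}
}
```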

Release note (enterprise change): Support passing in the optional external ID when assuming a role. This is done by extending the values of the comma-separated string value of the ASSUME_ROLE parameter to the format `<role>;external_id=<id>`. Users can still use the previous format of just `<role>` to specify a role without any external ID.

When using role chaining, each role in the chain can be associated with a different external ID. For example:
`ASSUME_ROLE=<roleA>;external_id=<idA>,<roleB>;external_id=<idB>,<roleC>` will use external ID `<idA>` to assume delegate `<roleA>`, then use external ID `<idB>` to assume delegate `<roleB>`, and finally no external ID to assume the final role `<roleC>`.

Addresses #90239

92341: changefeedccl: fix flakey tests r=[miretskiy] a=HonoreDB

Our test SQL connection doesn't always read its own writes: it can use different connections for different queries. So
```
INSERT INTO foo;
feed(CREATE CHANGEFEED FOR foo)
```
is a race condition. This just doesn't matter for most tests because if the insert happens after the initial scan, it creates the same payload. But tests with initial_scan='only' won't see it at all.
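One generic way to sidestep such a race is to poll until the write is observed before creating the feed. A minimal retry-loop sketch (the names are illustrative; in a real test the check would run a `SELECT` on the connection that will subsequently run `CREATE CHANGEFEED`):

```go
package main

import (
	"errors"
	"fmt"
)

// errNotVisible marks a check that hasn't observed the write yet.
var errNotVisible = errors.New("write not yet visible")

// waitForVisible retries check until it succeeds or attempts run out,
// returning the last error on failure.
func waitForVisible(check func() error, attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = check(); err == nil {
			return nil
		}
	}
	return err
}

func main() {
	// Simulate a write that becomes visible on the third poll.
	polls := 0
	err := waitForVisible(func() error {
		polls++
		if polls < 3 {
			return errNotVisible
		}
		return nil
	}, 10)
	fmt.Println(err == nil, polls)
}
```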

Fixes #92211

Release note: None

92462: scripts: improve pprof-loop.sh r=andrewbaptist a=tbg

Now it also works with the runtime trace endpoint (worked before but
printed concerning-looking errors) and with non-blocking endpoints
(cumulative profiles such as heap or mutex), for which it previously
busy-looped.

Epic: None
Release note: None


92591: ui: add 99.9th and 99.99th SQL latency charts r=maryliag a=maryliag

This commit introduces the charts for:
`Service Latency: SQL Statements, 99.99th percentile` and
`Service Latency: SQL Statements, 99.9th percentile` on Metrics page under SQL view.

Fixes #74247

<img width="806" alt="Screen Shot 2022-11-28 at 12 23 56 PM" src="https://user-images.githubusercontent.com/1017486/204342077-b5509f51-f94e-44be-a2e8-8bd185d12ce6.png">



Release note (ui change): Added charts
`Service Latency: SQL Statements, 99.9th percentile` and `Service Latency: SQL Statements, 99.99th percentile` to Metrics page, under SQL view.

92635: tree: fix internal error comparing tuple type with non-tuple type r=rytaft,mgartner a=msirek

Fixes #91532

This fixes comparison of an empty tuple expression, ROW() or (), or
a non-empty tuple expression with a non-tuple type by returning
an error message instead of an internal error.

Release note (bug fix): This fixes an internal error when comparing
a tuple type with a non-tuple type.

92636: backupccl: add restore corrupt descriptor validation test r=adityamaru a=msbutler

This patch adds a test that ensures that restore automatically detects corrupt restoring descriptors via the sql descriptor collection read/write machinery.

Fixes #84757

Release note: None

92692: jobs: protected timestamps refresh logic is not transactional r=fqazi a=fqazi

Fixes: #92427

Previously, the refresh logic for protected timestamps would fetch jobs in separate transactions before attempting to update an existing one. Due to a recent change allowing this API to reflect changes into individual job objects, we no longer did this correctly. This patch fixes the protected timestamp manager to update timestamps in a transactionally safe manner, since multiple updates can happen concurrently for schema changes.

Release note: None

Co-authored-by: Mark Sirek <sirek@cockroachlabs.com>
Co-authored-by: Rui Hu <rui@cockroachlabs.com>
Co-authored-by: Aaron Zinger <zinger@cockroachlabs.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: maryliag <marylia@cockroachlabs.com>
Co-authored-by: Michael Butler <butler@cockroachlabs.com>
Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
craig bot closed this as completed in eb7ffb8 Nov 30, 2022