
importccl,backupccl: use crdb_internal.invalid_objects to check for invalid descriptors #84757

Closed
adityamaru opened this issue Jul 20, 2022 · 2 comments · Fixed by #92636
Labels
A-disaster-recovery C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) stability-period Intended to be worked on during a stability period. Use with the Milestone field to specify version. T-disaster-recovery

Comments


adityamaru commented Jul 20, 2022

SQL Schema has a virtual table, `crdb_internal.invalid_objects`, for checking for invalid descriptors in a cluster. Restore and import are in the business of creating and ingesting descriptors, so we should check the virtual table post restore/import for any invalid descriptor states they might have generated. This check should be added to as many restore/import tests as possible, and potentially also run as a sanity check during a restore/import once the descriptors have been published to an online state.

Jira issue: CRDB-17847

@adityamaru adityamaru added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-disaster-recovery labels Jul 20, 2022

blathers-crl bot commented Jul 20, 2022

cc @cockroachdb/bulk-io

msbutler added a commit to msbutler/cockroach that referenced this issue Jul 28, 2022
…anup

This patch adds a test helper function to easily check for invalid descriptors
during the cleanup of a test cluster created by any of the
backupRestoreTestSetup* helper funcs. This setup is used by many backup unit
tests (including all data-driven tests), but a future PR should add this
cleanup helper func to any backup unit test that doesn't use these helper
funcs.

Informs cockroachdb#84757

Release note: none
@msbutler msbutler added the stability-period Intended to be worked on during a stability period. Use with the Milestone field to specify version. label Aug 25, 2022
craig bot pushed a commit that referenced this issue Sep 26, 2022
85282: backupccl: check crdb_internal.invalid_objects during backup test cleanup r=adityamaru a=msbutler

This patch adds a test helper function to easily check for invalid descriptors
during the cleanup of a test cluster created by any of the
backupRestoreTestSetup* helper funcs. This setup is used by many backup unit
tests (including all data-driven tests), but a future PR should add this
cleanup helper func to any backup unit test that doesn't use these helper
funcs.

It's also worth noting this check would have caught #76764.

Informs #84757

Release note: none

87820: sql/schemachanger/*: optimize performance, primarily by adding support for containment r=ajwerner a=ajwerner

#### screl,scplan/rule: adopt containment

#### sql/schemachanger/rel: support inverted indexes and slice containment

This commit introduces some new concepts to rel:
* Slice attributes
* Inverted indexing of slice attributes
* Containment queries over slice attributes

One note is that the only support for containment queries is via an inverted
index. In the future, we could add direct containment evaluation such that if
we have a slice we could go iterate its members.

The basic idea is to introduce a slice membership type for the slice column
and to use the attribute in question to refer to the slice member value.

#### sql/schemachanger/rel: fix embedding support

#### sql/schemachanger/rel: optimize error checking
This change ends up making a huge difference. Otherwise, each subquery
invocation would lead to allocations which showed up in the profiles.

Fixes #86042

Release note: None

88712: sql: enable `IdempotentTombstone` for schema GC r=ajwerner a=erikgrinaker

`@ajwerner` Looks like we forgot to enable this. Let's get it in before the backport freeze.

---

This will avoid writing MVCC range tombstones across ranges if they don't contain any live data. This is particularly useful to avoid writing additional tombstones on retries.

Release note: None

Co-authored-by: Michael Butler <butler@cockroachlabs.com>
Co-authored-by: Andrew Werner <awerner32@gmail.com>
Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
msbutler added a commit to msbutler/cockroach that referenced this issue Nov 4, 2022
This patch checks that the restored descriptors are valid in two
places during the job. First, as soon as the descriptors are written in an
offline state before the restoration flow begins. This primarily ensures that
the descriptors in the backup were in a valid state, allowing the restore to
fail fast. Second, right after the descriptors return online and before the job
begins, which ensures the restore created and modified the descriptors
correctly.

Fixes cockroachdb#84757
Release note: None
msbutler added a commit to msbutler/cockroach that referenced this issue Nov 14, 2022
This patch checks that the restored descriptors are valid in three
places during the job. First, during planning to ensure no existing descriptor
in the cluster is invalid. Second, as soon as the descriptors are written in an
offline state before the restoration flow begins. This primarily ensures that
the descriptors in the backup were in a valid state, allowing the restore to
fail fast. Third, right after the descriptors return online and before the job
begins, which ensures the restore created and modified the descriptors
correctly.

Note that this validation step does not check job references in the
descriptors for 2 reasons: 1) it would require loading the whole jobs table
into memory, 2) during the second check, the offline written descriptors will
have invalid job references during a cluster restore, as the jobs table isn't
properly populated yet.

Fixes cockroachdb#84757
Release note: None
msbutler added a commit to msbutler/cockroach that referenced this issue Nov 15, 2022
This patch checks that the restored descriptors are valid in three
places during the job. First, during planning to ensure no existing descriptor
in the cluster is invalid. Second, as soon as the descriptors are written in an
offline state before the restoration flow begins. This primarily ensures that
the descriptors in the backup were in a valid state, allowing the restore to
fail fast. Third, right after the descriptors return online and before the job
begins, which ensures the restore created and modified the descriptors
correctly.

Note that this validation step does not check job references in the
descriptors for 2 reasons: 1) it would require loading the whole jobs table
into memory, 2) during the second check, the offline written descriptors will
have invalid job references during a cluster restore, as the jobs table isn't
properly populated yet.

If the customer needs to proceed with the restore, they can prevent the restore
from failing by passing the skip_validation_check flag.

Fixes cockroachdb#84757

Release note: this patch introduces checks during restore which cause the
restore to fail if there exist invalid descriptors in the restoring cluster.
The user can prevent the restore from failing by passing the
skip_validation_check flag.

msbutler commented Nov 29, 2022

As discussed in detail here, there's no need for restore to conduct manual descriptor validation, as the SQL descriptor collection read/write machinery does it for us. Will close this issue out once I add a test that verifies this fact.

craig bot pushed a commit that referenced this issue Nov 30, 2022
89358: memo: better selectivity estimates on single-table ORed predicates r=rytaft,michae2 a=msirek

Fixes #75090

Currently, the optimizer only does a good job of estimating the       
selectivity of single-table ORed predicates when all disjuncts are tight  
constraints. Otherwise we give up and go with the default selectivity of    
1/3 for the entire filter. If any of the disjuncts has a selectivity 
above 1/3, then the total selectivity should be at least as high as the 
selectivity of that disjunct. Any failure to find a tight constraint for
a given disjunct should only cause a selectivity of 1/3 to be used for
that disjunct, not the entire filter.  

This is addressed by using the method of combining ORed selectivities
introduced by #74303:
```
   Given predicates A and B and probability function P:

      P(A or B) = P(A) + P(B) - P(A and B)
```
The above formula is applied n-1 times, given n disjuncts.
For each disjunct which is not a tight constraint, 1/3 is used for that
predicate's selectivity in the formula.
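The folding step can be sketched as a small function. This is an illustrative reimplementation, not the optimizer's actual code, and it assumes independence of the disjuncts so that P(A and B) ≈ P(A)·P(B); a negative entry stands in for a disjunct with no tight constraint, which falls back to the 1/3 default:

```go
package main

import "fmt"

const defaultSelectivity = 1.0 / 3.0

// combineOredSelectivities folds P(A or B) = P(A) + P(B) - P(A and B)
// over the disjuncts' selectivities, applying the formula n-1 times for
// n disjuncts and assuming independence, i.e. P(A and B) ≈ P(A)*P(B).
// A negative entry marks a disjunct without a tight constraint and is
// replaced by the default selectivity of 1/3.
func combineOredSelectivities(sels []float64) float64 {
	combined := 0.0
	for i, s := range sels {
		if s < 0 {
			s = defaultSelectivity
		}
		if i == 0 {
			combined = s
			continue
		}
		combined = combined + s - combined*s
	}
	return combined
}

func main() {
	// Two tight disjuncts: 0.5 + 0.2 - 0.5*0.2 = 0.6 (up to float rounding).
	fmt.Println(combineOredSelectivities([]float64{0.5, 0.2}))
	// One tight disjunct above 1/3 keeps the total at least that high,
	// even when the other disjunct falls back to the default.
	fmt.Println(combineOredSelectivities([]float64{0.9, -1}) >= 0.9)
}
```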

Release note (bug fix): This patch fixes incorrect selectivity
estimation for queries with ORed predicates all referencing a
common single table.


91040: cloud/amazon: support external ID in AWS assume role r=rhu713 a=rhu713

Support passing in the optional external ID when assuming a role. This is done by extending the values of the comma-separated string value of the ASSUME_ROLE parameter to the format `<role>;external_id=<id>`. Users can still use the previous format of just `<role>` to specify a role without any external ID.

When using role chaining, each role in the chain can be associated with a different external ID. For example:
`ASSUME_ROLE=<roleA>;external_id=<idA>,<roleB>;external_id=<idB>,<roleC>` will use external ID `<idA>` to assume delegate `<roleA>`, then use external ID `<idB>` to assume delegate `<roleB>`, and finally no external ID to assume the final role `<roleC>`.

Additional documentation about external IDs can be found here: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html
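Parsing the chained format can be sketched roughly as follows; this is an illustrative parser for the `<role>;external_id=<id>` syntax described above, not the actual cloud/amazon code:

```go
package main

import (
	"fmt"
	"strings"
)

// delegate is one hop in the assume-role chain.
type delegate struct {
	Role       string
	ExternalID string // empty when no external_id was supplied
}

// parseAssumeRoleChain splits an ASSUME_ROLE value of the form
// "<roleA>;external_id=<idA>,<roleB>;external_id=<idB>,<roleC>"
// into its delegates. A bare "<role>" yields an empty external ID.
func parseAssumeRoleChain(v string) []delegate {
	var chain []delegate
	for _, item := range strings.Split(v, ",") {
		parts := strings.Split(item, ";")
		d := delegate{Role: parts[0]}
		for _, p := range parts[1:] {
			if strings.HasPrefix(p, "external_id=") {
				d.ExternalID = strings.TrimPrefix(p, "external_id=")
			}
		}
		chain = append(chain, d)
	}
	return chain
}

func main() {
	chain := parseAssumeRoleChain("roleA;external_id=idA,roleB;external_id=idB,roleC")
	for _, d := range chain {
		fmt.Printf("%s -> %q\n", d.Role, d.ExternalID)
	}
}
```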

Release note (enterprise change): Support passing in the optional external ID when assuming a role. This is done by extending the values of the comma-separated string value of the ASSUME_ROLE parameter to the format `<role>;external_id=<id>`. Users can still use the previous format of just `<role>` to specify a role without any external ID.

When using role chaining, each role in the chain can be associated with a different external ID. For example:
`ASSUME_ROLE=<roleA>;external_id=<idA>,<roleB>;external_id=<idB>,<roleC>` will use external ID `<idA>` to assume delegate `<roleA>`, then use external ID `<idB>` to assume delegate `<roleB>`, and finally no external ID to assume the final role `<roleC>`.

Addresses #90239

92341: changefeedccl: fix flakey tests r=[miretskiy] a=HonoreDB

Our test SQL connection doesn't always read its own writes: it can use different connections for different queries. So
```
INSERT INTO foo;
feed(CREATE CHANGEFEED FOR foo)
```
is a race condition. This just doesn't matter for most tests because if the insert happens after the initial scan, it creates the same payload. But tests with initial_scan='only' won't see it at all.
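One generic way to sidestep such a race is to poll until the write is observed before creating the feed. A minimal retry-loop sketch (the names are illustrative; in a real test the check would run a `SELECT` on the connection that will subsequently run `CREATE CHANGEFEED`):

```go
package main

import (
	"errors"
	"fmt"
)

// errNotVisible marks a check that hasn't observed the write yet.
var errNotVisible = errors.New("write not yet visible")

// waitForVisible retries check until it succeeds or attempts run out,
// returning the last error on failure.
func waitForVisible(check func() error, attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = check(); err == nil {
			return nil
		}
	}
	return err
}

func main() {
	// Simulate a write that becomes visible on the third poll.
	polls := 0
	err := waitForVisible(func() error {
		polls++
		if polls < 3 {
			return errNotVisible
		}
		return nil
	}, 10)
	fmt.Println(err == nil, polls)
}
```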

Fixes #92211

Release note: None

92462: scripts: improve pprof-loop.sh r=andrewbaptist a=tbg

Now it also works with the runtime trace endpoint (worked before but
printed concerning-looking errors) and with non-blocking endpoints
(cumulative profiles such as heap or mutex), for which it previously
busy-looped.

Epic: None
Release note: None


92591: ui: add 99.9th and 99.99th SQL latency charts r=maryliag a=maryliag

This commit introduces the charts for:
`Service Latency: SQL Statements, 99.99th percentile` and
`Service Latency: SQL Statements, 99.9th percentile` on Metrics page under SQL view.

Fixes #74247

<img width="806" alt="Screen Shot 2022-11-28 at 12 23 56 PM" src="https://user-images.githubusercontent.com/1017486/204342077-b5509f51-f94e-44be-a2e8-8bd185d12ce6.png">



Release note (ui change): Added charts
`Service Latency: SQL Statements, 99.9th percentile` and `Service Latency: SQL Statements, 99.99th percentile` to Metrics page, under SQL view.

92635: tree: fix internal error comparing tuple type with non-tuple type r=rytaft,mgartner a=msirek

Fixes #91532

This fixes comparison of an empty tuple expression, ROW() or (), or
a non-empty tuple expression with a non-tuple type by returning
an error message instead of an internal error.

Release note (bug fix): This fixes an internal error when comparing
a tuple type with a non-tuple type.

92636: backupccl: add restore corrupt descriptor validation test r=adityamaru a=msbutler

This patch adds a test that ensures that restore automatically detects corrupt restoring descriptors via the sql descriptor collection read/write machinery.

Fixes #84757

Release note: None

92692: jobs: protected timestamps refresh logic is not transactional r=fqazi a=fqazi

Fixes: #92427

Previously, the refresh logic for protected timestamps would fetch jobs in separate transactions before attempting to update an existing one. Due to a recent change allowing this API to reflect changes into individual job objects, we no longer did this correctly. This patch fixes the protected timestamp manager to update timestamps in a transactionally safe manner, since multiple updates can happen concurrently for schema changes.

Release note: None

Co-authored-by: Mark Sirek <sirek@cockroachlabs.com>
Co-authored-by: Rui Hu <rui@cockroachlabs.com>
Co-authored-by: Aaron Zinger <zinger@cockroachlabs.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: maryliag <marylia@cockroachlabs.com>
Co-authored-by: Michael Butler <butler@cockroachlabs.com>
Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
craig bot closed this as completed in eb7ffb8 Nov 30, 2022