-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better selectivity estimates for ORed predicates #75090
Labels
A-sql-optimizer
SQL logical planning and optimizations.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-sql-queries
SQL Queries Team
Comments
msirek
added
the
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
label
Jan 18, 2022
msirek
pushed a commit
to msirek/cockroach
that referenced
this issue
Oct 5, 2022
Fixes cockroachdb#75090 Currently, the optimizer only does a good job of estimating the selectivity of ORed predicates when each disjunct is a tight constraint because we give up and go with the default selectivity of 1/3 in that case. If any of the disjuncts has a selectivity above 1/3, then the total selectivity should be at least as high as the selectivity of that disjunct. This is addressed by using the method of combining ORed selectivities introduced by cockroachdb#74303: ``` Given predicates A and B and probability function P: P(A or B) = P(A) + P(B) - P(A and B) ``` The above formula is applied n-1 times, given n disjuncts. For each disjunct which is not a tight constraint, 1/3 is used for that predicate's selectivity in the formula. Release note (bug fix): This patch fixes incorrect selectivity estimation for queries with ORed predicates all referencing a common single table.
Started working on this here: #89358 |
msirek
pushed a commit
to msirek/cockroach
that referenced
this issue
Nov 8, 2022
Fixes cockroachdb#75090 Currently, the optimizer only does a good job of estimating the selectivity of single-table ORed predicates when all disjuncts are tight constraints. Otherwise we give up and go with the default selectivity of 1/3 for the entire filter. If any of the disjuncts has a selectivity above 1/3, then the total selectivity should be at least as high as the selectivity of that disjunct. Any failure to find a tight constraint for a given disjunct should only cause a selectivity of 1/3 to be used for that disjunct, not the entire filter. This is addressed by using the method of combining ORed selectivities introduced by cockroachdb#74303: ``` Given predicates A and B and probability function P: P(A or B) = P(A) + P(B) - P(A and B) ``` The above formula is applied n-1 times, given n disjuncts. For each disjunct which is not a tight constraint, 1/3 is used for that predicate's selectivity in the formula. Release note (bug fix): This patch fixes incorrect selectivity estimation for queries with ORed predicates all referencing a common single table.
msirek
pushed a commit
to msirek/cockroach
that referenced
this issue
Nov 29, 2022
Fixes cockroachdb#75090 Currently, the optimizer only does a good job of estimating the selectivity of single-table ORed predicates when all disjuncts are tight constraints. Otherwise we give up and go with the default selectivity of 1/3 for the entire filter. If any of the disjuncts has a selectivity above 1/3, then the total selectivity should be at least as high as the selectivity of that disjunct. Any failure to find a tight constraint for a given disjunct should only cause a selectivity of 1/3 to be used for that disjunct, not the entire filter. This is addressed by using the method of combining ORed selectivities introduced by cockroachdb#74303: ``` Given predicates A and B and probability function P: P(A or B) = P(A) + P(B) - P(A and B) ``` The above formula is applied n-1 times, given n disjuncts. For each disjunct which is not a tight constraint, 1/3 is used for that predicate's selectivity in the formula. Release note (bug fix): This patch fixes incorrect selectivity estimation for queries with ORed predicates all referencing a common single table.
craig bot
pushed a commit
that referenced
this issue
Nov 30, 2022
89358: memo: better selectivity estimates on single-table ORed predicates r=rytaft,michae2 a=msirek Fixes #75090 Currently, the optimizer only does a good job of estimating the selectivity of single-table ORed predicates when all disjuncts are tight constraints. Otherwise we give up and go with the default selectivity of 1/3 for the entire filter. If any of the disjuncts has a selectivity above 1/3, then the total selectivity should be at least as high as the selectivity of that disjunct. Any failure to find a tight constraint for a given disjunct should only cause a selectivity of 1/3 to be used for that disjunct, not the entire filter. This is addressed by using the method of combining ORed selectivities introduced by #74303: ``` Given predicates A and B and probability function P: P(A or B) = P(A) + P(B) - P(A and B) ``` The above formula is applied n-1 times, given n disjuncts. For each disjunct which is not a tight constraint, 1/3 is used for that predicate's selectivity in the formula. Release note (bug fix): This patch fixes incorrect selectivity estimation for queries with ORed predicates all referencing a common single table. 91040: cloud/amazon: support external ID in AWS assume role r=rhu713 a=rhu713 Support passing in the optional external ID when assuming a role. This is done by extending the values of the comma-separated string value of the ASSUME_ROLE parameter to the format `<role>;external_id=<id>`. Users can still use the previous format of just `<role>` to specify a role without any external ID. When using role chaining, each role in the chain can be associated with a different external ID. For example: `ASSUME_ROLE=<roleA>;external_id=<idA>,<roleB>;external_id=<idB>,<roleC>` will use external ID `<idA>` to assume delegate `<roleA>`, then use external ID `<idB>` to assume delegate `<roleB>`, and finally no external ID to assume the final role `<roleC>`. Additional documentation about external IDs can be found here: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html Release note (enterprise change): Support passing in the optional external ID when assuming a role. This is done by extending the values of the comma-separated string value of the ASSUME_ROLE parameter to the format `<role>;external_id=<id>`. Users can still use the previous format of just `<role>` to specify a role without any external ID. When using role chaining, each role in the chain can be associated with a different external ID. For example: `ASSUME_ROLE=<roleA>;external_id=<idA>,<roleB>;external_id=<idB>,<roleC>` will use external ID `<idA>` to assume delegate `<roleA>`, then use external ID `<idB>` to assume delegate `<roleB>`, and finally no external ID to assume the final role `<roleC>`. Addresses #90239 92341: changefeedccl: fix flakey tests r=[miretskiy] a=HonoreDB Our test SQL connection doesn't always read its own writes: it can use different connections for different queries. So ``` INSERT INTO foo; feed(CREATE CHANGEFEED FOR foo) ``` is a race condition. This just doesn't matter for most tests because if the insert happens after the initial scan, it creates the same payload. But tests with initial_scan='only' won't see it at all. Fixes #92211 Release note: None 92462: scripts: improve pprof-loop.sh r=andrewbaptist a=tbg Now it also works with the runtime trace endpoint (worked before but printed concerning-looking errors) and with non-blocking endpoints (cumulative profiles such as heap or mutex), for which it previously busy-looped. Epic: None Release note: None 92591: ui: add 99.9th and 99.99th SQL latency charts r=maryliag a=maryliag This commit introduces the charts for: `Service Latency: SQL Statements, 99.99th percentile` and `Service Latency: SQL Statements, 99.9th percentile` on Metrics page under SQL view. Fixes #74247 <img width="806" alt="Screen Shot 2022-11-28 at 12 23 56 PM" src="https://user-images.githubusercontent.com/1017486/204342077-b5509f51-f94e-44be-a2e8-8bd185d12ce6.png"> Release note (ui change): Added charts `Service Latency: SQL Statements, 99.9th percentile` and `Service Latency: SQL Statements, 99.99th percentile` to Metrics page, under SQL view. 92635: tree: fix internal error comparing tuple type with non-tuple type r=rytaft,mgartner a=msirek Fixes #91532 This fixes comparison of an empty tuple expression, ROW() or (), or a non-empty tuple expression with a non-tuple type by returning an error message instead of an internal error. Release note (bug fix): This fixes an internal error when comparing a tuple type with a non-tuple type. 92636: backupccl: add restore corrupt descriptor validation test r=adityamaru a=msbutler This patch adds a test that ensures that restore automatically detects corrupt restoring descriptors via the sql descriptor collection read/write machinery. Fixes #84757 Release note: None 92692: jobs: protected timestamps refresh logic is not transactional r=fqazi a=fqazi Fixes: #92427 Previously, the refresh logic for protected timestamps would fetch jobs with transactions before attempting to update an existing one. Due to a recent change to allow this API to reflect any changes into individual job objects, we no longer do that correctly. This patch fixes the protected timestamp manager to update timestamps in a transactionally, safe manner since multiple updates can be happening concurrently for schema changes. Release note: None Co-authored-by: Mark Sirek <sirek@cockroachlabs.com> Co-authored-by: Rui Hu <rui@cockroachlabs.com> Co-authored-by: Aaron Zinger <zinger@cockroachlabs.com> Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com> Co-authored-by: maryliag <marylia@cockroachlabs.com> Co-authored-by: Michael Butler <butler@cockroachlabs.com> Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
mgartner
pushed a commit
to mgartner/cockroach
that referenced
this issue
Dec 29, 2022
Fixes cockroachdb#75090 Currently, the optimizer only does a good job of estimating the selectivity of single-table ORed predicates when all disjuncts are tight constraints. Otherwise we give up and go with the default selectivity of 1/3 for the entire filter. If any of the disjuncts has a selectivity above 1/3, then the total selectivity should be at least as high as the selectivity of that disjunct. Any failure to find a tight constraint for a given disjunct should only cause a selectivity of 1/3 to be used for that disjunct, not the entire filter. This is addressed by using the method of combining ORed selectivities introduced by cockroachdb#74303: ``` Given predicates A and B and probability function P: P(A or B) = P(A) + P(B) - P(A and B) ``` The above formula is applied n-1 times, given n disjuncts. For each disjunct which is not a tight constraint, 1/3 is used for that predicate's selectivity in the formula. Release note (bug fix): This patch fixes incorrect selectivity estimation for queries with ORed predicates all referencing a common single table.
mgartner
pushed a commit
to mgartner/cockroach
that referenced
this issue
Jan 3, 2023
Fixes cockroachdb#75090 Currently, the optimizer only does a good job of estimating the selectivity of single-table ORed predicates when all disjuncts are tight constraints. Otherwise we give up and go with the default selectivity of 1/3 for the entire filter. If any of the disjuncts has a selectivity above 1/3, then the total selectivity should be at least as high as the selectivity of that disjunct. Any failure to find a tight constraint for a given disjunct should only cause a selectivity of 1/3 to be used for that disjunct, not the entire filter. This is addressed by using the method of combining ORed selectivities introduced by cockroachdb#74303: ``` Given predicates A and B and probability function P: P(A or B) = P(A) + P(B) - P(A and B) ``` The above formula is applied n-1 times, given n disjuncts. For each disjunct which is not a tight constraint, 1/3 is used for that predicate's selectivity in the formula. Release note (bug fix): This patch fixes incorrect selectivity estimation for queries with ORed predicates all referencing a common single table.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-sql-optimizer
SQL logical planning and optimizations.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-sql-queries
SQL Queries Team
Description
(#74303) Adds support for combining selectivities of ORed equijoin predicates with an industry-standard approach using the disjunction rule:
This issue is opened to also apply the rule to single-table filter predicates on different columns for both Selects and Joins, e.g.,
Selectivities of predicates on the same column should be additive if the ranges are non-overlapping, e.g.,
Describe alternatives you've considered
Multicolumn stats could be used in the future to further improve estimates by getting a more accurate result for P(A and B).
Jira issue: CRDB-12458
The text was updated successfully, but these errors were encountered: