Optimizer: Fix range extraction for CNF(conjunctive normal form) | tidb-test=pr/2341 #53908

Merged
merged 3 commits into from
Jun 14, 2024

Conversation

ghazalfamilyusa
Contributor

What problem does this PR solve?

Issue Number: Ref #41598

Problem Summary:

Index range extraction in the optimizer can handle disjunctions like
((a = 1 and b = 2 and c > 3) or (a = 4 and b = 5 and c > 6)) and builds the proper index ranges, "(1 2 3,1 2 +inf], (4 5 6,4 5 +inf]" for this example. The problem is that when we add another conjunct such as d > 3, the optimizer takes a different path that finds only the point ranges, "[1,1], [4,4]" in this example. Falling back to point ranges is acceptable for complex CNF, where we pick ranges from only one conjunct. In this case, however, the ranges of the best conjunct are more selective than the point ranges.
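
To make the difference between the two results concrete, here is a minimal Go sketch. It is not TiDB code: the `Row` type, the helper names, and the sample data are all made up for illustration, and the index is assumed to be on (a, b, c). It only counts how many toy index entries each set of ranges would cover.

```go
package main

import "fmt"

// Row is a toy entry in a hypothetical index on (a, b, c).
type Row struct{ A, B, C int }

// inPointRanges models the coarse ranges "[1,1], [4,4]": only a is constrained.
func inPointRanges(r Row) bool { return r.A == 1 || r.A == 4 }

// inDNFRanges models "(1 2 3,1 2 +inf], (4 5 6,4 5 +inf]".
func inDNFRanges(r Row) bool {
	return (r.A == 1 && r.B == 2 && r.C > 3) ||
		(r.A == 4 && r.B == 5 && r.C > 6)
}

// countScanned returns how many index entries fall inside the given ranges.
func countScanned(rows []Row, in func(Row) bool) int {
	n := 0
	for _, r := range rows {
		if in(r) {
			n++
		}
	}
	return n
}

// sampleRows returns made-up data for illustration only.
func sampleRows() []Row {
	return []Row{
		{1, 2, 3}, {1, 2, 4}, {1, 9, 9},
		{4, 0, 0}, {4, 5, 6}, {4, 5, 7},
		{2, 2, 5},
	}
}

func main() {
	rows := sampleRows()
	fmt.Println("point ranges scan:", countScanned(rows, inPointRanges))
	fmt.Println("DNF ranges scan:  ", countScanned(rows, inDNFRanges))
}
```

On this sample data the point ranges cover six entries while the DNF ranges cover two, which is the gap the PR aims to close when an extra conjunct is present.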

What changed and how does it work?

We added code that picks the ranges of the best conjunct if they are more selective than the point ranges. For example, "(1 2 3,1 2 +inf], (4 5 6,4 5 +inf]" is more selective than "[1,1], [4,4]". We implemented this by checking whether the best CNF ranges are a subset of the point ranges.
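
One way to picture the subset check is the following simplified Go sketch. It is an assumption-laden model, not the code in this PR: `KeyRange`, `comparePrefix`, `covers`, and `isSubset` are hypothetical names, column values are plain integers, +inf is modeled as `math.MaxInt`, and the outer (point) ranges are assumed to have inclusive bounds.

```go
package main

import (
	"fmt"
	"math"
)

const inf = math.MaxInt // stand-in for +inf in this sketch

// KeyRange is a hypothetical, simplified index range. Low and High hold
// column-value prefixes; a shorter prefix covers every key that starts with it.
type KeyRange struct {
	Low, High               []int
	LowExclude, HighExclude bool // kept for notation; not needed by covers
}

// comparePrefix compares a and b on their common leading columns only.
func comparePrefix(a, b []int) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// covers reports whether outer contains inner.
// Assumption: outer has inclusive bounds, as point ranges do.
func covers(outer, inner KeyRange) bool {
	lo := comparePrefix(outer.Low, inner.Low)
	hi := comparePrefix(outer.High, inner.High)
	lowOK := lo < 0 || (lo == 0 && len(outer.Low) <= len(inner.Low))
	highOK := hi > 0 || (hi == 0 && len(outer.High) <= len(inner.High))
	return lowOK && highOK
}

// isSubset reports whether every candidate range lies inside some point range.
func isSubset(candidate, points []KeyRange) bool {
	for _, c := range candidate {
		found := false
		for _, p := range points {
			if covers(p, c) {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

// bestCNFRanges models "(1 2 3,1 2 +inf], (4 5 6,4 5 +inf]".
func bestCNFRanges() []KeyRange {
	return []KeyRange{
		{Low: []int{1, 2, 3}, High: []int{1, 2, inf}, LowExclude: true},
		{Low: []int{4, 5, 6}, High: []int{4, 5, inf}, LowExclude: true},
	}
}

// pointRanges models "[1,1], [4,4]".
func pointRanges() []KeyRange {
	return []KeyRange{
		{Low: []int{1}, High: []int{1}},
		{Low: []int{4}, High: []int{4}},
	}
}

func main() {
	fmt.Println(isSubset(bestCNFRanges(), pointRanges()))
}
```

In this model the example ranges from the description pass the check, so the narrower ranges would be preferred over the point ranges.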

Check List

Tests

  • Unit test
  • [] Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/invalid-title sig/planner SIG: Planner size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 10, 2024

tiprow bot commented Jun 10, 2024

Hi @ghazalfamilyusa. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ghazalfamilyusa ghazalfamilyusa changed the title Optimizer:Fix range extraction for CNF(conjunctive normal form) predi… Optimizer: Fix range extraction for CNF(conjunctive normal form) predi… Jun 10, 2024
@ghazalfamilyusa ghazalfamilyusa changed the title Optimizer: Fix range extraction for CNF(conjunctive normal form) predi… Optimizer: Fix range extraction for CNF(conjunctive normal form) Jun 10, 2024
@ghazalfamilyusa ghazalfamilyusa force-pushed the range_enhancements branch 4 times, most recently from b56c9b2 to 8b90abf Compare June 11, 2024 00:25

codecov bot commented Jun 11, 2024

Codecov Report

Attention: Patch coverage is 81.81818% with 12 lines in your changes missing coverage. Please review.

Project coverage is 55.9975%. Comparing base (87d6f0f) to head (9c7f0c4).
Report is 36 commits behind head on master.

Additional details and impacted files
@@                Coverage Diff                @@
##             master     #53908         +/-   ##
=================================================
- Coverage   70.9866%   55.9975%   -14.9892%     
=================================================
  Files          1507       1629        +122     
  Lines        413072     607643     +194571     
=================================================
+ Hits         293226     340265      +47039     
- Misses       100540     244184     +143644     
- Partials      19306      23194       +3888     
Flag Coverage Δ
integration 37.0878% <81.8181%> (?)
unit 71.4505% <81.8181%> (+1.5274%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 52.9656% <ø> (-2.2339%) ⬇️
parser ∅ <ø> (∅)
br 50.0602% <ø> (+6.5166%) ⬆️

@hawkingrei
Member

/ok-to-test

@ti-chi-bot ti-chi-bot bot added the ok-to-test Indicates a PR is ready to be tested. label Jun 11, 2024
@ghazalfamilyusa ghazalfamilyusa changed the title Optimizer: Fix range extraction for CNF(conjunctive normal form) Optimizer: Fix range extraction for CNF(conjunctive normal form) | tidb-test=pr/2341 Jun 11, 2024
@ghazalfamilyusa ghazalfamilyusa force-pushed the range_enhancements branch 3 times, most recently from 20166dc to 8525db7 Compare June 11, 2024 18:02
@ghazalfamilyusa
Contributor Author

/test unit-test


tiprow bot commented Jun 11, 2024

@ghazalfamilyusa: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test fast_test_tiprow
  • /test tidb_parser_test

Use /test all to run all jobs.

In response to this:

/test unit-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ghazalfamilyusa
Contributor Author

/test fast_test_tiprow


ti-chi-bot bot commented Jun 11, 2024

@ghazalfamilyusa: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test build
  • /test check-dev
  • /test check-dev2
  • /test mysql-test
  • /test pull-br-integration-test
  • /test pull-integration-ddl-test
  • /test pull-lightning-integration-test
  • /test pull-mysql-client-test
  • /test unit-test

The following commands are available to trigger optional jobs:

  • /test canary-notify-when-compatibility-sections-changed
  • /test pingcap/tidb/canary_ghpr_unit_test
  • /test pull-common-test
  • /test pull-e2e-test
  • /test pull-integration-common-test
  • /test pull-integration-copr-test
  • /test pull-integration-jdbc-test
  • /test pull-integration-mysql-test
  • /test pull-integration-nodejs-test
  • /test pull-sqllogic-test
  • /test pull-tiflash-test

Use /test all to run the following jobs that were automatically triggered:

  • pingcap/tidb/ghpr_build
  • pingcap/tidb/ghpr_check
  • pingcap/tidb/ghpr_check2
  • pingcap/tidb/ghpr_mysql_test
  • pingcap/tidb/ghpr_unit_test
  • pingcap/tidb/pull_integration_ddl_test
  • pingcap/tidb/pull_mysql_client_test

In response to this:

/test fast_test_tiprow

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ghazalfamilyusa
Contributor Author

/retest

@ghazalfamilyusa
Contributor Author

/test fast_test_tiprow

@wuhuizuo
Contributor

/cc zanmato1984 elsa0520

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jun 12, 2024
@ghazalfamilyusa
Contributor Author

ghazalfamilyusa commented Jun 12, 2024

/cc @XuHuaiyu for expression changes

@ghazalfamilyusa
Contributor Author

ghazalfamilyusa commented Jun 12, 2024

/cc @qw4990

@elsa0520
Contributor

What is the difference between the case with "and d > 3" and the case without it?
With or without "and d > 3", the range of the DNF ((a = 1 and b = 2 and c > 3) or (a = 4 and b = 5 and c > 6)) should be "(1 2 3,1 2 +inf], (4 5 6,4 5 +inf]".
Why does the range of the DNF change to point ranges when there is an additional predicate "and d > 3"?

@ghazalfamilyusa
Contributor Author

@elsa0520: this is what I described in the design document and in the summary above. The code takes the point-ranges path whenever there are multiple conjuncts, regardless of whether these conjuncts are relevant or not, (d > 3) being one example.

Contributor

@XuHuaiyu XuHuaiyu left a comment

The expression change LGTM

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jun 13, 2024

ti-chi-bot bot commented Jun 13, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-06-12 05:22:13.548487644 +0000 UTC m=+528487.601799569: ☑️ agreed by zanmato1984.
  • 2024-06-13 05:56:03.215883535 +0000 UTC m=+616917.269195459: ☑️ agreed by XuHuaiyu.

@elsa0520
Contributor

But the design will deal with CNF intersection. So once computing the intersection for CNF is supported, we can merge ((a = 1 and b = 2 and c > 3) or (a = 4 and b = 5 and c > 6)) and d > 3. Then we would not need this PR on its own, would we?

@ghazalfamilyusa
Contributor Author

@elsa0520: it seems my previous reply was not clear. The intersection is not 100% guaranteed, at least for a while. Initially we will have a flag, and later on we may implement a heuristic solution. So we need the point ranges in case the flag is off or the intersection is not done for all the conjuncts. Given that point ranges will still exist (just in case), we still need to choose bestCNF over the point ranges. We can remove point-range support and the related PRs once the intersection is applied 100% of the time with no flag. I hope this makes sense; if not, we can chat.
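
The precedence described in this reply can be sketched roughly as follows. This is only an illustration of the reasoning, not the optimizer's code: the `Inputs` fields, the `Source` constants, and `pick` are hypothetical names, and the flag is reduced to a boolean because its real name does not appear in this thread.

```go
package main

import "fmt"

// Source names which set of ranges ends up being used.
type Source string

const (
	FromIntersection Source = "intersection"
	FromBestCNFItem  Source = "best CNF item"
	FromPointRanges  Source = "point ranges"
)

// Inputs are hypothetical; none of these names come from TiDB.
type Inputs struct {
	IntersectionEnabled  bool // the flag mentioned in the reply
	IntersectionComplete bool // true if every conjunct was intersected
	BestCNFIsNarrower    bool // best conjunct ranges are a subset of the point ranges
}

// pick mirrors the order of preference argued for in the reply.
func pick(in Inputs) Source {
	if in.IntersectionEnabled && in.IntersectionComplete {
		return FromIntersection
	}
	if in.BestCNFIsNarrower {
		return FromBestCNFItem
	}
	return FromPointRanges
}

func main() {
	// Flag off: the point-ranges path still runs, so bestCNF must be able to win.
	fmt.Println(pick(Inputs{BestCNFIsNarrower: true}))
}
```

As long as the first branch cannot be relied on for every query, the second branch is what this PR contributes.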

@@ -462,6 +463,18 @@ func (d *rangeDetacher) detachCNFCondAndBuildRangeForIndex(conditions []expressi
// TODO: we will optimize it later.
res.RemainedConds = AppendConditionsIfNotExist(res.RemainedConds, remainedConds)
res.Ranges = ranges
if bestCNFItemRes != nil && res != nil && len(res.Ranges) != 0 {
Contributor

Could you please add a comment describing this case, "(heuristics applied for long lists or we turn off the intersection)", just before this line?

Contributor Author

done

Contributor

@elsa0520 elsa0520 left a comment

LGTM


ti-chi-bot bot commented Jun 14, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elsa0520, XuHuaiyu, zanmato1984

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot merged commit b96a775 into pingcap:master Jun 14, 2024
23 checks passed
Labels
approved lgtm ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
6 participants