Optimizer: Fix range extraction for CNF(conjunctive normal form) | tidb-test=pr/2341 #53908

ghazalfamilyusa · 2024-06-10T22:44:21Z

What problem does this PR solve?

Issue Number: Ref #41598

Problem Summary:

Index range extraction in the optimizer can handle disjunctions like ((a = 1 and b = 2 and c > 3) or (a = 4 and b = 5 and c > 6)) and builds the proper index ranges, "(1 2 3,1 2 +inf], (4 5 6,4 5 +inf]" in this example. The problem is that when we add another conjunct such as d > 3, the optimizer takes a different path that finds only the point ranges, "[1,1], [4,4]" in this example. Finding point ranges is OK for a complex CNF where we pick ranges from only one conjunct. In this case, however, the best conjunct is also better than the point ranges.
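To make the selectivity gap concrete, here is a minimal sketch in Go. It does not use TiDB's ranger package; the `row` type, the sample data, and the helper names are all made up for illustration. It counts how many index entries on (a, b, c) fall inside the point ranges versus the full ranges from the example above.

```go
package main

import "fmt"

// row is a hypothetical index entry on columns (a, b, c).
type row struct{ a, b, c int }

// inPointRanges reports whether the row falls in "[1,1], [4,4]" on column a.
func inPointRanges(r row) bool {
	return r.a == 1 || r.a == 4
}

// inFullRanges reports whether the row falls in
// "(1 2 3,1 2 +inf], (4 5 6,4 5 +inf]" on columns (a, b, c).
func inFullRanges(r row) bool {
	return (r.a == 1 && r.b == 2 && r.c > 3) || (r.a == 4 && r.b == 5 && r.c > 6)
}

// countScanned counts the rows that a scan over the given ranges would touch.
func countScanned(rows []row, in func(row) bool) int {
	n := 0
	for _, r := range rows {
		if in(r) {
			n++
		}
	}
	return n
}

// sampleRows returns invented data; real selectivity depends on the table.
func sampleRows() []row {
	return []row{
		{1, 2, 3}, {1, 2, 4}, {1, 9, 9}, {2, 2, 5},
		{4, 5, 6}, {4, 5, 7}, {4, 0, 8},
	}
}

func main() {
	rows := sampleRows()
	fmt.Println("point ranges scan:", countScanned(rows, inPointRanges)) // 6
	fmt.Println("full ranges scan: ", countScanned(rows, inFullRanges))  // 2
}
```

On this invented data the point ranges touch six entries while the full ranges touch two, which is the kind of difference the PR is after.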

What changed and how does it work?

We added code that picks the best conjunct's ranges if they are more selective than the point ranges. For example, "(1 2 3,1 2 +inf], (4 5 6,4 5 +inf]" is more selective than "[1,1], [4,4]". We implemented this by checking whether the best CNF ranges are a subset of the point ranges.
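The subset idea can be sketched as follows. This is not the code in this PR: `keyRange`, `hasPrefix`, `coveredByPoint`, and `isSubset` are hypothetical names, and ranges are simplified to integer column prefixes. Open/closed bounds and +inf are left out, so the upper bound "1 2 +inf" is modeled as the prefix (1, 2). The sketch relies on one fact: if both bounds of a range start with a point value, every key between them starts with it too.

```go
package main

import "fmt"

// keyRange is a hypothetical, simplified index range. Low and High hold the
// bounded column values; open/closed flags and +inf are omitted on purpose.
type keyRange struct {
	Low, High []int
}

// hasPrefix reports whether key starts with prefix.
func hasPrefix(key, prefix []int) bool {
	if len(key) < len(prefix) {
		return false
	}
	for i, v := range prefix {
		if key[i] != v {
			return false
		}
	}
	return true
}

// coveredByPoint reports whether r lies inside the point range on point:
// both bounds must share the point value as a prefix.
func coveredByPoint(r keyRange, point []int) bool {
	return hasPrefix(r.Low, point) && hasPrefix(r.High, point)
}

// isSubset reports whether every candidate range is covered by some point.
func isSubset(candidate []keyRange, points [][]int) bool {
	for _, r := range candidate {
		covered := false
		for _, p := range points {
			if coveredByPoint(r, p) {
				covered = true
				break
			}
		}
		if !covered {
			return false
		}
	}
	return true
}

func main() {
	// "(1 2 3,1 2 +inf], (4 5 6,4 5 +inf]"
	best := []keyRange{
		{Low: []int{1, 2, 3}, High: []int{1, 2}},
		{Low: []int{4, 5, 6}, High: []int{4, 5}},
	}
	fmt.Println(isSubset(best, [][]int{{1}, {4}})) // true: inside "[1,1], [4,4]"
	fmt.Println(isSubset(best, [][]int{{1}}))      // false: second range not covered
}
```

When the check succeeds, the best conjunct's ranges can replace the point ranges without widening the scan; when it fails, keeping the point ranges is the safe choice.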

Check List

Tests

Unit test
[] Integration test
Manual test (add detailed scripts or steps below)
No need to test
- I checked and no code files have been changed.

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

…cates

tiprow · 2024-06-10T22:44:38Z

Hi @ghazalfamilyusa. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

codecov · 2024-06-11T00:48:40Z

Codecov Report

Attention: Patch coverage is 81.81818% with 12 lines in your changes missing coverage. Please review.

Project coverage is 55.9975%. Comparing base (87d6f0f) to head (9c7f0c4).
Report is 36 commits behind head on master.

Additional details and impacted files

@@                Coverage Diff                @@
##             master     #53908         +/-   ##
=================================================
- Coverage   70.9866%   55.9975%   -14.9892%     
=================================================
  Files          1507       1629        +122     
  Lines        413072     607643     +194571     
=================================================
+ Hits         293226     340265      +47039     
- Misses       100540     244184     +143644     
- Partials      19306      23194       +3888

Flag	Coverage Δ
integration	`37.0878% <81.8181%> (?)`
unit	`71.4505% <81.8181%> (+1.5274%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
dumpling	`52.9656% <ø> (-2.2339%)`	⬇️
parser	`∅ <ø> (∅)`
br	`50.0602% <ø> (+6.5166%)`	⬆️

hawkingrei · 2024-06-11T01:04:33Z

/ok-to-test

ghazalfamilyusa · 2024-06-11T18:02:30Z

/test unit-test

tiprow · 2024-06-11T18:02:51Z

@ghazalfamilyusa: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test fast_test_tiprow
/test tidb_parser_test

Use /test all to run all jobs.

In response to this:

/test unit-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ghazalfamilyusa · 2024-06-11T18:36:48Z

/test fast_test_tiprow

ti-chi-bot · 2024-06-11T18:36:52Z

@ghazalfamilyusa: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test build
/test check-dev
/test check-dev2
/test mysql-test
/test pull-br-integration-test
/test pull-integration-ddl-test
/test pull-lightning-integration-test
/test pull-mysql-client-test
/test unit-test

The following commands are available to trigger optional jobs:

/test canary-notify-when-compatibility-sections-changed
/test pingcap/tidb/canary_ghpr_unit_test
/test pull-common-test
/test pull-e2e-test
/test pull-integration-common-test
/test pull-integration-copr-test
/test pull-integration-jdbc-test
/test pull-integration-mysql-test
/test pull-integration-nodejs-test
/test pull-sqllogic-test
/test pull-tiflash-test

Use /test all to run the following jobs that were automatically triggered:

pingcap/tidb/ghpr_build
pingcap/tidb/ghpr_check
pingcap/tidb/ghpr_check2
pingcap/tidb/ghpr_mysql_test
pingcap/tidb/ghpr_unit_test
pingcap/tidb/pull_integration_ddl_test
pingcap/tidb/pull_mysql_client_test

In response to this:

/test fast_test_tiprow

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ghazalfamilyusa · 2024-06-11T18:39:00Z

/retest

ghazalfamilyusa · 2024-06-11T20:36:22Z

/test fast_test_tiprow

ti-chi-bot · 2024-06-11T20:36:26Z

@ghazalfamilyusa: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test build
/test check-dev
/test check-dev2
/test mysql-test
/test pull-br-integration-test
/test pull-integration-ddl-test
/test pull-lightning-integration-test
/test pull-mysql-client-test
/test unit-test

The following commands are available to trigger optional jobs:

/test canary-notify-when-compatibility-sections-changed
/test pingcap/tidb/canary_ghpr_unit_test
/test pull-common-test
/test pull-e2e-test
/test pull-integration-common-test
/test pull-integration-copr-test
/test pull-integration-jdbc-test
/test pull-integration-mysql-test
/test pull-integration-nodejs-test
/test pull-sqllogic-test
/test pull-tiflash-test

Use /test all to run the following jobs that were automatically triggered:

pingcap/tidb/ghpr_build
pingcap/tidb/ghpr_check
pingcap/tidb/ghpr_check2
pingcap/tidb/ghpr_mysql_test
pingcap/tidb/ghpr_unit_test
pingcap/tidb/pull_integration_ddl_test
pingcap/tidb/pull_mysql_client_test

In response to this:

/test fast_test_tiprow

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

wuhuizuo · 2024-06-12T05:14:13Z

/cc zanmato1984 elsa0520

ghazalfamilyusa · 2024-06-12T05:24:35Z

/cc @XuHuaiyu for expression changes

ghazalfamilyusa · 2024-06-12T05:26:44Z

/cc @qw4990

elsa0520 · 2024-06-12T08:40:39Z

What is the difference between the predicate with "and d > 3" and without it?
With or without "and d > 3", the range of the DNF ((a = 1 and b = 2 and c > 3) or (a = 4 and b = 5 and c > 6)) should be "(1 2 3,1 2 +inf], (4 5 6,4 5 +inf]".
Why does the range of the DNF change to point ranges when there is an additional predicate "and d > 3"?

ghazalfamilyusa · 2024-06-12T16:33:52Z

@elsa0520: this is what I described in the design document and in the summary above. The code takes the point-ranges path whenever there are multiple conjuncts, regardless of whether these conjuncts, such as (d > 3), are relevant or not.

XuHuaiyu

The expression change LGTM

ti-chi-bot · 2024-06-13T05:56:04Z

[LGTM Timeline notifier]

Timeline:

2024-06-12 05:22:13.548487644 +0000 UTC m=+528487.601799569: ☑️ agreed by zanmato1984.
2024-06-13 05:56:03.215883535 +0000 UTC m=+616917.269195459: ☑️ agreed by XuHuaiyu.

elsa0520 · 2024-06-13T07:24:43Z

But in the design we will deal with CNF intersection. So once you support computing the intersection for CNF, we can merge ((a = 1 and b = 2 and c > 3) or (a = 4 and b = 5 and c > 6)) and d > 3. Then we don't need this PR on its own, do we?

ghazalfamilyusa · 2024-06-13T19:32:08Z

@elsa0520: it seems my previous reply was not clear. The intersection is not 100% guaranteed, at least for a while. Initially we will have a flag, and later on we may implement a heuristic solution. So we need the point ranges in case the flag is off or the intersection is not done for all the conjuncts. Given that we keep the point ranges (just in case), we still need to choose bestCNF over the point ranges. We can remove point-ranges support and the related PRs once the intersection is applied 100% of the time with no flag. Hope this makes sense; if not, we can chat.

elsa0520 · 2024-06-14T03:29:12Z

pkg/util/ranger/detacher.go

@@ -462,6 +463,18 @@ func (d *rangeDetacher) detachCNFCondAndBuildRangeForIndex(conditions []expressi
 		// TODO: we will optimize it later.
 		res.RemainedConds = AppendConditionsIfNotExist(res.RemainedConds, remainedConds)
 		res.Ranges = ranges
+		if bestCNFItemRes != nil && res != nil && len(res.Ranges) != 0 {


Could you please add a code comment just before this line that describes this case ("heuristics applied for long lists or we turn off the intersection")?

elsa0520

LGTM

ti-chi-bot · 2024-06-14T03:45:35Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elsa0520, XuHuaiyu, zanmato1984

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [XuHuaiyu,elsa0520,zanmato1984]
~~pkg/bindinfo/OWNERS~~ [elsa0520]
~~pkg/expression/OWNERS~~ [XuHuaiyu,zanmato1984]
~~pkg/planner/OWNERS~~ [elsa0520]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Optimizer:Fix range extraction for CNF(conjunctive normal form) predi…

3a98da6

…cates

ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/invalid-title sig/planner SIG: Planner size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 10, 2024

ghazalfamilyusa changed the title ~~Optimizer:Fix range extraction for CNF(conjunctive normal form) predi…~~ Optimizer: Fix range extraction for CNF(conjunctive normal form) predi… Jun 10, 2024

ti-chi-bot bot removed the do-not-merge/invalid-title label Jun 10, 2024

ghazalfamilyusa changed the title ~~Optimizer: Fix range extraction for CNF(conjunctive normal form) predi…~~ Optimizer: Fix range extraction for CNF(conjunctive normal form) Jun 10, 2024

ghazalfamilyusa force-pushed the range_enhancements branch 4 times, most recently from b56c9b2 to 8b90abf Compare June 11, 2024 00:25

ti-chi-bot bot added the ok-to-test Indicates a PR is ready to be tested. label Jun 11, 2024

ghazalfamilyusa force-pushed the range_enhancements branch from 8b90abf to 570e2f1 Compare June 11, 2024 01:24

ghazalfamilyusa changed the title ~~Optimizer: Fix range extraction for CNF(conjunctive normal form)~~ Optimizer: Fix range extraction for CNF(conjunctive normal form) | tidb-test=pr/2341 Jun 11, 2024

ghazalfamilyusa force-pushed the range_enhancements branch 3 times, most recently from 20166dc to 8525db7 Compare June 11, 2024 18:02

end

aa03831

ghazalfamilyusa force-pushed the range_enhancements branch from 8525db7 to aa03831 Compare June 11, 2024 18:51

ti-chi-bot bot requested review from elsa0520 and zanmato1984 June 12, 2024 05:14

zanmato1984 approved these changes Jun 12, 2024

View reviewed changes

ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jun 12, 2024

XuHuaiyu approved these changes Jun 13, 2024

View reviewed changes

ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jun 13, 2024

elsa0520 reviewed Jun 14, 2024

View reviewed changes

elsa0520 approved these changes Jun 14, 2024

View reviewed changes

ti-chi-bot bot added the approved label Jun 14, 2024

ghazalfamilyusa force-pushed the range_enhancements branch from 48f41e0 to bfad4b7 Compare June 14, 2024 03:52

Update detacher.go

9c7f0c4

ghazalfamilyusa force-pushed the range_enhancements branch from bfad4b7 to 9c7f0c4 Compare June 14, 2024 03:56

ti-chi-bot bot merged commit b96a775 into pingcap:master Jun 14, 2024
23 checks passed
