planner: improve row count estimation of IndexJoin's inner scan #12085

eurekaka · 2019-09-09T03:55:24Z

What problem does this PR solve?

Currently, the row count estimation of IndexJoin's inner child is inaccurate for two reasons:

We assume that every outer row finds a match in the inner child, which often does not hold.
We estimate the inner child's row count as Count / NDV. If the index is a composite index and the join keys cover only a prefix of it, the estimate is smaller than the real count.
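The second point can be illustrated with a minimal sketch. It assumes the NDV in Count / NDV is taken over the full composite index; the function name `rowsPerKey` and all statistics below are hypothetical and are not taken from the TiDB code base.

```go
package main

import "fmt"

// rowsPerKey returns the average number of rows that share one distinct key,
// i.e. the Count / NDV estimate. Hypothetical helper for illustration only.
func rowsPerKey(count, ndv float64) float64 {
	if ndv <= 0 {
		// No usable NDV: fall back to the total row count.
		return count
	}
	return count / ndv
}

func main() {
	// Made-up statistics for a table with a composite index on (a, b).
	const count = 1000000.0
	const ndvFullIndex = 500000.0 // assumed NDV of (a, b)
	const ndvPrefix = 1000.0      // assumed NDV of column a alone

	// A join on a alone matches all rows sharing the same a, so dividing by
	// the NDV of the full index underestimates the rows per lookup.
	fmt.Println("estimate with full-index NDV:", rowsPerKey(count, ndvFullIndex))
	fmt.Println("estimate with prefix NDV:", rowsPerKey(count, ndvPrefix))
}
```

With these made-up numbers the full-index NDV gives 2 rows per lookup, while the prefix NDV gives 1000, which is the kind of gap the description refers to.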

What is changed and how it works?

Initially, I considered comparing the histograms of both child plans to compute an overlap ratio and using that ratio for row count estimation. However, handling the mismatch between data types turned out to be non-trivial, and computing the overlap is hard for composite index cases.

Instead, I chose to reuse the estimated row count of the join result after evaluating the join equal conditions. The formula leftCnt * rightCnt / max(leftNDV, rightNDV) already takes the overlap into account to some degree, and this approach gives more consistent row count estimates for the join operator and its children.
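A minimal sketch of the formula follows. Only the expression leftCnt * rightCnt / max(leftNDV, rightNDV) comes from the description above. The function names, the sample statistics, and the step that divides the join estimate by the outer row count to get rows per probe are assumptions made for illustration, not the actual planner code.

```go
package main

import (
	"fmt"
	"math"
)

// estimateJoinRows applies leftCnt * rightCnt / max(leftNDV, rightNDV).
func estimateJoinRows(leftCnt, rightCnt, leftNDV, rightNDV float64) float64 {
	maxNDV := math.Max(leftNDV, rightNDV)
	if maxNDV <= 0 {
		return 0
	}
	return leftCnt * rightCnt / maxNDV
}

// innerRowsPerProbe is a hypothetical helper: it spreads the join estimate
// evenly over the outer rows to get the average inner rows per lookup.
func innerRowsPerProbe(joinRows, outerCnt float64) float64 {
	if outerCnt <= 0 {
		return 0
	}
	return joinRows / outerCnt
}

func main() {
	// Made-up statistics for the outer (left) and inner (right) children.
	outerCnt, innerCnt := 1000.0, 50000.0
	outerNDV, innerNDV := 800.0, 10000.0

	joinRows := estimateJoinRows(outerCnt, innerCnt, outerNDV, innerNDV)
	fmt.Println("estimated join rows:", joinRows)
	fmt.Println("inner rows per probe:", innerRowsPerProbe(joinRows, outerCnt))
}
```

Deriving the inner scan's estimate from the join estimate keeps the two numbers consistent by construction, which is the property the description highlights.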

Check List

Tests

Unit test

Code changes

N/A

Side effects

Possible performance regression

Related changes

N/A

Release note

Write release note for bug-fix or new feature.

codecov · 2019-09-09T04:00:44Z

Codecov Report

❗ No coverage uploaded for pull request base (master@440bb74).
The diff coverage is 100%.

@@             Coverage Diff             @@
##             master     #12085   +/-   ##
===========================================
  Coverage          ?   81.3048%           
===========================================
  Files             ?        452           
  Lines             ?      96886           
  Branches          ?          0           
===========================================
  Hits              ?      78773           
  Misses            ?      12460           
  Partials          ?       5653

eurekaka · 2019-09-09T04:26:30Z

/run-all-tests

AilinKid · 2019-09-09T06:11:08Z

/run-all-tests

zyxbest · 2019-09-09T09:44:32Z

/run-unit-test

winoros

lgtm

winoros

lgtm

eurekaka · 2019-09-10T03:44:35Z

/bench

sre-bot · 2019-09-10T08:31:07Z

@@                               Benchmark Diff                               @@
================================================================================
--- tidb: 5c18c5df97d935398ea1b44098a0db3171999466
+++ tidb: 7278ab07104a307f04e72e21d02f6528f8591013
tikv: ff82aa9eba331585aec1c6cdf9e1584512bccb34
pd: ce060a9aeb66d6bbb39159243b879740dffae041
================================================================================
test-1: < oltp_insert >
    * QPS : 21141.44 ± 1.5888% (std=240.80) delta: -0.08%
    * AvgMs : 12.10 ± 1.6193% (std=0.14) delta: 0.08%
    * PercentileMs99 : 42.92 ± 1.0903% (std=0.38) delta: 1.46%
            
test-2: < oltp_update_non_index >
    * QPS : 29436.80 ± 0.2322% (std=47.10) delta: -0.10%
    * AvgMs : 8.69 ± 0.2071% (std=0.01) delta: 0.09%
    * PercentileMs99 : 30.59 ± 1.0788% (std=0.27) delta: 1.09%
            
test-3: < oltp_read_write >
    * QPS : 37051.41 ± 0.3770% (std=80.95) delta: 0.32%
    * AvgMs : 138.73 ± 0.3792% (std=0.31) delta: -0.32%
    * PercentileMs99 : 257.95 ± 0.0000% (std=0.00) delta: 0.00%
            
test-4: < oltp_point_select >
    * QPS : 74734.42 ± 3.2705% (std=1572.71) delta: -0.30%
    * AvgMs : 3.43 ± 3.0940% (std=0.07) delta: 0.40%
    * PercentileMs99 : 7.43 ± 0.0000% (std=0.00) delta: 0.00%
            
test-5: < oltp_update_index >
    * QPS : 16880.32 ± 0.5051% (std=62.94) delta: 0.29%
    * AvgMs : 15.16 ± 0.5012% (std=0.06) delta: -0.29%
    * PercentileMs99 : 48.34 ± 0.0000% (std=0.00) delta: 0.00%

https://perf.pingcap.com

alivxxx

LGTM
Please resolve the conflicts.

sre-bot · 2019-09-11T08:58:10Z

Your auto merge job has been accepted, waiting for 12009

sre-bot · 2019-09-11T09:05:18Z

/run-all-tests

planner: improve row count estimation of IndexJoin's inner scan

048da9c

eurekaka added the type/enhancement and sig/planner labels Sep 9, 2019

eurekaka added the status/all tests passed label Sep 9, 2019

eurekaka added the status/WIP label Sep 9, 2019

eurekaka force-pushed the inlj_inner_scan branch from 70c5082 to a500ba5 Compare September 9, 2019 09:50

winoros reviewed Sep 9, 2019


eurekaka force-pushed the inlj_inner_scan branch from a500ba5 to 23e5b8b Compare September 10, 2019 03:12

eurekaka added the status/LGT1 label and removed the status/WIP label Sep 10, 2019

eurekaka requested review from lzmhhh123, foreyes, francis0407 and alivxxx September 10, 2019 09:22

alivxxx reviewed Sep 11, 2019


eurekaka force-pushed the inlj_inner_scan branch from 7278ab0 to 048da9c Compare September 11, 2019 08:32

eurekaka requested a review from alivxxx September 11, 2019 08:56

alivxxx approved these changes Sep 11, 2019


alivxxx added the status/LGT2 and status/can-merge labels and removed the status/LGT1 label Sep 11, 2019

Merge branch 'master' into inlj_inner_scan

83ae2d5

sre-bot merged commit f2adf1d into pingcap:master Sep 11, 2019

eurekaka deleted the inlj_inner_scan branch October 8, 2019 03:09
