title | summary | aliases | |||||
---|---|---|---|---|---|---|---|
Understand the Query Execution Plan |
Learn about the execution plan information returned by the `EXPLAIN` statement in TiDB. |
|
Based on the latest statistics of your tables, the TiDB optimizer chooses the most efficient query execution plan, which consists of a series of operators. This document details the execution plan in TiDB.
You can use the EXPLAIN
command in TiDB to view the execution plan. The result of the EXPLAIN
statement provides information about how TiDB executes SQL queries:
EXPLAIN
works together with statements such asSELECT
andDELETE
.- When you execute the
EXPLAIN
statement, TiDB returns the final optimized physical execution plan. In other words,EXPLAIN
displays the complete information about how TiDB executes the SQL statement, such as in which order, how tables are joined, and what the expression tree looks like. - For more information about each column of
EXPLAIN
, seeEXPLAIN
Output Format.
The results of EXPLAIN
shed light on how to index the data tables so that the execution plan can use the index to speed up the execution of SQL statements. You can also use EXPLAIN
to check if the optimizer chooses the optimal order to join tables.
The execution plan in TiDB has a tree structure, with each node of the tree as an operator. Considering the concurrent execution of multiple threads in each operator, all operators consume CPU and memory resources to process data during the execution of a SQL statement. From this point of view, there is no execution order for the operator.
However, from the perspective of which operators process a row of data first, the execution of a piece of data is in order. The following rule roughly summarizes this order:
Build
is always executed before Probe
and always appears before Probe
.
The first half of this rule means: if an operator has multiple child nodes, the operator with the Build
keyword at the end of the child node ID is always executed before the operator with the Probe
keyword. The second half means: when TiDB shows the execution plan, the Build
side always appears first, followed by the Probe
side.
The following examples illustrate this rule:
{{< copyable "sql" >}}
explain select * from t use index(idx_a) where a = 1;
+-------------------------------+---------+-----------+-------------------------+---------------------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------------+---------+-----------+-------------------------+---------------------------------------------+
| IndexLookUp_7 | 10.00 | root | | |
| ├─IndexRangeScan_5(Build) | 10.00 | cop[tikv] | table:t, index:idx_a(a) | range:[1,1], keep order:false, stats:pseudo |
| └─TableRowIDScan_6(Probe) | 10.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+-------------------------------+---------+-----------+-------------------------+---------------------------------------------+
3 rows in set (0.00 sec)
The IndexLookUp_7
operator has two child nodes: IndexRangeScan_5 (Build)
and TableRowIDScan_6 (Probe)
. IndexRangeScan_5 (Build)
appears first.
To get a piece of data, first, you need to execute IndexRangeScan_5 (Build)
to get a RowID. Then, use TableRowIDScan_6 (Probe)
to get a complete row of data based on the RowID.
The implication of the above rule is: for nodes at the same level, the operator that appears first might be executed first, and the operator that appears last might be executed last. This can be illustrated in the following example:
{{< copyable "sql" >}}
explain select * from t t1 use index(idx_a) join t t2 use index() where t1.a = t2.a;
+----------------------------------+----------+-----------+--------------------------+------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+----------------------------------+----------+-----------+--------------------------+------------------------------------------------------------------+
| HashJoin_22 | 12487.50 | root | | inner join, inner:TableReader_26, equal:[eq(test.t.a, test.t.a)] |
| ├─TableReader_26(Build) | 9990.00 | root | | data:Selection_25 |
| │ └─Selection_25 | 9990.00 | cop[tikv] | | not(isnull(test.t.a)) |
| │ └─TableFullScan_24 | 10000.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo |
| └─IndexLookUp_29(Probe) | 9990.00 | root | | |
| ├─IndexFullScan_27(Build) | 9990.00 | cop[tikv] | table:t1, index:idx_a(a) | keep order:false, stats:pseudo |
| └─TableRowIDScan_28(Probe) | 9990.00 | cop[tikv] | table:t1 | keep order:false, stats:pseudo |
+----------------------------------+----------+-----------+--------------------------+------------------------------------------------------------------+
7 rows in set (0.00 sec)
To complete the HashJoin_22
operation, you need to execute TableReader_26(Build)
, and then execute IndexLookUp_29(Probe)
.
When executing IndexLookUp_29(Probe)
, you need to execute IndexFullScan_27(Build)
and TableRowIDScan_28(Probe)
one by one. Therefore, from the perspective of the whole execution link, TableRowIDScan_28(Probe)
is the last one awoken to be executed.
Currently, calculation tasks of TiDB can be divided into two categories: cop tasks and root tasks. The cop tasks are performed by the Coprocessor in TiKV, and the root tasks are performed in TiDB.
One of the goals of SQL optimization is to push the calculation down to TiKV as much as possible. The Coprocessor in TiKV supports most of the built-in SQL functions (including the aggregate functions and the scalar functions), SQL LIMIT
operations, index scans, and table scans. However, all Join
operations can only be performed as root tasks in TiDB.
Access Object is the data item accessed by the operator, including table
, partition
, and index
(if any). Only operators that directly access the data have this information.
In the WHERE
/HAVING
/ON
conditions, the TiDB optimizer analyzes the result returned by the primary key query or the index key query. For example, these conditions might include comparison operators of the numeric and date type, such as >
, <
, =
, >=
, <=
, and the character type such as LIKE
.
Note:
- Currently, TiDB only supports the case where a comparison operator is connected by a column (at one end) and a constant value (at the other end), or the case that the calculation result is a constant. You cannot use query conditions like
year(birth_day) < 1992
as the index.- It is recommended to compare data of the same type; otherwise, additional
cast
operations are introduced, which causes the index to be unavailable. For example, regarding the conditionuser_id = 123456
, ifuser_id
is a string, you must define123456
as a string constant.
You can also use AND
(intersection) and OR
(union) to combine the range query conditions of one column. For a multi-dimensional composite index, you can use conditions in multiple columns. For example, regarding the composite index (a, b, c)
:
- When
a
is an equivalent query, continue to figure out the query range ofb
; whenb
is also an equivalent query, continue to figure out the query range ofc
. - Otherwise, if
a
is a non-equivalent query, you can only figure out the range ofa
.
Different operators output different information after the EXPLAIN
statement is executed. This section focuses on the execution plan of different operators, ranging from table scans, table aggregation, to table join.
You can use optimizer hints to control the behavior of the optimizer, and thereby controlling the selection of the physical operators. For example, /*+ HASH_JOIN(t1, t2) */
means that the optimizer uses the Hash Join
algorithm. For more details, see Optimizer Hints.
The operators that perform table scans (of the disk or the TiKV Block Cache) are listed as follows:
- TableFullScan: Full table scan.
- TableRangeScan: Table scans with the specified range.
- TableRowIDScan: Accurately scans the table data based on the RowID passed down from the upper layer.
- IndexFullScan: Another type of "full table scan", except that the index data is scanned, rather than the table data.
- IndexRangeScan: Index scans with the specified range.
TiDB aggregates the data or calculation results scanned from TiKV/TiFlash. The data aggregation operators can be divided into the following categories:
- TableReader: Aggregates the data obtained by the underlying operators like
TableFullScan
orTableRangeScan
in TiKV. - IndexReader: Aggregates the data obtained by the underlying operators like
IndexFullScan
orIndexRangeScan
in TiKV. - IndexLookUp: First aggregates the RowID (in TiKV) scanned by the
Build
side. Then at theProbe
side, accurately reads the data from TiKV based on these RowIDs. At theBuild
side, there are operators likeIndexFullScan
orIndexRangeScan
; at theProbe
side, there is theTableRowIDScan
operator. - IndexMerge: Similar to
IndexLookUp
.IndexMerge
can be seen as an extension ofIndexLookupReader
.IndexMerge
supports reading multiple indexes at the same time. There are manyBuild
s and oneProbe
. The execution process ofIndexMerge
the same as that ofIndexLookUp
.
Table data refers to the original data of a table stored in TiKV. For each row of the table data, its key
is a 64-bit integer called RowID
. If a primary key of a table is an integer, TiDB uses the value of the primary key as the RowID
of the table data; otherwise, the system automatically generates RowID
. The value
of the table data is encoded from all the data in this row. When reading table data, data is returned in the order of increasing RowID
s.
Similar to the table data, the index data is also stored in TiKV. Its key
is the ordered bytes encoded from the index column. value
is the RowID
corresponding to a row of index data. The non-index column of the row can be read using RowID
. When reading index data, TiKV returns data in ascending order of index columns. If there are multiple index columns, firstly, ensure that the first column is incremented; if the i-th column is equivalent, make sure that the (i+1)-th column is incremented.
{{< copyable "sql" >}}
explain select * from t use index(idx_a);
+-------------------------------+----------+-----------+-------------------------+--------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------------+----------+-----------+-------------------------+--------------------------------+
| IndexLookUp_6 | 10000.00 | root | | |
| ├─IndexFullScan_4(Build) | 10000.00 | cop[tikv] | table:t, index:idx_a(a) | keep order:false, stats:pseudo |
| └─TableRowIDScan_5(Probe) | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+-------------------------------+----------+-----------+-------------------------+--------------------------------+
3 rows in set (0.00 sec)
The IndexLookUp_6
operator has two child nodes: IndexFullScan_4(Build)
and TableRowIDScan_5(Probe)
.
IndexFullScan_4(Build)
performs an index full scan and scans all the data of indexa
. Because it is a full scan, this operation gets theRowID
of all the data in the table.TableRowIDScan_5(Probe)
scans all table data usingRowID
s.
This execution plan is not as efficient as using TableReader
to perform a full table scan, because IndexLookUp
performs an extra index scan (which comes with additional overhead), apart from the table scan.
For table scan operations, the operator info column in the explain
table shows whether the data is sorted. In the above example, the keep order:false
in the IndexFullScan
operator indicates that the data is unsorted. The stats:pseudo
in the operator info means that there is no statistics, or that the statistics will not be used for estimation because it is outdated. For other scan operations, the operator info involves similar information.
{{< copyable "sql" >}}
explain select * from t where a > 1 or b >100;
+-------------------------+----------+-----------+---------------+----------------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------+----------+-----------+---------------+----------------------------------------+
| TableReader_7 | 8000.00 | root | | data:Selection_6 |
| └─Selection_6 | 8000.00 | cop[tikv] | | or(gt(test.t.a, 1), gt(test.t.b, 100)) |
| └─TableFullScan_5 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+-------------------------+----------+-----------+---------------+----------------------------------------+
3 rows in set (0.00 sec)
In the above example, the child node of the TableReader_7
operator is Selection_6
. The subtree rooted at this child node is seen as a Cop Task
and is delivered to the corresponding TiKV. This Cop Task
uses the TableFullScan_5
operator to perform the table scan. Selection
represents the selection condition in the SQL statement, namely, the WHERE
/HAVING
/ON
clause.
TableFullScan_5
performs a full table scan, and the load on the cluster increases accordingly, which might affect other queries running in the cluster. If an appropriate index is built and the IndexMerge
operator is used, these will greatly improve query performance and reduce load on the cluster.
IndexMerge
is a method introduced in TiDB v4.0 to access tables. Using this method, the TiDB optimizer can use multiple indexes per table and merge the results returned by each index. In some scenarios, this method makes the query more efficient by avoiding full table scans.
mysql> explain select * from t where a = 1 or b = 1;
+-------------------------+----------+-----------+---------------+--------------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------+----------+-----------+---------------+--------------------------------------+
| TableReader_7 | 8000.00 | root | | data:Selection_6 |
| └─Selection_6 | 8000.00 | cop[tikv] | | or(eq(test.t.a, 1), eq(test.t.b, 1)) |
| └─TableFullScan_5 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+-------------------------+----------+-----------+---------------+--------------------------------------+
mysql> set @@tidb_enable_index_merge = 1;
mysql> explain select * from t use index(idx_a, idx_b) where a > 1 or b > 1;
+--------------------------------+---------+-----------+-------------------------+------------------------------------------------+
| id | estRows | task | access object | operator info |
+--------------------------------+---------+-----------+-------------------------+------------------------------------------------+
| IndexMerge_16 | 6666.67 | root | | |
| ├─IndexRangeScan_13(Build) | 3333.33 | cop[tikv] | table:t, index:idx_a(a) | range:(1,+inf], keep order:false, stats:pseudo |
| ├─IndexRangeScan_14(Build) | 3333.33 | cop[tikv] | table:t, index:idx_b(b) | range:(1,+inf], keep order:false, stats:pseudo |
| └─TableRowIDScan_15(Probe) | 6666.67 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+--------------------------------+---------+-----------+-------------------------+------------------------------------------------+
In the above query, the filter condition is a WHERE
clause that uses OR
as the connector. Without IndexMerge
, you can use only one index per table. a = 1
cannot be pushed down to the index a
; neither can b = 1
be pushed down to the index b
. The full table scan is inefficient when a huge volume of data exists in t
. To handle such a scenario, IndexMerge
is introduced in TiDB to access tables.
IndexMerge
allows the optimizer to use multiple indexes per table, and merge the results returned by each index to generate the execution plan of the latter IndexMerge
in the figure above. Here the IndexMerge_16
operator has three child nodes, among which IndexRangeScan_13
and IndexRangeScan_14
get all the RowID
s that meet the conditions based on the result of range scan, and then the TableRowIDScan_15
operator accurately reads all the data that meets the conditions according to these RowID
s.
For the scan operation that is performed on a specific range of data, such as IndexRangeScan
/TableRangeScan
, the operator info
column in the result has additional information about the scan range compared with other scan operations like IndexFullScan
/TableFullScan
. In the above example, the range:(1,+inf]
in the IndexRangeScan_13
operator indicates that the operator scans the data from 1 to positive infinity.
Note:
At present, the
IndexMerge
feature is disabled by default in TiDB 4.0.0-rc.1. In addition, the currently supported scenarios ofIndexMerge
in TiDB 4.0 are limited to the disjunctive normal form (expressions connected byor
). The conjunctive normal form (expressions connected byand
) will be supported in later versions. Enable theIndexMerge
in one of two ways:
Set the
tidb_enable_index_merge
system variable to 1;Use the SQL Hint
USE_INDEX_MERGE
in the query.SQL Hint has a higher priority than system variables.
Aggregation algorithms in TiDB include the following categories:
The Hash Aggregation
operator is optimized in multi-threaded concurrency. It is quick to execute at the cost of more memory usage. The following is an example of Hash Aggregate
:
{{< copyable "sql" >}}
explain select /*+ HASH_AGG() */ count(*) from t;
+---------------------------+----------+-----------+---------------+---------------------------------+
| id | estRows | task | access object | operator info |
+---------------------------+----------+-----------+---------------+---------------------------------+
| HashAgg_11 | 1.00 | root | | funcs:count(Column#7)->Column#4 |
| └─TableReader_12 | 1.00 | root | | data:HashAgg_5 |
| └─HashAgg_5 | 1.00 | cop[tikv] | | funcs:count(1)->Column#7 |
| └─TableFullScan_8 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+---------------------------+----------+-----------+---------------+---------------------------------+
4 rows in set (0.00 sec)
Generally speaking, Hash Aggregate
is executed in two stages.
- One is on the Coprocessor of TiKV/TiFlash, with the intermediate results of the aggregation function calculated when the table scan operator reads the data.
- The other is at the TiDB layer, with the final result calculated through aggregating the intermediate results of all Coprocessor Tasks.
The operator info column in the explain
table also records other information about Hash Aggregation
. You need to pay attention to what aggregate function that Hash Aggregation
uses. In the above example, the operator info of the Hash Aggregation
operator is funcs:count(Column#7)->Column#4
. It means that Hash Aggregation
uses the aggregate function count
for calculation. The operator info of the Stream Aggregation
operator in the following example is the same with this one.
The Stream Aggregation
operator usually takes up less memory than Hash Aggregate
. In some scenarios, Stream Aggregation
executes faster than Hash Aggregate
. In the case of a large amount of data or insufficient system memory, it is recommended to use the Stream Aggregate
operator. An example is as follows:
{{< copyable "sql" >}}
explain select /*+ STREAM_AGG() */ count(*) from t;
+----------------------------+----------+-----------+---------------+---------------------------------+
| id | estRows | task | access object | operator info |
+----------------------------+----------+-----------+---------------+---------------------------------+
| StreamAgg_16 | 1.00 | root | | funcs:count(Column#7)->Column#4 |
| └─TableReader_17 | 1.00 | root | | data:StreamAgg_8 |
| └─StreamAgg_8 | 1.00 | cop[tikv] | | funcs:count(1)->Column#7 |
| └─TableFullScan_13 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+----------------------------+----------+-----------+---------------+---------------------------------+
4 rows in set (0.00 sec)
Similar to Hash Aggregate
, Stream Aggregate
is executed in two stages.
- One is on the Coprocessor of TiKV/TiFlash, with the intermediate results of the aggregation function calculated when the table scan operator reads the data.
- The other is at the TiDB layer, with the final result calculated through aggregating the intermediate results of all Coprocessor Tasks.
The Join
algorithms in TiDB consist of the following categories:
- Hash Join
- Merge Join
- Index Join (Index Nested Loop Join)
- Index Hash Join (Index Nested Loop Hash Join)
- Index Merge Join (Index Nested Loop Merge Join)
The following are examples of the execution processes of these Join
algorithms.
The Hash Join
operator uses multi-thread. Its execution speed is fast at the cost of more memory usage. An example of Hash Join
is as follows:
{{< copyable "sql" >}}
EXPLAIN SELECT /*+ HASH_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id = t2.id;
+-----------------------------+----------+-----------+------------------------+------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------+----------+-----------+------------------------+------------------------------------------------+
| HashJoin_30 | 12487.50 | root | | inner join, equal:[eq(test.t1.id, test.t2.id)] |
| ├─IndexReader_35(Build) | 9990.00 | root | | index:IndexFullScan_34 |
| │ └─IndexFullScan_34 | 9990.00 | cop[tikv] | table:t2, index:id(id) | keep order:false, stats:pseudo |
| └─IndexReader_33(Probe) | 9990.00 | root | | index:IndexFullScan_32 |
| └─IndexFullScan_32 | 9990.00 | cop[tikv] | table:t1, index:id(id) | keep order:false, stats:pseudo |
+-----------------------------+----------+-----------+------------------------+------------------------------------------------+
5 rows in set (0.01 sec)
The execution process of Hash Join
is as follows:
- Cache the data of the
Build
side in memory. - Construct a Hash Table on the
Build
side based on the cached data. - Read the data at the
Probe
side. - Use the data of the
Probe
side to probe the Hash Table. - Return qualified data to the user.
The operator info column in the explain
table also records other information about Hash Join
, including whether the query is Inner Join or Outer Join, and what are the conditions of Join. In the above example, the query is an Inner Join, where the Join condition equal:[eq(test.t1.id, test.t2.id)]
partly corresponds with the query statement where t1.id = t2. id
. The operator info of the other Join operators in the following examples is similar to this one.
The Merge Join
operator usually uses less memory than Hash Join
. However, Merge Join
might take longer to be executed. When the amount of data is large, or the system memory is insufficient, it is recommended to use Merge Join
. The following is an example:
{{< copyable "sql" >}}
EXPLAIN SELECT /*+ MERGE_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id = t2.id;
+-----------------------------+----------+-----------+------------------------+-------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------+----------+-----------+------------------------+-------------------------------------------------------+
| MergeJoin_7 | 12487.50 | root | | inner join, left key:test.t1.id, right key:test.t2.id |
| ├─IndexReader_12(Build) | 9990.00 | root | | index:IndexFullScan_11 |
| │ └─IndexFullScan_11 | 9990.00 | cop[tikv] | table:t2, index:id(id) | keep order:true, stats:pseudo |
| └─IndexReader_10(Probe) | 9990.00 | root | | index:IndexFullScan_9 |
| └─IndexFullScan_9 | 9990.00 | cop[tikv] | table:t1, index:id(id) | keep order:true, stats:pseudo |
+-----------------------------+----------+-----------+------------------------+-------------------------------------------------------+
5 rows in set (0.01 sec)
The execution of the Merge Join
operator is as follows:
- Read all the data of a Join Group from the
Build
side into the memory - Read the data of the
Probe
side. - Compare whether each row of data on the
Probe
side matches a complete Join Group on theBuild
side. Apart from equivalent conditions, there are non-equivalent conditions. Here "match" mainly refers to checking whether non-equivalent conditions are met. Join Group refers to the data with the same value among all Join Keys.
If the result set (obtained after the outer tables are filtered by the WHERE
condition) is small, it is recommended to use Index Join
. Here "small" means data is less than 10,000 rows.
{{< copyable "sql" >}}
EXPLAIN SELECT /*+ INL_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id = t2.id;
+-----------------------------+----------+-----------+------------------------+--------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------+----------+-----------+------------------------+--------------------------------------------------------------------------------+
| IndexJoin_11 | 12487.50 | root | | inner join, inner:IndexReader_10, outer key:test.t1.id, inner key:test.t2.id |
| ├─IndexReader_31(Build) | 9990.00 | root | | index:IndexFullScan_30 |
| │ └─IndexFullScan_30 | 9990.00 | cop[tikv] | table:t1, index:id(id) | keep order:false, stats:pseudo |
| └─IndexReader_10(Probe) | 1.00 | root | | index:Selection_9 |
| └─Selection_9 | 1.00 | cop[tikv] | | not(isnull(test.t2.id)) |
| └─IndexRangeScan_8 | 1.00 | cop[tikv] | table:t2, index:id(id) | range: decided by [eq(test.t2.id, test.t1.id)], keep order:false, stats:pseudo |
+-----------------------------+----------+-----------+------------------------+--------------------------------------------------------------------------------+
6 rows in set (0.00 sec)
Index Hash Join
uses the same conditions as Index Join
. However, Index Hash Join
saves more memory in some scenarios.
{{< copyable "sql" >}}
EXPLAIN SELECT /*+ INL_HASH_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id = t2.id;
+-----------------------------+----------+-----------+------------------------+--------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------+----------+-----------+------------------------+--------------------------------------------------------------------------------+
| IndexHashJoin_18 | 12487.50 | root | | inner join, inner:IndexReader_10, outer key:test.t1.id, inner key:test.t2.id |
| ├─IndexReader_31(Build) | 9990.00 | root | | index:IndexFullScan_30 |
| │ └─IndexFullScan_30 | 9990.00 | cop[tikv] | table:t1, index:id(id) | keep order:false, stats:pseudo |
| └─IndexReader_10(Probe) | 1.00 | root | | index:Selection_9 |
| └─Selection_9 | 1.00 | cop[tikv] | | not(isnull(test.t2.id)) |
| └─IndexRangeScan_8 | 1.00 | cop[tikv] | table:t2, index:id(id) | range: decided by [eq(test.t2.id, test.t1.id)], keep order:false, stats:pseudo |
+-----------------------------+----------+-----------+------------------------+--------------------------------------------------------------------------------+
6 rows in set (0.00 sec)
Index Merge Join
is used in similar scenarios as Index Join. However, the index prefix used by the inner table is the inner table column collection in the join keys. Index Merge Join
saves more memory than INL_JOIN
.
{{< copyable "sql" >}}
EXPLAIN SELECT /*+ INL_MERGE_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id = t2.id;
+-----------------------------+----------+-----------+------------------------+-------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------+----------+-----------+------------------------+-------------------------------------------------------------------------------+
| IndexMergeJoin_16 | 12487.50 | root | | inner join, inner:IndexReader_14, outer key:test.t1.id, inner key:test.t2.id |
| ├─IndexReader_31(Build) | 9990.00 | root | | index:IndexFullScan_30 |
| │ └─IndexFullScan_30 | 9990.00 | cop[tikv] | table:t1, index:id(id) | keep order:false, stats:pseudo |
| └─IndexReader_14(Probe) | 1.00 | root | | index:Selection_13 |
| └─Selection_13 | 1.00 | cop[tikv] | | not(isnull(test.t2.id)) |
| └─IndexRangeScan_12 | 1.00 | cop[tikv] | table:t2, index:id(id) | range: decided by [eq(test.t2.id, test.t1.id)], keep order:true, stats:pseudo |
+-----------------------------+----------+-----------+------------------------+-------------------------------------------------------------------------------+
6 rows in set (0.00 sec)
For more details, refer to bikeshare example database.
{{< copyable "sql" >}}
EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59';
+------------------------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------+
| StreamAgg_20 | 1.00 | root | | funcs:count(Column#13)->Column#11 |
| └─TableReader_21 | 1.00 | root | | data:StreamAgg_9 |
| └─StreamAgg_9 | 1.00 | cop[tikv] | | funcs:count(1)->Column#13 |
| └─Selection_19 | 8166.73 | cop[tikv] | | ge(bikeshare.trips.start_date, 2017-07-01 00:00:00.000000), le(bikeshare.trips.start_date, 2017-07-01 23:59:59.000000) |
| └─TableFullScan_18 | 19117643.00 | cop[tikv] | table:trips | keep order:false, stats:pseudo |
+------------------------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------+
The execution process of the above example can be illustrated as follows:
- Coprocessor reads the data on the trips table (executed by
TableScan_18
). - Find data that meets the
start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59'
condition (executed bySelection_19
). - Calculate the number of rows that satisfy the condition, and return the result to TiDB (executed by
StreamAgg_9
). - TiDB aggregates the results returned by each Coprocessor (executed by
TableReader_21
). - TiDB calculates the number of rows of all data (
StreamAgg_20
), and finally returns the results to the client.
In the above query, TiDB estimates the number of rows in the output of TableScan_18
as 19117643.00, based on the statistics of the trips
table. The number of rows that meet the start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59'
condition is 8168.73. After the aggregation operation, there is only 1 result.
The execution as illustrated in the above example is not efficient enough, though most of the calculation logic is pushed down to the TiKV Coprocessor. You can add an appropriate index to eliminate the full table scan on trips
by TableScan_18
, thereby accelerating the execution of the query:
{{< copyable "sql" >}}
ALTER TABLE trips ADD INDEX (start_date);
{{< copyable "sql" >}}
EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59';
+-----------------------------+---------+-----------+-------------------------------------------+---------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------+---------+-----------+-------------------------------------------+---------------------------------------------------------------------------------+
| StreamAgg_17 | 1.00 | root | | funcs:count(Column#13)->Column#11 |
| └─IndexReader_18 | 1.00 | root | | index:StreamAgg_9 |
| └─StreamAgg_9 | 1.00 | cop[tikv] | | funcs:count(1)->Column#13 |
| └─IndexRangeScan_16 | 8166.73 | cop[tikv] | table:trips, index:start_date(start_date) | range:[2017-07-01 00:00:00,2017-07-01 23:59:59], keep order:false, stats:pseudo |
+-----------------------------+---------+-----------+-------------------------------------------+---------------------------------------------------------------------------------+
4 rows in set (0.00 sec)
After adding the index, use IndexScan_24
to directly read the data that meets the start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59'
condition. The estimated number of rows to be scanned decreases from 19117643.00 to 8166.73. In the test environment, the execution time of this query decreases from 50.41 seconds to 0.01 seconds.
Based on MySQL, TiDB defines some special system variables and syntax to optimize performance. Some system variables are related to specific operators, such as the concurrency of the operator, the upper limit of the operator memory, and whether to use partitioned tables. These can be controlled by system variables, thereby affecting the efficiency of each operator.