---
title: EXPLAIN ANALYZE | TiDB SQL Statement Reference
summary: An overview of the usage of EXPLAIN ANALYZE for the TiDB database.
---
The `EXPLAIN ANALYZE` statement works similarly to `EXPLAIN`, with the major difference being that it actually executes the statement. This allows you to compare the estimates used during query planning with the actual values encountered during execution. If the estimates differ significantly from the actual values, you should consider running `ANALYZE TABLE` on the affected tables.
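For example, if `actRows` diverges widely from `estRows` in the output, refreshing the statistics usually brings the two back in line. A minimal sketch, assuming a table named `t1` (created in the examples below):

{{< copyable "sql" >}}

```sql
-- Refresh the table statistics so that the optimizer's estimates
-- (estRows) better match the actual row counts (actRows):
ANALYZE TABLE t1;
```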
> **Note:**
>
> When you use `EXPLAIN ANALYZE` to execute DML statements, modifications to data are actually executed. Currently, the execution plan for DML statements cannot be shown yet.
```ebnf+ast
ExplainSym ::=
    'EXPLAIN'
|   'DESCRIBE'
|   'DESC'

ExplainStmt ::=
    ExplainSym ( TableName ColumnName? | 'ANALYZE'? ExplainableStmt | 'FOR' 'CONNECTION' NUM | 'FORMAT' '=' ( stringLit | ExplainFormatType ) ( 'FOR' 'CONNECTION' NUM | ExplainableStmt ) )

ExplainableStmt ::=
    SelectStmt
|   DeleteFromStmt
|   UpdateStmt
|   InsertIntoStmt
|   ReplaceIntoStmt
|   UnionStmt
```
Different from `EXPLAIN`, `EXPLAIN ANALYZE` executes the corresponding SQL statement, records its runtime information, and returns the information together with the execution plan. Therefore, you can regard `EXPLAIN ANALYZE` as an extension of the `EXPLAIN` statement. Compared to `EXPLAIN` (for debugging query execution), the return results of `EXPLAIN ANALYZE` also include columns of information such as `actRows`, `execution info`, `memory`, and `disk`. The details of these columns are shown as follows:
| attribute name | description |
|---|---|
| `actRows` | Number of rows output by the operator. |
| `execution info` | Execution information of the operator. `time` represents the total wall time from entering the operator to leaving the operator, including the total execution time of all sub-operators. If the operator is called many times by the parent operator (in loops), then the time refers to the accumulated time. `loops` is the number of times the current operator is called by the parent operator. `open` represents the time spent initializing the operator. `close` refers to the time taken from when the operator finishes processing data to when it ends execution. The `time` value includes both `open` and `close` time. When the operator is executed concurrently, `execution info` shows the sum of all used wall time. In this case, `time`, `open`, and `close` are replaced with `total_time`, `total_open`, and `total_close`. |
| `memory` | Maximum memory space occupied by the operator. |
| `disk` | Maximum disk space occupied by the operator. |
{{< copyable "sql" >}}

```sql
CREATE TABLE t1 (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, c1 INT NOT NULL);
```

```
Query OK, 0 rows affected (0.12 sec)
```
{{< copyable "sql" >}}

```sql
INSERT INTO t1 (c1) VALUES (1), (2), (3);
```

```
Query OK, 3 rows affected (0.02 sec)
Records: 3  Duplicates: 0  Warnings: 0
```
{{< copyable "sql" >}}

```sql
EXPLAIN ANALYZE SELECT * FROM t1 WHERE id = 1;
```

```
+-------------+---------+---------+------+---------------+----------------------------------------------------------------+---------------+--------+------+
| id          | estRows | actRows | task | access object | execution info                                                 | operator info | memory | disk |
+-------------+---------+---------+------+---------------+----------------------------------------------------------------+---------------+--------+------+
| Point_Get_1 | 1.00    | 1       | root | table:t1      | time:757.205µs, loops:2, Get:{num_rpc:1, total_time:697.051µs} | handle:1      | N/A    | N/A  |
+-------------+---------+---------+------+---------------+----------------------------------------------------------------+---------------+--------+------+
1 row in set (0.01 sec)
```
{{< copyable "sql" >}}

```sql
EXPLAIN ANALYZE SELECT * FROM t1;
```

```
+-------------------+----------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+-----------+------+
| id                | estRows  | actRows | task      | access object | execution info                                                                                                                                                                                                                            | operator info                  | memory    | disk |
+-------------------+----------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+-----------+------+
| TableReader_5     | 10000.00 | 3       | root      |               | time:278.2µs, loops:2, cop_task: {num: 1, max: 437.6µs, proc_keys: 3, rpc_num: 1, rpc_time: 423.9µs, copr_cache_hit_ratio: 0.00}                                                                                                          | data:TableFullScan_4           | 251 Bytes | N/A  |
| └─TableFullScan_4 | 10000.00 | 3       | cop[tikv] | table:t1      | tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 3, total_process_keys_size: 111, total_keys: 4, rocksdb: {delete_skipped_count: 0, key_skipped_count: 3, block: {cache_hit_count: 0, read_count: 0, read_byte: 0 Bytes}}} | keep order:false, stats:pseudo | N/A       | N/A  |
+-------------------+----------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+-----------+------+
2 rows in set (0.00 sec)
```
In addition to the basic `time`, `open`, `close`, and `loops` execution information, `execution info` also contains operator-specific execution information, which mainly includes the time consumed for the operator to send RPC requests and the duration of other execution steps.
The execution information of a `Point_Get` operator typically contains the following information:

- `Get:{num_rpc:1, total_time:697.051µs}`: The number of `Get` RPC requests (`num_rpc`) sent to TiKV and the total duration (`total_time`) of all RPC requests.
- `ResolveLock:{num_rpc:1, total_time:12.117495ms}`: If TiDB encounters a lock when reading data, it has to resolve the lock first, which generally occurs in the scenario of read-write conflict. This information indicates the duration of resolving locks.
- `regionMiss_backoff:{num:11, total_time:2010 ms},tikvRPC_backoff:{num:11, total_time:10691 ms}`: When an RPC request fails, TiDB waits for the backoff time before retrying the request. Backoff statistics include the type of backoff (such as `regionMiss` and `tikvRPC`), the total waiting time (`total_time`), and the total number of backoffs (`num`).
The execution information of the `Batch_Point_Get` operator is similar to that of the `Point_Get` operator, but `Batch_Point_Get` generally sends `BatchGet` RPC requests to TiKV to read data.

- `BatchGet:{num_rpc:2, total_time:83.13µs}`: The number of `BatchGet` RPC requests (`num_rpc`) sent to TiKV and the total time consumed (`total_time`) for all RPC requests.
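A statement that looks up several handles at once typically produces this operator. A sketch against the `t1` table created above:

{{< copyable "sql" >}}

```sql
-- Reading multiple rows by primary key in one statement is usually
-- executed as a single Batch_Point_Get operator:
EXPLAIN ANALYZE SELECT * FROM t1 WHERE id IN (1, 2, 3);
```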
The execution information of a `TableReader` operator is typically as follows:

```
cop_task: {num: 6, max: 1.07587ms, min: 844.312µs, avg: 919.601µs, p95: 1.07587ms, max_proc_keys: 16, p95_proc_keys: 16, tot_proc: 1ms, tot_wait: 1ms, rpc_num: 6, rpc_time: 5.313996ms, copr_cache_hit_ratio: 0.00}
```

- `cop_task`: Contains the execution information of `cop` tasks. For example:
    - `num`: The number of cop tasks.
    - `max`, `min`, `avg`, `p95`: The maximum, minimum, average, and P95 values of the execution time consumed for executing cop tasks.
    - `max_proc_keys` and `p95_proc_keys`: The maximum and P95 numbers of key-values scanned by TiKV in all cop tasks. If the difference between the maximum value and the P95 value is large, the data distribution might be imbalanced.
    - `rpc_num`, `rpc_time`: The total number and total time consumed for `Cop` RPC requests sent to TiKV.
    - `copr_cache_hit_ratio`: The hit ratio of Coprocessor Cache for `cop` task requests.
- `backoff`: Contains different types of backoff and the total waiting time of backoff.
The execution information of an `Insert` operator is typically as follows:

```
prepare:109.616µs, check_insert:{total_time:1.431678ms, mem_insert_time:667.878µs, prefetch:763.8µs, rpc:{BatchGet:{num_rpc:1, total_time:699.166µs},Get:{num_rpc:1, total_time:378.276µs}}}
```

- `prepare`: The time consumed for preparing to write, including expression, default value, and auto-increment value calculations.
- `check_insert`: This information generally appears in `insert ignore` and `insert on duplicate` statements, including conflict checking and the time consumed for writing data to the TiDB transaction cache. Note that this time consumption does not include the time consumed for transaction commit. It contains the following information:
    - `total_time`: The total time spent on the `check_insert` step.
    - `mem_insert_time`: The time consumed for writing data to the TiDB transaction cache.
    - `prefetch`: The duration of retrieving the data that needs to be checked for conflicts from TiKV. This step sends a `BatchGet` RPC request to TiKV to retrieve data.
    - `rpc`: The total time consumed for sending RPC requests to TiKV, which generally includes two types of RPC time, `BatchGet` and `Get`, among which:
        - The `BatchGet` RPC request is sent in the `prefetch` step.
        - The `Get` RPC request is sent when the `insert on duplicate` statement executes `duplicate update`.
- `backoff`: Contains different types of backoff and the total waiting time of backoff.
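Because `EXPLAIN ANALYZE` really executes the DML statement, the following sketch against the `t1` table above both modifies data and surfaces the `check_insert` details:

{{< copyable "sql" >}}

```sql
-- An INSERT ... ON DUPLICATE KEY UPDATE goes through conflict checking,
-- so its execution info includes prepare, check_insert, and prefetch:
EXPLAIN ANALYZE INSERT INTO t1 (id, c1) VALUES (1, 10)
ON DUPLICATE KEY UPDATE c1 = c1 + 1;
```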
The `IndexJoin` operator has 1 outer worker and N inner workers for concurrent execution. The join result preserves the order of the outer table. The detailed execution process is as follows:

1. The outer worker reads N outer rows, wraps them into a task, and sends the task to the result channel and the inner worker channel.
2. The inner worker receives the task, builds key ranges from the task, and fetches inner rows according to the key ranges. It then builds the inner row hash table.
3. The main `IndexJoin` thread receives the task from the result channel and waits for the inner worker to finish handling the task.
4. The main `IndexJoin` thread joins each outer row by looking it up in the inner rows' hash table.
The `IndexJoin` operator contains the following execution information:

```
inner:{total:4.297515932s, concurrency:5, task:17, construct:97.96291ms, fetch:4.164310088s, build:35.219574ms}, probe:53.574945ms
```

- `inner`: The execution information of the inner worker:
    - `total`: The total time consumed by the inner worker.
    - `concurrency`: The number of concurrent inner workers.
    - `task`: The total number of tasks processed by the inner worker.
    - `construct`: The preparation time before the inner worker reads the inner table rows corresponding to the task.
    - `fetch`: The total time it takes for the inner worker to read inner table rows.
    - `build`: The total time it takes for the inner worker to construct the hash table of the corresponding inner table rows.
- `probe`: The total time consumed by the main `IndexJoin` thread to join the outer table rows with the hash table of the inner table rows.
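To see this operator in action, you can usually force an index join with an optimizer hint. A sketch, assuming a second table `t2` (hypothetical) whose join column is indexed:

{{< copyable "sql" >}}

```sql
-- Hypothetical second table; the primary key on id serves as the inner index.
CREATE TABLE t2 (id INT NOT NULL PRIMARY KEY, c1 INT NOT NULL);

-- The INL_JOIN hint asks the optimizer to use an index join with t2 as the
-- inner table, producing the inner/probe statistics described above:
EXPLAIN ANALYZE SELECT /*+ INL_JOIN(t2) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
```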
The execution process of the `IndexHashJoin` operator is similar to that of the `IndexJoin` operator. The `IndexHashJoin` operator also has 1 outer worker and N inner workers that execute in parallel, but the output order is not guaranteed to be consistent with that of the outer table. The detailed execution process is as follows:

1. The outer worker reads N outer rows, builds a task, and sends it to the inner worker channel.
2. The inner worker receives tasks from the inner worker channel and performs the following three operations in order for every task:
    a. Build a hash table from the outer rows.
    b. Build key ranges from the outer rows and fetch inner rows.
    c. Probe the hash table and send the join result to the result channel. Note: steps a and b run concurrently.
3. The main thread of `IndexHashJoin` receives the join results from the result channel.
The `IndexHashJoin` operator contains the following execution information:

```
inner:{total:4.429220003s, concurrency:5, task:17, construct:96.207725ms, fetch:4.239324006s, build:24.567801ms, join:93.607362ms}
```

- `inner`: The execution information of the inner worker:
    - `total`: The total time consumed by the inner worker.
    - `concurrency`: The number of inner workers.
    - `task`: The total number of tasks processed by the inner worker.
    - `construct`: The preparation time before the inner worker reads the inner table rows.
    - `fetch`: The total time consumed for the inner worker to read inner table rows.
    - `build`: The total time consumed for the inner worker to construct the hash table of the outer table rows.
    - `join`: The total time consumed for the inner worker to join the inner table rows with the hash table of the outer table rows.
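Similarly, a hint can usually steer the optimizer toward this variant. A sketch, reusing the hypothetical `t2` table from the previous example:

{{< copyable "sql" >}}

```sql
-- The INL_HASH_JOIN hint requests an index hash join with t2 as the
-- inner table; unlike INL_JOIN, the output order is not guaranteed:
EXPLAIN ANALYZE SELECT /*+ INL_HASH_JOIN(t2) */ * FROM t1 JOIN t2 ON t1.id = t2.id;
```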
The `HashJoin` operator has an inner worker, an outer worker, and N join workers. The detailed execution process is as follows:

1. The inner worker reads inner table rows and constructs a hash table.
2. The outer worker reads the outer table rows, wraps them into a task, and sends the task to the join workers.
3. The join workers wait for the hash table construction in step 1 to finish.
4. The join workers use the outer table rows in the task and the hash table to perform join operations, and then send the join results to the result channel.
5. The main thread of `HashJoin` receives the join results from the result channel.
The `HashJoin` operator contains the following execution information:

```
build_hash_table:{total:146.071334ms, fetch:110.338509ms, build:35.732825ms}, probe:{concurrency:5, total:857.162518ms, max:171.48271ms, probe:125.341665ms, fetch:731.820853ms}
```

- `build_hash_table`: The execution information of reading the inner table data and constructing the hash table:
    - `total`: The total time consumption.
    - `fetch`: The total time spent reading inner table data.
    - `build`: The total time spent constructing the hash table.
- `probe`: The execution information of the join workers:
    - `concurrency`: The number of join workers.
    - `total`: The total time consumed by all join workers.
    - `max`: The longest time taken by a single join worker.
    - `probe`: The total time consumed for joining the outer table rows with the hash table.
    - `fetch`: The total time the join workers wait to read the outer table rows.
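A sketch of forcing this operator with the tables above (the join condition is illustrative; `c1` needs no index for a hash join):

{{< copyable "sql" >}}

```sql
-- Force a hash join between t1 and t2; the build_hash_table and probe
-- statistics described above then appear in execution info:
EXPLAIN ANALYZE SELECT /*+ HASH_JOIN(t1, t2) */ * FROM t1 JOIN t2 ON t1.c1 = t2.c1;
```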
The `TableFullScan` operator executed on a TiFlash node contains the following execution information:

```
tiflash_scan: {
  dtfile: {
    total_scanned_packs: 2,
    total_skipped_packs: 1,
    total_scanned_rows: 16000,
    total_skipped_rows: 8192,
    total_rough_set_index_load_time: 2ms,
    total_read_time: 20ms
  },
  total_create_snapshot_time: 1ms
}
```

- `dtfile`: the DTFile (DeltaTree File) related information during the table scan, which reflects the data scan status of the TiFlash Stable layer.
    - `total_scanned_packs`: the total number of packs that have been scanned in the DTFile. A pack is the minimum unit that can be read in the TiFlash DTFile. By default, every 8192 rows constitute a pack.
    - `total_skipped_packs`: the total number of packs that have been skipped by the scan in the DTFile. When a `WHERE` clause hits rough set indexes or matches the range filtering of a primary key, the irrelevant packs are skipped.
    - `total_scanned_rows`: the total number of rows that have been scanned in the DTFile. If there are multiple versions of updates or deletions because of MVCC, each version is counted independently.
    - `total_skipped_rows`: the total number of rows that are skipped by the scan in the DTFile.
    - `total_rough_set_index_load_time`: the total time used to read DTFile rough set indexes.
    - `total_read_time`: the total time used to read DTFile data.
- `total_create_snapshot_time`: the total time used to create snapshots during the table scan.
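Getting this output requires the table to have a TiFlash replica. A sketch, assuming TiFlash is deployed in the cluster:

{{< copyable "sql" >}}

```sql
-- Create a TiFlash replica of t1 and wait for it to become available:
ALTER TABLE t1 SET TIFLASH REPLICA 1;

-- Force the read through TiFlash so that the plan contains a
-- TableFullScan executed on a TiFlash node:
EXPLAIN ANALYZE SELECT /*+ READ_FROM_STORAGE(TIFLASH[t1]) */ COUNT(*) FROM t1;
```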
When a DML statement is executed in a pessimistic transaction, the execution information of the operator might also include the execution information of `lock_keys`. For example:

```
lock_keys: {time:94.096168ms, region:6, keys:8, lock_rpc:274.503214ms, rpc_count:6}
```

- `time`: The total duration of executing the `lock_keys` operation.
- `region`: The number of Regions involved in executing the `lock_keys` operation.
- `keys`: The number of keys that need to be locked.
- `lock_rpc`: The total time spent sending RPC requests of the `Lock` type to TiKV. Because multiple RPC requests can be sent in parallel, the total RPC time consumption might be greater than the total time consumption of the `lock_keys` operation.
- `rpc_count`: The total number of RPC requests of the `Lock` type sent to TiKV.
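A sketch of producing this output in a pessimistic transaction (the update really takes effect):

{{< copyable "sql" >}}

```sql
-- Statements inside a pessimistic transaction lock the touched keys,
-- so lock_keys details appear in the operator's execution info:
BEGIN PESSIMISTIC;
EXPLAIN ANALYZE UPDATE t1 SET c1 = c1 + 1 WHERE id = 1;
COMMIT;
```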
When a write-type DML statement is executed in a transaction with `autocommit=1`, the execution information of the write operator also includes the duration information of the transaction commit. For example:

```
commit_txn: {prewrite:48.564544ms, wait_prewrite_binlog:47.821579, get_commit_ts:4.277455ms, commit:50.431774ms, region_num:7, write_keys:16, write_byte:536}
```

- `prewrite`: The time consumed for the `prewrite` phase of the 2PC commit of the transaction.
- `wait_prewrite_binlog`: The time consumed for waiting to write the prewrite Binlog.
- `get_commit_ts`: The time consumed for getting the transaction commit timestamp.
- `commit`: The time consumed for the `commit` phase during the 2PC commit of the transaction.
- `write_keys`: The total number of keys written in the transaction.
- `write_byte`: The total bytes of key-value pairs written in the transaction; the unit is byte.
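A sketch of surfacing `commit_txn` with autocommit enabled (the default):

{{< copyable "sql" >}}

```sql
-- With autocommit on, each statement commits as its own transaction, so
-- the write operator's execution info includes the commit_txn details:
SET autocommit = 1;
EXPLAIN ANALYZE UPDATE t1 SET c1 = c1 + 1 WHERE id = 2;
```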
Request Unit (RU) is a unified abstraction unit of system resources, which is defined in TiDB resource control. The `execution info` of the top-level operator shows the overall RU consumption of this particular SQL statement:

```
RU:273.842670
```
> **Note:**
>
> This value shows the actual RUs consumed by this execution. The same SQL statement might consume different amounts of RUs each time it is executed due to the effects of caching (for example, the coprocessor cache).
You can calculate the RU from the other values in `EXPLAIN ANALYZE`, specifically the `execution info` column. For example:

```
'executeInfo':
 time:2.55ms,
 loops:2,
 RU:0.329460,
 Get:{
   num_rpc:1,
   total_time:2.13ms
 },
 total_process_time: 231.5µs,
 total_wait_time: 732.9µs,
 tikv_wall_time: 995.8µs,
 scan_detail: {
   total_process_keys: 1,
   total_process_keys_size: 150,
   total_keys: 1,
   get_snapshot_time: 691.7µs,
   rocksdb: {
     block: {
       cache_hit_count: 2,
       read_count: 1,
       read_byte: 8.19 KB,
       read_time: 10.3µs
     }
   }
 },
```
The base costs are defined in the `tikv/pd` source code and the calculations are performed in the `model.go` file.

If you are using TiDB v7.1, the calculation is the sum of `BeforeKVRequest()` and `AfterKVRequest()` in `pd/pd-client/model.go`, that is:

```
before key/value request is processed:
  consumption.RRU += float64(kc.ReadBaseCost)              -> kc.ReadBaseCost * rpc_nums

after key/value request is processed:
  consumption.RRU += float64(kc.ReadBytesCost) * readBytes -> kc.ReadBytesCost * total_process_keys_size
  consumption.RRU += float64(kc.CPUMsCost) * kvCPUMs       -> kc.CPUMsCost * total_process_time
```

For writes and batch gets, the calculation is similar, with different base costs.
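As a worked check against the example above, and assuming the default base costs in `tikv/pd` (0.25 RU per read request, 1 RU per 64 KiB read, and 1/3 RU per millisecond of TiKV CPU time), the reported `RU:0.329460` can be reproduced:

{{< copyable "sql" >}}

```sql
-- Reproduce RU:0.329460 from the executeInfo above, using the assumed
-- default base costs (hypothetical constants taken from tikv/pd defaults):
SELECT 0.25 * 1             -- ReadBaseCost  * num_rpc
     + (1.0 / 65536) * 150  -- ReadBytesCost * total_process_keys_size (bytes)
     + (1.0 / 3) * 0.2315   -- CPUMsCost     * total_process_time (ms)
     AS approx_ru;          -- ≈ 0.32946
```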
When a query involves MPP tasks, the execution time is also affected by various `tiflash_wait` durations. For example:

```
tiflash_wait: {minTSO_wait: 425ms, pipeline_breaker_wait: 133ms, pipeline_queue_wait: 512ms}
```

- `minTSO_wait`: Records the time spent waiting for an MPP task to be scheduled by the TiFlash MinTSO Scheduler.
- `pipeline_breaker_wait`: When TiFlash uses the Pipeline Execution Model, it records the time taken by the pipeline containing the pipeline breaker operator to wait for all data from the upstream pipeline. Currently, it is only used to display the time taken by the pipeline containing the `Join` operator to wait for all hash table builds to complete.
- `pipeline_queue_wait`: When TiFlash uses the Pipeline Execution Model, it records the waiting time in the CPU Task Thread Pool and IO Task Thread Pool during the execution of the pipeline.
The Coprocessor operators usually contain two parts of execution time information: `cop_task` and `tikv_task`. `cop_task` is the time recorded by TiDB, from the moment the request is sent to the server to the moment the response is received. `tikv_task` is the time recorded by the TiKV Coprocessor itself. If the two differ greatly, the time spent waiting for the response might be too long, or the time spent on gRPC or the network might be too long.
`EXPLAIN ANALYZE` is a feature of MySQL 8.0, but both the output format and the potential execution plans in TiDB differ substantially from MySQL.