Do not merge until v7.1.2 tikv: add the user's guide of stale read and safe ts (#14524) (#14794)
ti-chi-bot authored Oct 25, 2023
1 parent 3b26a50 commit 7bd62eb
Showing 12 changed files with 287 additions and 1 deletion.
1 change: 1 addition & 0 deletions TOC.md
@@ -229,6 +229,7 @@
- [Identify Expensive Queries Using Top SQL](/dashboard/top-sql.md)
- [Identify Expensive Queries Using Logs](/identify-expensive-queries.md)
- [Save and Restore the On-Site Information of a Cluster](/sql-plan-replayer.md)
- [Understanding Stale Read and safe-ts in TiKV](/troubleshoot-stale-read.md)
- [Support Resources](/support.md)
- Performance Tuning
- Tuning Guide
13 changes: 13 additions & 0 deletions grafana-tidb-dashboard.md
@@ -110,9 +110,22 @@ To understand the key metrics displayed on the TiDB dashboard, check the followi

### KV Request

The following metrics relate to requests sent to TiKV. Retried requests are counted multiple times.

- KV Request OPS: the number of KV requests executed per second, displayed by TiKV instance
- KV Request Duration 99 by store: the 99th-percentile execution time of KV requests, displayed by TiKV instance
- KV Request Duration 99 by type: the 99th-percentile execution time of KV requests, displayed by request type
- Stale Read Hit/Miss Ops:
    - **hit**: the number of requests per second that successfully execute a stale read
    - **miss**: the number of requests per second that attempt a stale read but fail
- Stale Read Req Ops:
    - **cross-zone**: the number of requests per second that attempt a stale read in a remote zone
    - **local**: the number of requests per second that attempt a stale read in the local zone
- Stale Read Req Traffic:
    - **cross-zone-in**: the incoming traffic of responses to requests that attempt a stale read in a remote zone
    - **cross-zone-out**: the outgoing traffic of requests that attempt a stale read in a remote zone
    - **local-in**: the incoming traffic of responses to requests that attempt a stale read in the local zone
    - **local-out**: the outgoing traffic of requests that attempt a stale read in the local zone
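
The **hit** and **miss** rates above can be combined into a single hit ratio when judging whether stale reads are working as intended. The following is a minimal sketch, not part of the dashboard: the function name `stale_read_hit_ratio` and the sample values are hypothetical, and the inputs are assumed to be the per-second rates read from the panel.

```python
def stale_read_hit_ratio(hit_ops: float, miss_ops: float) -> float:
    """Return the fraction of stale read attempts that succeeded.

    hit_ops and miss_ops are assumed to be the per-second rates shown by
    the "Stale Read Hit/Miss Ops" panel for the same time window.
    """
    total = hit_ops + miss_ops
    if total == 0:
        # No stale read traffic in this window, so there is no ratio to report.
        return 0.0
    return hit_ops / total


# Hypothetical sample values: 300 hits/s and 100 misses/s.
print(stale_read_hit_ratio(300.0, 100.0))  # 0.75
```

A ratio that drops while the request rate stays flat suggests that stale reads are being rejected rather than that traffic has changed.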

### PD Client

14 changes: 14 additions & 0 deletions grafana-tikv-dashboard.md
@@ -406,6 +406,20 @@ This section provides a detailed description of these key metrics on the **TiKV-
- Total pessimistic locks memory size: The memory size occupied by the in-memory pessimistic locks
- In-memory pessimistic locking result: The result of only saving pessimistic locks to memory. `full` means the number of times that the pessimistic lock is not saved to memory because the memory limit is exceeded.

### Resolved-TS

- Resolved-TS worker CPU: The CPU utilization of the resolved-ts worker threads
- Advance-TS worker CPU: The CPU utilization of the advance-ts worker threads
- Scan lock worker CPU: The CPU utilization of the scan lock worker threads
- Max gap of resolved-ts: The maximum time difference between the resolved-ts of all active Regions in this TiKV and the current time
- Max gap of safe-ts: The maximum time difference between the safe-ts of all active Regions in this TiKV and the current time
- Min Resolved TS Region: The ID of the Region with the minimum resolved-ts
- Min Safe TS Region: The ID of the Region with the minimum safe-ts
- Check Leader Duration: The distribution of time spent on processing leader requests. The duration is measured from sending a request to receiving the response on the leader
- Max gap of resolved-ts in Region leaders: The maximum time difference between the resolved-ts of all active Regions in this TiKV and the current time, only for Region leaders
- Min Leader Resolved TS Region: The ID of the Region with the minimum resolved-ts, only for Region leaders
- Lock heap size: The memory footprint of the heap that tracks locks in the resolved-ts module
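
The "gap" metrics above compare a timestamp with the current time. The following sketch shows one way to compute such a gap by hand, assuming the timestamp is a PD-allocated TSO whose low 18 bits are a logical counter and whose remaining high bits are a physical time in Unix milliseconds. The helper names and sample values are hypothetical, and this does not claim to be how the Grafana panel itself is computed.

```python
import time
from typing import Optional

# Assumption: the low 18 bits of a TSO hold the logical counter.
TSO_LOGICAL_BITS = 18


def tso_physical_ms(ts: int) -> int:
    """Extract the physical part (Unix milliseconds) of a TSO."""
    return ts >> TSO_LOGICAL_BITS


def ts_gap_ms(ts: int, now_ms: Optional[int] = None) -> int:
    """Return how far the timestamp lags behind the given (or current) time."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - tso_physical_ms(ts)


# Hypothetical resolved-ts whose physical part is 1_700_000_000_000 ms.
sample_ts = 1_700_000_000_000 << TSO_LOGICAL_BITS
print(ts_gap_ms(sample_ts, now_ms=1_700_000_005_000))  # 5000
```

Passing `now_ms` explicitly keeps the calculation reproducible; omitting it compares against the local clock, which might differ from the clock that PD uses.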

### Memory

- Allocator Stats: The statistics of the memory allocator
Binary file added media/stale-read/example-ops.png
Binary file added media/stale-read/example-ts-gap.png
Binary file added media/stale-read/metrics-hit-miss.png
Binary file added media/stale-read/metrics-ops.png
Binary file added media/stale-read/traffic.png
2 changes: 1 addition & 1 deletion releases/release-6.5.4.md
@@ -26,7 +26,7 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v6.5/quick-start-with-
+ TiKV

- Use gzip compression for `check_leader` requests to reduce traffic [#14553](https://github.com/tikv/tikv/issues/14553) @[you06](https://github.com/you06)
- Add the `Max gap of safe-ts` and `Min safe ts region` metrics and introduce the `tikv-ctl get_region_read_progress` command to better observe and diagnose the status of resolved-ts and safe-ts [#15082](https://github.com/tikv/tikv/issues/15082) @[ekexium](https://github.com/ekexium)
- Add the `Max gap of safe-ts` and `Min safe ts region` metrics and introduce the `tikv-ctl get-region-read-progress` command to better observe and diagnose the status of resolved-ts and safe-ts [#15082](https://github.com/tikv/tikv/issues/15082) @[ekexium](https://github.com/ekexium)
- Expose some RocksDB configurations in TiKV that allow users to disable features such as TTL and periodic compaction [#14873](https://github.com/tikv/tikv/issues/14873) @[LykxSassinator](https://github.com/LykxSassinator)
- Avoid holding mutex when writing Titan manifest files to prevent affecting other threads [#15351](https://github.com/tikv/tikv/issues/15351) @[Connor1996](https://github.com/Connor1996)

6 changes: 6 additions & 0 deletions stale-read.md
@@ -52,6 +52,12 @@ advance-ts-interval = "20s" # The default value is "20s". You can set it to a sm
>
> Decreasing the value of the preceding TiKV configuration item increases TiKV CPU usage and traffic between nodes.
<CustomContent platform="tidb">

For more information about the internals of Resolved TS and diagnostic techniques, see [Understanding Stale Read and safe-ts in TiKV](/troubleshoot-stale-read.md).

</CustomContent>

## Restrictions

When a Stale Read query for a table is pushed down to TiFlash, the query returns an error if DDL operations were executed on this table after the read timestamp specified by the query. This is because TiFlash only supports reading data from tables with the latest schemas.
37 changes: 37 additions & 0 deletions tikv-control.md
@@ -652,3 +652,40 @@ From the output above, you can see that the information of the damaged SST file
+ In the `sst meta` part, `14` means the SST file number; `552997` means the file size, followed by the smallest and largest sequence numbers and other meta-information.
+ The `overlap region` part shows the information of the Region involved. This information is obtained through the PD server.
+ The `suggested operations` part provides suggestions for cleaning up the damaged SST file. You can follow the suggestions to clean up the files and restart the TiKV instance.

### Get the state of a Region's `RegionReadProgress`

Starting from v6.5.4, v7.1.2, and v7.3.0, TiKV provides the `get-region-read-progress` subcommand to get up-to-date details of the resolver and `RegionReadProgress`. You need to specify a Region ID and a TiKV instance, both of which can be obtained from Grafana (`Min Resolved TS Region` and `Min Safe TS Region`) or from `DataIsNotReady` logs.

- `--log` (optional): If specified, TiKV logs the smallest `start_ts` of the locks in this Region's resolver on this TiKV instance at the `INFO` level. This option helps you identify locks that might block resolved-ts in advance.
- `--min-start-ts` (optional): If specified, TiKV excludes from the logs any lock whose `start_ts` is smaller than this value. You can use this option to focus the logs on a transaction of interest. The default value is `0`, which means no filter.

The following is an example:

```
./tikv-ctl --host 127.0.0.1:20160 get-region-read-progress -r 14 --log --min-start-ts 0
```

The output is as follows:

```
Region read progress:
exist: true,
safe_ts: 0,
applied_index: 92,
pending front item (oldest) ts: 0,
pending front item (oldest) applied index: 0,
pending back item (latest) ts: 0,
pending back item (latest) applied index: 0,
paused: false,
Resolver:
exist: true,
resolved_ts: 0,
tracked index: 92,
number of locks: 0,
number of transactions: 0,
stopped: false,
```
The subcommand is useful in diagnosing issues related to Stale Read and safe-ts. For details, see [Understanding Stale Read and safe-ts in TiKV](/troubleshoot-stale-read.md).
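
When comparing several Regions, it can help to turn the text output into a structure that is easier to inspect. The following is a minimal sketch that assumes the output keeps the `key: value,` layout of the example above. The function names are hypothetical, and the output format is not documented here as a stable interface, so treat the parser as illustrative only.

```python
from typing import Dict, Optional, Union

Value = Union[bool, int, str]

# Abridged sample in the same shape as the example output above.
SAMPLE = """\
Region read progress:
    exist: true,
    safe_ts: 0,
    applied_index: 92,
    paused: false,
Resolver:
    exist: true,
    resolved_ts: 0,
    tracked index: 92,
    stopped: false,
"""


def _convert(raw: str) -> Value:
    """Convert a raw field to bool or int when it looks like one."""
    if raw in ("true", "false"):
        return raw == "true"
    return int(raw) if raw.isdigit() else raw


def parse_read_progress(text: str) -> Dict[str, Dict[str, Value]]:
    """Group `key: value,` lines under the section header that precedes them."""
    sections: Dict[str, Dict[str, Value]] = {}
    current: Optional[Dict[str, Value]] = None
    for line in text.splitlines():
        line = line.strip().rstrip(",")
        if not line:
            continue
        if line.endswith(":"):
            # A line such as "Resolver:" starts a new section.
            current = sections.setdefault(line[:-1], {})
        elif current is not None and ": " in line:
            key, _, value = line.partition(": ")
            current[key] = _convert(value)
    return sections


progress = parse_read_progress(SAMPLE)
print(progress["Resolver"]["resolved_ts"])
print(progress["Region read progress"]["safe_ts"])
```

Keys are kept exactly as printed (for example, `tracked index` with a space), so no field is renamed or guessed.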