tiproxy: add performance report of traffic capture (#19335) (#19352)
ti-chi-bot authored Nov 7, 2024
1 parent c9fc783 commit 96f0e15
Showing 2 changed files with 75 additions and 2 deletions.
33 changes: 33 additions & 0 deletions tiproxy/tiproxy-performance-test.md
@@ -14,6 +14,7 @@ The results are as follows:
- The number of rows in the query result set has a significant impact on the QPS of TiProxy, and the impact is the same as that of HAProxy.
- The performance of TiProxy increases almost linearly with the number of vCPUs. Therefore, increasing the number of vCPUs can effectively improve the QPS upper limit.
- The number of long connections and the frequency of creating short connections have minimal impact on the QPS of TiProxy.
- The higher the CPU usage of TiProxy, the greater the impact of enabling [traffic capture](/tiproxy/tiproxy-traffic-replay.md) on QPS. When the CPU usage of TiProxy is about 70%, enabling traffic capture decreases the average QPS by approximately 3% and the minimum QPS by approximately 7%. The larger drop in minimum QPS is caused by periodic QPS dips during traffic file compression.

## Test environment

@@ -312,3 +313,35 @@ sysbench oltp_point_select \
| 100 | 95597 | 0.52 | 0.65 | 330% | 1800% |
| 200 | 94692 | 0.53 | 0.67 | 330% | 1800% |
| 300 | 94102 | 0.53 | 0.68 | 330% | 1900% |

## Traffic capture test

### Test plan

This test evaluates the performance impact of [traffic capture](/tiproxy/tiproxy-traffic-replay.md) on TiProxy. It uses TiProxy v1.3.0 and compares the QPS and TiProxy CPU usage with traffic capture enabled and disabled while running `sysbench` at different concurrency levels. Because traffic file compression causes periodic QPS fluctuations, the test compares both the average and the minimum QPS.

Use the following command to perform the test:

```bash
sysbench oltp_read_write \
--threads=$threads \
--time=1200 \
--report-interval=5 \
--rand-type=uniform \
--db-driver=mysql \
--mysql-db=sbtest \
--mysql-host=$host \
--mysql-port=$port \
run --tables=32 --table-size=1000000
```
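
In this command, `$threads` corresponds to the connection counts in the results table below (20, 50, and 100), and `$host` and `$port` point to the TiProxy instance under test. It is assumed that the `sbtest` schema (32 tables with 1,000,000 rows each) has been created beforehand with the `prepare` command of `sysbench` using the same table options, and that traffic capture is toggled on the TiProxy instance between test rounds.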

### Test results

| Connection count | Traffic capture | Avg QPS | Min QPS | Avg latency (ms) | P95 latency (ms) | TiProxy CPU usage |
| --- | --- | --- | --- | --- | --- | --- |
| 20 | Disabled | 27653 | 26999 | 14.46 | 16.12 | 140% |
| 20 | Enabled | 27519 | 26922 | 14.53 | 16.41 | 170% |
| 50 | Disabled | 58014 | 56416 | 17.23 | 20.74 | 270% |
| 50 | Enabled | 56211 | 52236 | 17.79 | 21.89 | 280% |
| 100 | Disabled | 85107 | 84369 | 23.48 | 30.26 | 370% |
| 100 | Enabled | 79819 | 69503 | 25.04 | 31.94 | 380% |
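
Taking the 50-connection results as an example, enabling traffic capture lowers the average QPS from 58014 to 56211 (a drop of about 3.1%) and the minimum QPS from 56416 to 52236 (a drop of about 7.4%), which matches the approximately 3% and 7% decreases summarized at the beginning of this document.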
44 changes: 42 additions & 2 deletions tiproxy/tiproxy-traffic-replay.md
@@ -43,6 +43,8 @@ Traffic replay is not suitable for the following scenarios:
> - TiProxy captures traffic on all connections, including existing and newly created ones.
> - In TiProxy primary-secondary mode, connect to the primary TiProxy instance.
> - If TiProxy is configured with a virtual IP, it is recommended to connect to the virtual IP address.
> - The higher the CPU usage of TiProxy, the greater the impact of traffic capture on QPS. To reduce the impact on the production cluster, it is recommended to reserve at least 30% of CPU capacity, in which case enabling traffic capture decreases the average QPS by approximately 3%. For detailed performance data, see [Traffic capture test](/tiproxy/tiproxy-performance-test.md#traffic-capture-test).
> - TiProxy does not automatically delete previous capture files when capturing traffic again. You need to manually delete them.
For example, the following command connects to the TiProxy instance at `10.0.1.10:3080`, captures traffic for one hour, and saves it to the `/tmp/traffic` directory on the TiProxy instance:

@@ -76,7 +78,7 @@ Traffic replay is not suitable for the following scenarios:

5. View the replay report.

After replay completion, the report is stored in the `tiproxy_traffic_replay` database on the test cluster. This database contains two tables: `fail` and `other_errors`.

The `fail` table stores failed SQL statements, with the following fields:

@@ -89,16 +91,50 @@ Traffic replay is not suitable for the following scenarios:
- `sample_replay_time`: the time when the SQL statement failed during replay. You can use this to view error information in the TiDB log file.
- `count`: the number of times the SQL statement failed.

The following is an example output of the `fail` table:

```sql
SELECT * FROM tiproxy_traffic_replay.fail LIMIT 1\G
```

```
*************************** 1. row ***************************
cmd_type: StmtExecute
digest: 89c5c505772b8b7e8d5d1eb49f4d47ed914daa2663ed24a85f762daa3cdff43c
sample_stmt: INSERT INTO new_order (no_o_id, no_d_id, no_w_id) VALUES (?, ?, ?) params=[3077 6 1]
sample_err_msg: ERROR 1062 (23000): Duplicate entry '1-6-3077' for key 'new_order.PRIMARY'
sample_conn_id: 1356
sample_capture_time: 2024-10-17 12:59:15
sample_replay_time: 2024-10-17 13:05:05
count: 4
```
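
To get a quick overview of which statements fail most often, you can aggregate the `fail` table. The following query is a minimal sketch based only on the fields listed above; adjust it to your own analysis needs:

```sql
-- List the top 10 failed statements, ordered by how many times they failed
SELECT cmd_type, digest, sample_stmt, sample_err_msg, `count`
FROM tiproxy_traffic_replay.fail
ORDER BY `count` DESC
LIMIT 10;
```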

The `other_errors` table stores unexpected errors, such as network errors or database connection errors, with the following fields:

- `err_type`: the type of error, presented as a brief error message. For example, `i/o timeout`.
- `sample_err_msg`: the complete error message when the error first occurred.
- `sample_replay_time`: the time when the error occurred during replay. You can use this to view error information in the TiDB log file.
- `count`: the number of occurrences for this error.

The following is an example output of the `other_errors` table:

```sql
SELECT * FROM tiproxy_traffic_replay.other_errors LIMIT 1\G
```

```
*************************** 1. row ***************************
err_type: failed to read the connection: EOF
sample_err_msg: this is an error from the backend connection: failed to read the connection: EOF
sample_replay_time: 2024-10-17 12:57:39
count: 1
```
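
Similarly, the following sketch ranks unexpected errors by the number of occurrences, using only the fields listed above:

```sql
-- Rank unexpected replay errors by occurrence count
SELECT err_type, sample_err_msg, sample_replay_time, `count`
FROM tiproxy_traffic_replay.other_errors
ORDER BY `count` DESC;
```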

> **Note:**
>
> - The table schema of `tiproxy_traffic_replay` might change in future versions. It is not recommended to directly read data from `tiproxy_traffic_replay` in your application or tool development.
> - Replay does not guarantee that the transaction execution order between connections exactly matches the capture sequence. This might lead to incorrect error reports.
> - TiProxy does not automatically delete the previous replay report when replaying traffic. You need to manually delete it.

## Test throughput

@@ -151,3 +187,7 @@ For more information, see [`tiproxyctl traffic cancel`](/tiproxy/tiproxy-command-line-flags.md#traffic-cancel)
- TiProxy traffic replay does not support filtering SQL statement types, so both DML and DDL statements are replayed. Therefore, you need to restore the cluster data to its pre-replay state before replaying again.
- TiProxy traffic replay does not support testing [Resource Control](/tidb-resource-control.md) and [privilege management](/privilege-management.md) because TiProxy uses the same username to replay traffic.
- TiProxy does not support replaying [`LOAD DATA`](/sql-statements/sql-statement-load-data.md) statements.

## More resources

For more information about TiProxy traffic replay, see the [design document](https://github.com/pingcap/tiproxy/blob/main/docs/design/2024-08-27-traffic-replay.md).
