Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some read requests do not honor the store labels #45693

Open
zyguan opened this issue Jul 31, 2023 · 2 comments
Open

Some read requests do not honor the store labels #45693

zyguan opened this issue Jul 31, 2023 · 2 comments

Comments

@zyguan
Copy link
Contributor

zyguan commented Jul 31, 2023

Bug Report

TiDB only set MatchStoreLabels on executorBuilder.getSnapshot, however some of executors may use txn.{Get,BatchGet} directly. These reads do not honor the store labels thus may access replicas on the slow tikv.

1. Minimal reproduce step (Required)

Deploy a cluster with 3 AZs and evict-slow-store-scheduler, run a stale read workload, and then slowdown a tikv in one AZ.

2. What did you expect to see? (Required)

The other two AZs should not be affected after leaders have been evicted.

3. What did you see instead (Required)

All AZs had perf regression during the fault injection.

4. What is your TiDB version? (Required)

release-6.5

@zyguan zyguan added the type/bug The issue is confirmed as a bug. label Jul 31, 2023
@ti-chi-bot ti-chi-bot bot added may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 may-affects-7.1 labels Aug 1, 2023
@MyonKeminta
Copy link
Contributor

There are many calls to txn.{Get,BatchGet} that spreads everywhere. Including:

  1. DML / select for update executors, for example:
    if _, err := txn.Get(ctx, r.handleKey.newKey); err == nil {
  2. Temp table / temp index, for example:
    _, err := txn.Get(context.Background(), rowKey)
  3. Write operations, for example:
    value, err = txn.Get(ctx, key)
  4. Foreign key checks, for example:
    _, err := txn.Get(ctx, k)

It seems that they are all in write operations. Those readonly operations doesn't seem to be affected.


To fix the problem, we have multiple choices:

  1. The first solution we considered is to disable replica read for read operations in write statements. However, since the current implementation has been released in several versions, changing the behavior may affect users that already benefits from replica read in write statements. We need to confirm the impact.
  2. Make all those reads affected by replica read and MatchStoreLabels. This makes all these read able to benefit from replica read, however it makes the write path more complicated, increasing the difficulty to diagnose performance issues. Also, we used to meet such issue that replica read in write statements actually slows the statement down when there are faults in the cluster. As the related code spreads everywhere as we listed above, the difficulty to change the code is also high.
  3. Add a variable/configuration to control whether replica read in write statements are allowed. This is most complicated to implementation, and we need PMs' opinion about whether adding this configuration is acceptable.

@crazycs520
Copy link
Contributor

Configurable KV Timeout also has this issue too. Such as tidb_kv_read_timeout variable doesn't take effect on insert ignore statement when doing unique check.

@cfzjywxk cfzjywxk added affects-6.5 affects-7.1 and removed may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 may-affects-7.1 labels Apr 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants