Find a self-adaptive way to control the rate limit in the 2PC writing #15794
Labels: sig/transaction, type/enhancement, type/performance
Development Task
Problem description:
In the prewrite and commit phases, the key-value mutations are split into batches and sent to TiKV.
The requests are sent concurrently for each region.
The peak throughput of the TiKV server is limited, so we have to set a rate limit.
There is a TODO here,
During testing, I found that the value of this rate limit is crucial.
If it's too large, TiKV reports "Service is busy", and when this happens it can make the whole cluster unavailable. When TiDB gets "Service is busy" from TiKV, it drops the region cache, and then those metrics become abnormal:
There are a lot of error logs in the TiDB server like this:
Finally, the schema loading process may fail, and then every read request fails.
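One possible direction, sketched here only as an illustration and not as a design proposed in this issue, is an additive-increase/multiplicative-decrease (AIMD) limit: grow the limit slowly while requests succeed and halve it when TiKV reports that it is busy. The `adaptiveLimit` type and its methods are hypothetical, and how a "busy" response is detected is left to the caller.

```go
package main

import "sync"

// adaptiveLimit is a hypothetical AIMD-style concurrency limit.
// It is safe for concurrent use.
type adaptiveLimit struct {
	mu     sync.Mutex
	limit  int
	lo, hi int // inclusive bounds for limit
}

// newAdaptiveLimit clamps the initial value into [lo, hi].
func newAdaptiveLimit(initial, lo, hi int) *adaptiveLimit {
	if lo < 1 {
		lo = 1
	}
	if hi < lo {
		hi = lo
	}
	if initial < lo {
		initial = lo
	}
	if initial > hi {
		initial = hi
	}
	return &adaptiveLimit{limit: initial, lo: lo, hi: hi}
}

// onSuccess raises the limit by one (additive increase), up to hi.
func (a *adaptiveLimit) onSuccess() {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.limit < a.hi {
		a.limit++
	}
}

// onBusy halves the limit (multiplicative decrease), down to lo.
// The caller is assumed to invoke it when the store reports it is busy.
func (a *adaptiveLimit) onBusy() {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.limit /= 2
	if a.limit < a.lo {
		a.limit = a.lo
	}
}

// current returns the limit to apply to the next round of requests.
func (a *adaptiveLimit) current() int {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.limit
}
```

Backing off sharply on a busy signal and recovering gradually is the usual trade-off of AIMD: it gives up some throughput in exchange for not repeatedly pushing the store back into an overloaded state. Whether this fits the 2PC write path would need to be tested.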