TiFlash replica maybe invisible to TiDB if the TiFlash node was once in down state, and makes load between TiFlash nodes not balanced
Enhancement

Background

In TiDB, there is a region cache that caches region information (region ID, region replicas). When TiDB tries to access a region, it first looks the region up in the region cache, so TiDB can get region information without accessing PD. A region in the region cache becomes invalid if:
- The cached information is wrong, for example, TiDB encounters a region error when using the cached region information
- The cached information has not been used for 10 minutes
If a region in the region cache is invalid, TiDB fetches the latest region information from PD. After it gets the information from PD, it does some pre-processing to convert the raw region information into the region information used inside TiDB. One of the pre-processing steps is filterUnavailablePeers: for each replica of the region, if the corresponding store is in the Down state, the replica is filtered out, so TiDB does not see replicas whose store is down.
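To make the two rules above concrete, here is a minimal, self-contained Go sketch. The CachedRegion type, the regionCacheTTL constant, and the store/peer types are hypothetical stand-ins for illustration, not the real client-go structures; it only mirrors the behavior described above: a cached region expires if unused for 10 minutes, and peers on Down stores are dropped when the region is reloaded from PD.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical stand-ins for the real store/peer/region types.
type StoreState int

const (
	StoreUp StoreState = iota
	StoreDown
)

type Peer struct {
	StoreID uint64
}

type CachedRegion struct {
	ID         uint64
	Peers      []Peer
	lastAccess time.Time
}

// The second rule above: a cached region not used for 10 minutes is treated
// as invalid and will be reloaded from PD.
const regionCacheTTL = 10 * time.Minute

func (r *CachedRegion) valid(now time.Time) bool {
	return now.Sub(r.lastAccess) < regionCacheTTL
}

// filterUnavailablePeers mimics the pre-processing step: peers whose store is
// in the Down state are dropped, so TiDB never sees those replicas.
func filterUnavailablePeers(peers []Peer, storeState map[uint64]StoreState) []Peer {
	kept := make([]Peer, 0, len(peers))
	for _, p := range peers {
		if storeState[p.StoreID] != StoreDown {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	// Store 2 is Down while the region is reloaded from PD.
	storeState := map[uint64]StoreState{1: StoreUp, 2: StoreDown}
	rawPeers := []Peer{{StoreID: 1}, {StoreID: 2}}

	region := &CachedRegion{
		ID:         100,
		Peers:      filterUnavailablePeers(rawPeers, storeState),
		lastAccess: time.Now(),
	}
	fmt.Println(region.Peers)             // only the peer on store 1 survives
	fmt.Println(region.valid(time.Now())) // true: just accessed
}
```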
Problem
Consider a TiDB cluster with 2 TiFlash nodes and a TiDB table with 2 TiFlash replicas. When querying that table using TiFlash, TiDB balances the accessed regions among the 2 TiFlash nodes using balanceBatchCopTask. The basic idea of balanceBatchCopTask is to first find all the available region replicas, then try to distribute the region replicas evenly across all available TiFlash nodes. So if the cluster has multiple TiFlash nodes, it is expected that the query load is always balanced between all TiFlash nodes.
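The following is a simplified sketch of the balancing idea only, not the actual balanceBatchCopTask implementation; the Region type and the greedy least-loaded assignment are assumptions made for illustration. Each region is assigned to whichever candidate TiFlash store currently has the fewest regions, so with 2 healthy stores the tasks split roughly evenly.

```go
package main

import "fmt"

// Region is a hypothetical, simplified view: the list of TiFlash stores that
// hold an available replica of the region.
type Region struct {
	ID            uint64
	ReplicaStores []uint64
}

// balance assigns every region to the least-loaded candidate store and returns
// storeID -> assigned region IDs. This is only the rough idea, not the real
// balanceBatchCopTask algorithm.
func balance(regions []Region) map[uint64][]uint64 {
	load := map[uint64]int{}
	tasks := map[uint64][]uint64{}
	for _, r := range regions {
		best := r.ReplicaStores[0]
		for _, s := range r.ReplicaStores[1:] {
			if load[s] < load[best] {
				best = s
			}
		}
		load[best]++
		tasks[best] = append(tasks[best], r.ID)
	}
	return tasks
}

func main() {
	// 4 regions, each with an available replica on TiFlash stores 1 and 2:
	// the assignment comes out as an even 2/2 split.
	regions := []Region{
		{ID: 1, ReplicaStores: []uint64{1, 2}},
		{ID: 2, ReplicaStores: []uint64{1, 2}},
		{ID: 3, ReplicaStores: []uint64{1, 2}},
		{ID: 4, ReplicaStores: []uint64{1, 2}},
	}
	fmt.Println(balance(regions)) // map[1:[1 3] 2:[2 4]]
}
```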
Now assume that a TiFlash node is down for a while. The cached region information becomes invalid because TiDB finds that some replicas have become unreachable, so it tries to reload the region. As described in the Background section, when reloading the region from PD, TiDB filters out the replicas on the down store, so after the reload TiDB sees a region containing only 1 TiFlash replica. This works fine so far: since the table has 2 TiFlash replicas, TiDB simply accesses the replica on the alive TiFlash node.
However, after the down TiFlash node comes back, TiDB still can't see the replica on that TiFlash node unless another region reload is triggered. As mentioned in the Background section, TiDB reloads a region only if it finds a problem when accessing the region or the cached region has not been used for 10 minutes. That is to say, if the region stays accessible (for AP queries, a table may not receive many writes, so a region can stay unchanged for a long time) and TiDB keeps querying the table, TiDB has no chance to reload the region, which makes the TiFlash load very unbalanced (all the query tasks are still sent to one TiFlash node). This is not the expected behavior: since the cluster now has two TiFlash nodes, the query load should be balanced between them.
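A small sketch of why the stale view persists, reusing the hypothetical CachedRegion/TTL shape from the Background sketch (again an assumption for illustration, not the real cache code): as long as queries keep succeeding, every access refreshes the last-access time, the 10-minute TTL never fires, the single-peer region is never reloaded, and the recovered store's replica stays invisible.

```go
package main

import (
	"fmt"
	"time"
)

// Same hypothetical shape as the Background sketch: a cached region where only
// the replica on the surviving store (store 1) is visible.
type CachedRegion struct {
	ID         uint64
	PeerStores []uint64
	lastAccess time.Time
}

const regionCacheTTL = 10 * time.Minute

func (r *CachedRegion) valid(now time.Time) bool {
	return now.Sub(r.lastAccess) < regionCacheTTL
}

func main() {
	region := &CachedRegion{ID: 100, PeerStores: []uint64{1}, lastAccess: time.Now()}

	// Store 2 has recovered, but it never shows up here: each query arrives
	// well within the 10-minute window, succeeds, and refreshes lastAccess,
	// so the region is never reloaded from PD.
	now := time.Now()
	for i := 0; i < 5; i++ {
		now = now.Add(5 * time.Minute) // a query every 5 minutes
		if region.valid(now) {
			region.lastAccess = now // successful access keeps the cache entry alive
		}
		fmt.Println("visible TiFlash stores:", region.PeerStores) // always [1]
	}
}
```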