MPPTask is destructed while the lock of MPPTaskManager is held, causing TiFlash to hang forever #4954
Labels: affects-6.0, affects-6.1, component/compute, severity/critical, type/bug
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
Consider an MPP query that has 2 MPPTasks in one TiFlash node, task1 and task2, where task1 reads data from task2 using a local tunnel.

1. After `prepare` and `preprocess`, both tasks are waiting for the scheduler, so they are pushed into the `waiting_tasks` queue.
2. task1 hits `EXCEEDED`, so it throws an exception and unregisters itself from `MPPTaskManager`. Note that unregistering only removes task1 from `task_map`; the `waiting_tasks` queue still holds a reference to task1, and after `runImpl` finishes, the reference in `waiting_tasks` is the last reference to task1 in the system.
3. A `CancelMPPQuery` request is sent to TiFlash to cancel the MPP query.
4. `CancelMPPQuery` acquires the lock of `MPPTaskManager`, then calls `scheduler->deleteQuery`. Inside `deleteQuery`, task1 is removed from `waiting_tasks`; since that entry is the last reference of the shared pointer, task1 is destructed as soon as it is removed from `waiting_tasks`.
5. Destructing the `MPPTask` destructs its `ExchangeReceiver`, which waits for its reading thread to exit. For task1, the reading thread in `ExchangeReceiver` is the local read thread, which tries to read data from task2. It can only exit after task2 finishes or task1/task2 is cancelled, but the cancel request is itself blocked in this destructor.

So a deadlock happens, and since `CancelMPPQuery` holds the lock of `MPPTaskManager`, no more queries can be served.

2. What did you expect to see? (Required)
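As a rough illustration of the mechanism in the reproduce steps, the sketch below shows that dropping the last `shared_ptr` reference while a mutex is held runs the destructor under that mutex, and contrasts it with moving the references out and releasing them after unlocking. Every name here (`Task`, `Manager`, `deleteQueryUnderLock`, `deleteQueryDeferred`, `destructsUnderLock`) is a hypothetical stand-in, not TiFlash code, and the deferred variant is only one possible way to avoid the problem, not necessarily the project's fix.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <mutex>

struct Manager;

// Hypothetical stand-in for MPPTask: its destructor records whether the
// manager lock was held at the moment it ran.
struct Task
{
    Task(Manager & m, bool & out) : manager(m), destructed_under_lock(out) {}
    ~Task();
    Manager & manager;
    bool & destructed_under_lock;
};

// Hypothetical stand-in for MPPTaskManager plus its scheduler queue.
struct Manager
{
    std::mutex mu;
    bool lock_held = false; // written only while mu is held
    std::deque<std::shared_ptr<Task>> waiting_tasks;

    // Pattern from the report: the queue is cleared while mu is held, so the
    // last reference is dropped (and ~Task runs) inside the critical section.
    void deleteQueryUnderLock()
    {
        std::lock_guard<std::mutex> guard(mu);
        lock_held = true;
        waiting_tasks.clear();
        lock_held = false;
    }

    // Alternative sketch: move the references out under the lock and let
    // them die only after the lock has been released.
    void deleteQueryDeferred()
    {
        std::deque<std::shared_ptr<Task>> doomed;
        {
            std::lock_guard<std::mutex> guard(mu);
            lock_held = true;
            doomed.swap(waiting_tasks);
            lock_held = false;
        }
        doomed.clear(); // ~Task runs here, outside the critical section
    }
};

Task::~Task()
{
    destructed_under_lock = manager.lock_held;
}

// Returns true if the task's destructor ran while the manager lock was held.
bool destructsUnderLock(bool deferred)
{
    Manager manager;
    bool under_lock = false;
    manager.waiting_tasks.push_back(std::make_shared<Task>(manager, under_lock));
    if (deferred)
        manager.deleteQueryDeferred();
    else
        manager.deleteQueryUnderLock();
    return under_lock;
}
```

Calling `destructsUnderLock(false)` exercises the pattern from the report, and `destructsUnderLock(true)` exercises the deferred variant. In the real scenario the destructor blocks on the reading thread rather than just setting a flag, which is why running it under the manager lock is dangerous.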
3. What did you see instead (Required)
4. What is your TiFlash version? (Required)