[FEATURE][ML] Fix possible deadlock in thread pool shutdown #343
Conversation
lib/core/CStaticThreadPool.cc
Outdated
// execute its own shutdown message.
auto notIsEmpty = [](TTaskQueue& queue) { return queue.size() > 0; };
while (std::find_if(m_TaskQueues.begin(), m_TaskQueues.end(), notIsEmpty) !=
       m_TaskQueues.end()) {
There's currently nothing to stop new tasks being scheduled after this test succeeds.
This is only called in the destructor.
lib/core/CStaticThreadPool.cc
Outdated
auto notIsEmpty = [](TTaskQueue& queue) { return queue.size() > 0; };
while (std::find_if(m_TaskQueues.begin(), m_TaskQueues.end(), notIsEmpty) !=
       m_TaskQueues.end()) {
    std::this_thread::sleep_for(std::chrono::microseconds{50});
This seems like a code smell to me. We could commit it as a temporary fix to stop CI breaking, but I imagine there is a pattern for shutting down thread pools that doesn't involve sleeping.
I thought about this a bit...
There is one, if we modify the queue: I could add special-case handling to the queue which allows me to pass a message that notifies all consumer condition variables (blocked popping an empty queue) to wake up and exit.
I kind of like the idea of waiting until we've drained the queues. It means we don't complicate the queue and we don't have any extra logic running every time we push onto or pop from the queue. I don't think we'll need to start and stop thread pools for our use case (other than on main entry and exit). I could probably do this more cleanly with a condition variable (see the sketch below), but this was a quick fix for the tests on the branch. I'll leave a TODO to tidy this up.
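For reference, here is a minimal sketch of what a condition-variable-based drain might look like, assuming the pool tracked a count of outstanding tasks. The names (PendingTaskTracker, onTaskQueued, onTaskDone, waitUntilDrained) are hypothetical and not part of CStaticThreadPool; the sketch just illustrates the trade-off mentioned above: the wait no longer spins, but every push and every completed task pays for the extra bookkeeping.

#include <condition_variable>
#include <cstddef>
#include <mutex>

class PendingTaskTracker {
public:
    // Called by the producer when a task is pushed onto any queue.
    void onTaskQueued() {
        std::lock_guard<std::mutex> lock{m_Mutex};
        ++m_Pending;
    }

    // Called by a worker after it has finished executing a task.
    void onTaskDone() {
        std::lock_guard<std::mutex> lock{m_Mutex};
        if (--m_Pending == 0) {
            m_AllDone.notify_all();
        }
    }

    // Called from the destructor before enqueuing shutdown tasks: blocks
    // until every previously queued task has run, without busy waiting.
    void waitUntilDrained() {
        std::unique_lock<std::mutex> lock{m_Mutex};
        m_AllDone.wait(lock, [this] { return m_Pending == 0; });
    }

private:
    std::mutex m_Mutex;
    std::condition_variable m_AllDone;
    std::size_t m_Pending{0};
};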
@droberts195 further to your comments, I reworked this by adding the capability to bind a task to a specific queue. This means each queue will execute exactly one shutdown message, independent of scheduling, so will definitely shut down. I kept the "drain tasks" step before starting to shut down for the reason mentioned above. I also made a small optimisation which I noticed whilst testing. Can you take another look?
LGTM
There was an error in the shutdown of the thread pool. In particular, although threads process tasks from their own queue by preference, if there is a context switch while the main thread is adding shutdown messages, a thread which isn't waiting to pop its own queue and hasn't yet had its shutdown message added can steal another queue's shutdown task. The thread whose shutdown task is stolen will then deadlock waiting for its last message if and only if its queue was already empty.

This PR changes shutdown to wait until all tasks have been processed before enqueuing the shutdown tasks. This ensures that every thread is blocked waiting to pop its own queue and will process only its own shutdown message and exit. This also means we get better concurrency draining the queues while shutting down, and I tightened up one of the test thresholds as a result; see below. This issue is difficult to test without inserting waits into both the add-shutdown-task loop and the steal-task loop, but I reproduced it manually and it has been occurring intermittently in the integration tests.
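To make the description above concrete, here is a minimal, self-contained sketch of the shutdown ordering, not the actual CStaticThreadPool code: all names (SketchThreadPool, schedule, the nested Queue) are hypothetical, and task stealing is omitted for brevity. The destructor first waits for the queues to drain, then pushes exactly one shutdown sentinel directly onto each queue so it cannot land anywhere else, and finally joins the workers.

#include <algorithm>
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

class SketchThreadPool {
public:
    using TTask = std::function<void()>;

    explicit SketchThreadPool(std::size_t threads) : m_Queues(threads) {
        for (std::size_t i = 0; i < threads; ++i) {
            m_Workers.emplace_back([this, i] { this->worker(i); });
        }
    }

    ~SketchThreadPool() {
        // 1) Drain: wait until every queue is empty, mirroring the
        //    sleep-based drain in the PR.
        auto notEmpty = [](Queue& queue) { return queue.size() > 0; };
        while (std::any_of(m_Queues.begin(), m_Queues.end(), notEmpty)) {
            std::this_thread::sleep_for(std::chrono::microseconds{50});
        }
        // 2) Bind one shutdown sentinel to each specific queue: worker i
        //    pops only from queue i here, so it sees exactly one sentinel.
        for (auto& queue : m_Queues) {
            queue.push(TTask{}); // empty task acts as the shutdown sentinel
        }
        // 3) Join the workers.
        for (auto& thread : m_Workers) {
            thread.join();
        }
    }

    // Round-robin scheduling of ordinary tasks.
    void schedule(TTask task) {
        m_Queues[m_Next++ % m_Queues.size()].push(std::move(task));
    }

private:
    // A tiny blocking queue; the real pool uses its own TTaskQueue type.
    class Queue {
    public:
        void push(TTask task) {
            std::lock_guard<std::mutex> lock{m_Mutex};
            m_Tasks.push_back(std::move(task));
            m_NotEmpty.notify_one();
        }
        TTask pop() {
            std::unique_lock<std::mutex> lock{m_Mutex};
            m_NotEmpty.wait(lock, [this] { return !m_Tasks.empty(); });
            TTask task{std::move(m_Tasks.front())};
            m_Tasks.pop_front();
            return task;
        }
        std::size_t size() {
            std::lock_guard<std::mutex> lock{m_Mutex};
            return m_Tasks.size();
        }

    private:
        std::mutex m_Mutex;
        std::condition_variable m_NotEmpty;
        std::deque<TTask> m_Tasks;
    };

    void worker(std::size_t i) {
        for (;;) {
            TTask task{m_Queues[i].pop()};
            if (!task) { // shutdown sentinel received: exit the worker
                return;
            }
            task();
        }
    }

    std::vector<Queue> m_Queues;
    std::vector<std::thread> m_Workers;
    std::atomic<std::size_t> m_Next{0};
};

In the real pool, where idle workers can also steal from other queues, the drain step is what ensures every worker is popping its own queue by the time the sentinels are pushed, and binding each sentinel to a specific queue ensures every worker receives exactly one of them.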