[BUG] Deadlock on pool exit #93

alugowski · 2023-01-10T07:11:20Z

I'm seeing a deadlock upon thread pool destruction.

My workload creates a thread pool for one expensive operation. Tasks are not created all at once; instead, more are created as old ones finish. Calling wait_for_tasks() makes no difference.

One worker thread appears to be left waiting on the condition variable while it is being joined.

I haven't managed to reproduce it with a small program yet. This is one actual use that suffers the issue:
https://github.com/alugowski/fast_matrix_market/blob/main/include/fast_matrix_market/write_body_threads.hpp

That code is exercised by my unit tests. If I loop a unit test a few thousand times then it nearly always deadlocks.

Workaround:
The following hack works around the issue. Replace the worker thread's wait() with a wait_until() whose deadline is 50ms away. This way a missed notification stalls the worker for no more than about 50ms. Because the wait can now return on timeout, the break condition must be rechecked afterwards.

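// The 50ms literal needs: using namespace std::chrono_literals;
task_available_cv.wait_until(tasks_lock, std::chrono::system_clock::now() + 50ms, [this] { return !tasks.empty() || !running; });
if (running)
{
    // The wait may have returned on timeout with nothing queued; go around the loop again.
    if (tasks.empty()) {
        continue;
    }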
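
The timed-wait idea can be seen in isolation in the minimal sketch below. It is not the library's code: `timed_waits_until_stop`, `stop_requested`, and the 20 ms delay are made up for illustration. The flag is flipped without any notification, standing in for a notification the worker missed, and the waiter still exits because `wait_for` re-evaluates the predicate when the 50 ms timeout expires. The sketch uses `wait_for` rather than `wait_until`, so the timeout is a duration measured against a steady clock.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical demo, not part of any library.
// Returns how many timed waits the waiter needed before it saw the stop flag.
int timed_waits_until_stop()
{
    using namespace std::chrono_literals;

    std::mutex m;
    std::condition_variable cv;
    std::atomic<bool> stop_requested{false};
    int waits = 0; // written only by the waiter, read only after join()

    std::thread waiter([&] {
        std::unique_lock<std::mutex> lock(m);
        bool stop_seen = false;
        while (!stop_seen)
        {
            // wait_for returns the predicate's value: false means the 50 ms
            // elapsed with the flag still clear, so the loop waits again.
            stop_seen = cv.wait_for(lock, 50ms, [&] { return stop_requested.load(); });
            ++waits;
        }
    });

    std::this_thread::sleep_for(20ms);
    stop_requested = true; // deliberately no notify: models the lost notification
    waiter.join();         // the waiter notices the flag once its timeout expires
    return waits;
}
```

The cost of this approach is that idle workers wake up periodically, so it trades a bounded stall for some background polling.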

Sounds related to #76.

CPU model, architecture, # of cores and threads: M1 Pro, 8 cores
Operating system: macOS 13
Name and version of C++ compiler: Clang 15 from homebrew
Thread pool library version: 3.3.0 (light version)


alugowski · 2023-01-10T10:53:33Z

The likely cause is that there is a gap in worker() between checking the exit condition and waiting on task_available_cv. If destroy_threads() happens to call task_available_cv.notify_all() at that time then the worker will have missed the notification and will deadlock.

while (running)
{
    std::function<void()> task;
    // If running is set to false and task_available_cv.notify_all() is called here, the worker misses the notification and deadlocks.
    std::unique_lock<std::mutex> tasks_lock(tasks_mutex);
    task_available_cv.wait(tasks_lock, [this] { return !tasks.empty() || !running; });
    if (running)
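
One common way to rule out a lost wakeup of this kind is to change the stop flag while holding the same mutex the workers wait with, and only then notify. The sketch below illustrates that general technique under stated assumptions. It is not the library's code and does not claim to be the library's actual fix: `mini_pool` and `churn_pools` are hypothetical names, and only `tasks_mutex`, `task_available_cv`, `tasks`, `running`, and `worker()` echo the snippet above.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a thread pool; not the library's class.
class mini_pool
{
public:
    explicit mini_pool(std::size_t thread_count)
    {
        for (std::size_t i = 0; i < thread_count; ++i)
            threads.emplace_back([this] { worker(); });
    }

    ~mini_pool()
    {
        {
            // Flip the flag while holding the mutex. A worker either has not
            // yet checked the predicate (and will see running == false) or is
            // already blocked inside wait() (and will receive the notify).
            std::lock_guard<std::mutex> lock(tasks_mutex);
            running = false;
        }
        task_available_cv.notify_all();
        for (std::thread& t : threads)
            t.join();
    }

    void push_task(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(tasks_mutex);
            tasks.push(std::move(task));
        }
        task_available_cv.notify_one();
    }

private:
    void worker()
    {
        for (;;)
        {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(tasks_mutex);
                task_available_cv.wait(lock, [this] { return !tasks.empty() || !running; });
                if (tasks.empty())
                    return; // woken for shutdown and nothing left to run
                task = std::move(tasks.front());
                tasks.pop();
            }
            task(); // run outside the lock
        }
    }

    std::mutex tasks_mutex;
    std::condition_variable task_available_cv;
    std::queue<std::function<void()>> tasks;
    bool running = true;              // guarded by tasks_mutex
    std::vector<std::thread> threads; // declared last so the members above exist before workers start
};

// Creates and destroys `rounds` pools, each running a few trivial tasks.
// Returns the total number of tasks that ran.
std::size_t churn_pools(std::size_t rounds)
{
    std::atomic<std::size_t> ran{0};
    for (std::size_t r = 0; r < rounds; ++r)
    {
        mini_pool pool(4);
        for (int i = 0; i < 8; ++i)
            pool.push_task([&ran] { ++ran; });
    } // each pool is destroyed here; its workers drain the queue, then exit
    return ran;
}
```

In this sketch the workers finish any queued tasks before exiting, so `churn_pools(n)` returns `8 * n` when every pool shuts down cleanly.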

bshoshany · 2023-02-09T21:17:08Z

Hi @alugowski, thanks for opening this issue. Sorry it took me so long to reply. I'm currently on hiatus from developing this package, since I'm too busy teaching. However, I plan to release a new version in the summer.

It would be very useful if I could reproduce this issue on my own system so I can see what exactly is going on. Can you tell me exactly which test in your fast_matrix_market repository results in a deadlock?

Like I said, I'm currently on hiatus, but will try to take a look if I can find the time. Thanks!

alugowski · 2023-02-10T02:00:51Z

@bshoshany no worries, I have a working workaround so I'm not blocked.

The issue is very intermittent and any test can trigger it. I saw it somewhat regularly when I ran the entire test suite, mostly because each run of the test suite creates and destroys several hundred thread pools. With the number of tests I have now, I'd guess you would see a freeze roughly half the time if you load the project in CLion and choose "Run All Tests". Note that fast_matrix_market includes my workaround, so if you'd like to experiment there you'll have to restore your stock code in /include/fast_matrix_market/3rdparty/BS_thread_pool_light.hpp.

Before you dig in there, do you have any tests that just create a pool, load it with trivial dummy work, and destroy? And just loop that tens of thousands of times? I imagine that would show the issue as well. My code isn't doing anything weird.
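
A loop like the one described above hangs forever when the bug triggers, which is awkward in an automated test. The sketch below shows one possible harness that reports the hang instead. Everything in it is an assumption made for illustration: `first_stuck_iteration` is a hypothetical helper, and the `iteration` callable stands for "create a pool, queue trivial work, let the pool go out of scope".

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <future>
#include <thread>
#include <utility>

// Hypothetical helper, not part of any library.
// Runs `iteration` up to `count` times, each on its own thread, and returns
// the index of the first run that did not finish within `limit`, or `count`
// if every run finished in time.
std::size_t first_stuck_iteration(const std::function<void()>& iteration,
                                  std::size_t count,
                                  std::chrono::milliseconds limit)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        std::packaged_task<void()> run(iteration);
        std::future<void> done = run.get_future();
        std::thread runner(std::move(run));

        if (done.wait_for(limit) == std::future_status::timeout)
        {
            runner.detach(); // this run is stuck; leave it behind and report it
            return i;
        }
        runner.join();
    }
    return count;
}
```

A stuck run leaves its thread detached, so this is meant for a test process that exits shortly after reporting the failure. The time limit has to be generous enough that a slow but healthy run is not mistaken for a deadlock.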

bshoshany · 2023-02-10T04:42:04Z

Thanks for the information. I'll look into it, but like I said, it might take some time. I'll be in touch!

bshoshany · 2023-05-12T02:01:48Z

Closed as resolved by PR #108 (will be included in v3.4.0).

alugowski added the bug label Jan 10, 2023

alugowski assigned bshoshany Jan 10, 2023

This was referenced May 10, 2023:

Threadpool create/destroy deadlock [BUG] #107 (Closed)

deadlock[BUG] #100 (Closed)

Fix pool shutdown deadlock #108 (Closed)

bshoshany closed this as completed May 12, 2023
