-
Notifications
You must be signed in to change notification settings - Fork 253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Deadlock on pool exit #93
Comments
The likely cause is that there is a gap in while (running)
{
std::function<void()> task;
/////// if running set to false and task_available_cv.notify_all() called here then deadlock
std::unique_lock<std::mutex> tasks_lock(tasks_mutex);
task_available_cv.wait(tasks_lock, [this] { return !tasks.empty() || !running; });
if (running) |
Hi @alugowski, thanks for opening this issue. Sorry it took me so long to reply. I'm currently on hiatus from developing this package, since I'm too busy teaching. However, I plan to release a new version in the summer. It would be very useful if I could reproduce this issue on my own system so I can see what exactly is going on. Can you tell me exactly which test in your fast_matrix_market repository results in a deadlock? Like I said, I'm currently on hiatus, but will try to take a look if I can find the time. Thanks! |
@bshoshany no worries, I have a working workaround so I'm not blocked. The issue is very intermittent and any test can trigger it. I saw it somewhat regularly when I ran the entire test suite, mostly because each run of the test suite would create and destroy several hundred thread pools. With the number of tests I have now, maybe you'd expect to see a freeze half the time? I mean load the project in CLion and "Run All Tests". Note that fast_matrix_market includes my workaround, so if you'd like to experiment there you'll have to restore your stock code in /include/fast_matrix_market/3rdparty/BS_thread_pool_light.hpp. Before you dig in there, do you have any tests that just create a pool, load it with trivial dummy work, and destroy? And just loop that tens of thousands of times? I imagine that would show the issue as well. My code isn't doing anything weird. |
Thanks for the information. I'll look into it, but like I said, it might take some time. I'll be in touch! |
Closed as resolved by PR #108 (will be included in v3.4.0). |
I'm seeing a deadlock upon thread pool destruction:
My workload creates a thread pool for one expensive operation. Tasks are not created all at once, instead more are created as old ones finish.
wait_for_tasks()
makes no difference.There appears to be one worker thread left waiting for the condition variable as the thread is being joined.
I haven't managed to reproduce it with a small program yet. This is one actual use that suffers the issue:
https://github.com/alugowski/fast_matrix_market/blob/main/include/fast_matrix_market/write_body_threads.hpp
That code is exercised by my unit tests. If I loop a unit test a few thousand times then it nearly always deadlocks.
Workaround:
The following hack works around the issue. Replace the worker thread's
wait()
with await_until
and a duration. This way it won't deadlock for more than about 50ms. The break condition must be rechecked afterwards.Sounds related to #76.
The text was updated successfully, but these errors were encountered: