Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a race in check_pausing test. #42

Closed
wants to merge 2 commits into from

Conversation

cyfdecyf
Copy link

@cyfdecyf cyfdecyf commented May 21, 2022

Describe the changes

Two changes:

  1. Use exit code 1 if test has failures.
    • So I can use a script to repeatedly run the test many times and stop when there's any failure.
  2. Fix a race in check_pause
    • The cause is describe in comment.
    • The race is found after I changed the queue to a simple concurrent queue which supports wait_and_pop. Full changes on my fork's master branch.

Thanks for this nice project. I'm new to concurrency support in C++, this project is clean and easy to understand, which motivates me to hack for my own requirements.

Testing

Have you tested the new code using the provided automated test program thread_pool_test.cpp and/or performed any other tests to ensure that it works correctly? If so, please provide information about the test system(s):

  • CPU model, architecture, # of cores and threads: Intel Core i5-8259U Processor
  • Operating system: macOS
  • Name and version of C++ compiler: Apple clang version 13.1.6 (clang-1316.0.21.2.5)
  • Full command used for compiling, including all compiler flags: I'm using CMakeLists.txt.

@bshoshany
Copy link
Owner

Thanks! I will incorporate the exit code into v3.0.0, which will be released soon. Regarding check_pausing(), I'm not sure I understand what the issue is. The only place where pool.paused is modified is inside check_pausing() itself, so there's no scenario where pool.paused is set to true before pool.reset() is called. Or are you saying there's a general issue when invoking pool.reset() while the pool is paused?

@cyfdecyf
Copy link
Author

The root problem is that setting pause=true and adding more tasks can happen between check pause and pop task.

Suppose we have only 2 threads, worker thread (W) and test thread (T).

We may have the following execution sequence:

W: checks pause, it's false
T: set pause=true
T: add tasks
W: pop task

So the task added after setting pause=true got executed.

@bshoshany
Copy link
Owner

bshoshany commented May 21, 2022

Thanks for the clarification.

The effect of the statement if (!paused && pop_task(task)) in worker() is as follows: first, check that paused is false. If it is, pop a task.

You are correct that if paused changes to true after the check but before the task is popped, the task will still be popped, because the pool was paused too late. However, that is not a problem, it is simply part of how the pausing mechanism works.

But in any case, in the test program, check_pausing() sets paused to true while the pool is empty and before submitting any tasks, so I don't see how this can cause the test to fail. Please correct me if I'm wrong.

@bshoshany bshoshany closed this May 21, 2022
@cyfdecyf
Copy link
Author

In deed the race I mentioned previously is unlikely to happen in this test, since the test thread has print between setting pause to true and adding tasks. But it is possible to happen in theory.

For me, the problem is caused by pop waiting on condition variable, which is introduced in my code change.

The execution sequence is as follows:

W: thread starts execution
W: check paused got false, then pop (wait on condition variable)
T: set paused to true
T: add task
W: got task (wake up by condition variable)

For me, I'm planning to implement work thread local queue and work stealing, remove all sleep except in paused mode. It will be a big change which I can't expect full acceptance in this project.

But this small change can address the theoratical possible race, and maybe help other people who's doing something similar. That's why I send this PR.

@bshoshany
Copy link
Owner

Thanks, I do appreciate your contribution!

It sounds like what you're saying is that this change is only needed for your modified version of the library, so I'm not sure there's benefit in adding it to this library.

In addition, v3.0.0 of the library, which will be released soon, will replace sleep with condition variables (or possibly something else, I'm still testing different options) and therefore the pause mechanism will be rewritten too. There will be lots of other changes and improvements as well. I think you will find that v3.0.0 probably would work better for your needs.

Thanks again :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants