-
Notifications
You must be signed in to change notification settings - Fork 6.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib/pthread: Fix create/join race in thread lifecycle #57721
Conversation
Note that this may not be completely conformant to POSIX semantics. There's a pthread_detach() implementation using that same condition variable, and I suspect these two need to be unified. Hopefully someone with more POSIX-fu than I can explain the requirements. But the race needs to be addressed by using k_thread_join(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice shot.
I'm still working through this. Might need to be a bit patient.. |
I had a chance to look up pthread_detach(), btw. And indeed, the treatment here is insufficient. The documented behavior of pthread_detach() is that it wakes up any threads blocked in a pthread_join() call[1]. And now that that's a k_thread_join() we need some way to do that out of the scheduler (which is probably just as simple as a z_thread_wake_joiners() or something -- just a little lock around an unpend_all() call). Let me stick a DNM on this for now, pending a little bit of thought. [1] For the life of me, I can't understand why! The "meaning" of the call is that the OS can clean the thread up instead of holding on to things like its result code and process ID until something retrieves them (analogous to the way unix processes live as zombies until something calls wait()). But... that doesn't mean that you can't join it! This is just inventing a special edge case when none was needed. POSIX, I swear... |
POSIX has a somewhat odd behavior that when you call pthread_detach()[1], it is supposed to synchronously wake up anyone currently joining the thread. That isn't a Zephyr kernel behavior[2], so it needs to be exposed. The implementation is as simple as it sounds. This is minimal: we don't try to keep any state around to inform the k_thread_join() call whether the thread actually exited or if this new function was used. That has to be provided by the outer layers (pthreads already do this). [1] Which tells the OS/platform that it can clean the thread up as soon as it exits and doesn't have to keep it around as a zombie until someone calls pthread_join() on it. [2] Because... why!? Not needing to keep a thread around for joiners isn't the same thing as disallowing joins! Signed-off-by: Andy Ross <andyross@google.com>
The pthread_join() implementation was done using a condition variable and not k_thread_join(), and that's actually fatal. It's not possible in Zephyr to guarantee a thread has exited in a race-free way without calling either k_thread_abort() or k_thread_join(). This opened a hole where an app could call pthread_join(), which would go to sleep waiting on the condition variable, which would be signaled from the exit path of the thread, and then return to call pthread_create() (and thus k_thread_create()) on the same pthread_t object (and thus the same struct k_thread) BEFORE THE EARLIER THREAD HAD ACTUALLY EXITED. This makes the scheduler blow up, at least on SMP (I wasn't personally able to make it happen on uniprocessor configs, though the race is present always). Rework to call k_thread_join(). There's no other correct way to do this. Fixes zephyrproject-rtos#56163 Signed-off-by: Andy Ross <andyross@google.com>
ef0e424
to
48596a1
Compare
New version, updated as described. This one seems to be working for me both through a full twister and on the test rig. Please re-review. |
I unfortunately force-pushed on my branch, but I should be able to test it out without the additional commit. Will give it a shot shortly. |
Close this. Patches got folded into #57637 and discussion continues there |
The pthread_join() implementation was done using a condition variable and not k_thread_join(), and that's actually fatal. It's not possible in Zephyr to guarantee a thread has exited in a race-free way without calling either k_thread_abort() or k_thread_join().
This opened a hole where an app could call pthread_join(), which would go to sleep waiting on the condition variable, which would be signaled from the exit path of the thread, and then return to call pthread_create() (and thus k_thread_create()) on the same pthread_t object (and thus the same struct k_thread) BEFORE THE EARLIER THREAD HAD ACTUALLY EXITED.
This makes the scheduler blow up, at least on SMP (I wasn't personally able to make it happen on uniprocessor configs, though the race is present always).
Rework to call k_thread_join(). There's no other correct way to do this.
Fixes #56163