Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

posix: pthread: race condition between pthread_create() and pthread_join() #56163

Closed
Tracked by #51211
hakehuang opened this issue Mar 23, 2023 · 16 comments · Fixed by #57403 or #58924
Closed
Tracked by #51211

posix: pthread: race condition between pthread_create() and pthread_join() #56163

hakehuang opened this issue Mar 23, 2023 · 16 comments · Fixed by #57403 or #58924
Assignees
Labels
area: POSIX POSIX API Library bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@hakehuang
Copy link
Collaborator

hakehuang commented Mar 23, 2023

Previous Title
posix_common derived test fails on some qemu platforms when the CONFIG_TICKLESS_KERNEL=n is added

Describe the bug
posix_common derived test fails on some qemu platforms when the CONFIG_TICKLESS_KERNEL=n is added

Please also mention any information which could help others to understand
the problem you're facing:

  1. add a delay in the test_pthread_descriptor_leak loop can help
  2. use tickless mode can work as well
  • Is this a regression? If yes, have you been able to "git bisect" it to a
    specific commit?
    No.

To Reproduce
Steps to reproduce the behavior:

  1. add CONFIG_TICKLESS_KERNEL=n to prj.conf
  2. do below
west build -b qemu_cortex_r5 -- -DCONFIG_NEWLIB_LIBC=y
west build -t run

Expected behavior
PASS

Impact
some potential race condition in pthread handling

Logs and console output

*** Booting Zephyr OS build zephyr-v3.3.0-1516-g1dce3c3ee2e9 ***
...
===================================================================
START - test_pthread_descriptor_leak

    Assertion failed at WEST_TOPDIR/zephyr/tests/posix/common/src/pthread.c:598: posix_apis_test_pthread_descriptor_leak: (pthread_join(pthread1, &unused) is non-zero)
unable to join thread 31
 FAIL - test_pthread_descriptor_leak in 0.032 seconds
===================================================================
START - test_sleep
ASSERTION FAIL [0] @ WEST_TOPDIR/zephyr/kernel/sched.c:1797
        aborted _current back from dead
E: r0/a1:  0x00000004  r1/a2:  0x00000705  r2/a3:  0xff00002c
E: r3/a4:  0x000089b9 r12/ip:  0x000242e8 r14/lr:  0x0000b2a1
E:  xpsr:  0x0000013f
E: Faulting instruction address (r15/pc): 0x0000c934
E: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
E: Current thread: 0x20038 (unknown)
E: Halting system
...

Environment (please complete the following information):

  • OS: (e.g. Linux, )
  • Toolchain (e.g Zephyr SDK, ...)
  • Commit SHA or Version used: zephyr-v3.3.0-1516-g1dce3c3ee2e9
@hakehuang hakehuang added the bug The issue is a bug, or the PR is fixing a bug label Mar 23, 2023
@henrikbrixandersen henrikbrixandersen added the area: POSIX POSIX API Library label Mar 28, 2023
@jgl-meta jgl-meta added the priority: low Low impact/importance bug label Mar 28, 2023
@cfriedt
Copy link
Member

cfriedt commented Apr 7, 2023

On commit 336803e, I'm unable to reproduce this on main with the command below (also tried CONFIG_TICKLESS_KERNEL=n in prj.conf).

west build -p always -b qemu_cortex_a53 -t run tests/posix/common -- \
  -DCONFIG_TICKLESS_KERNEL=n -DCONFIG_NEWLIB_LIBC=y

Oddly, what I have found with qemu_cortex_r5 is that if I disable other tests within the same testsuite, the test passes. E.g.

for i in tests/posix/common/src/*.c; do
  j=$(basename $i)
  if [ "$j" = "main.c" -o $j = "pthread.c" ]; then
    continue
  fi
  k=$i.ignore
  mv $i $k
done
west build -p always -b qemu_cortex_r5 -t run tests/posix/common -- \
  -DCONFIG_TICKLESS_KERNEL=n -DCONFIG_NEWLIB_LIBC=y

I am able to reproduce this on qemu_cortex_r5 without disabling other parts of the testsuite though.

That leads me to believe we are looking at possible stack corruption (stack overflow, ..). Another alternative is that there may be some uninitialized variable.

@cfriedt
Copy link
Member

cfriedt commented Apr 7, 2023

I tried this [1], which sets stack sizes to a minimum size, but the problem persisted. That makes me think it's more of an uninitialized variable issue or maybe an alignment issue with the architecture.

If it's an uninitialized variable, it could take some time to pinpoint, unfortunately.

That's very odd that this issue only presents itself with qemu_cortex_r5 as well. @hakehuang - this happens with NXP hardware too, right?

@microbuilder, do you know if Cortex-R5 has any special requirements w.r.t. thread stack alignment?

[1]

git checkout .
west build -p always -b qemu_cortex_r5 tests/posix/common -- \
    -DCONFIG_TICKLESS_KERNEL=n -DCONFIG_NEWLIB_LIBC=y
  grep STACK build/zephyr/.config | \
  grep -v "^#" | \
  grep -v "=\(y\|=n\)$" | \
  while read -r LINE; do \
    VAL="$(echo "$LINE" | awk '{split($0,a,"="); print a[2];}')" \
    if [ $((2*VAL)) -lt 4096 ]; then \
      NEWVAL=8192 \
    else \
      NEWVAL=$((2*VAL)) \
    fi \
    CONFIG="$(echo "$LINE" | awk '{split($0,a,"="); print a[1];}')" \
    echo "$CONFIG=$NEWVAL" \
  done | \
  tee -a tests/posix/common/prj.conf
west build -p always -b qemu_cortex_r5 -t run tests/posix/common -- \
    -DCONFIG_TICKLESS_KERNEL=n -DCONFIG_NEWLIB_LIBC=y

@hakehuang
Copy link
Collaborator Author

That's very odd that this issue only presents itself with qemu_cortex_r5 as well. @hakehuang - this happens with NXP hardware too, right?

no, only in QEMU or renode platform

@hakehuang
Copy link
Collaborator Author

Oddly, what I have found with qemu_cortex_r5 is that if I disable other tests within the same testsuite, the test passes. E.g.

if the pthread test is removed, all other will not fail, and per my testing it is only the last test case in the pthread which will cause problem for itself or for following test, a simple way to check is add a sleep right after the test. which will trigger the problem.

@cfriedt
Copy link
Member

cfriedt commented Apr 29, 2023

@hakehuang @dleach02 - please take a look at #57403 - it seems the solution was a lot easier than we anticipated.

@cfriedt cfriedt added backport v2.7-branch Request backport to the v2.7-branch and removed backport v2.7-branch Request backport to the v2.7-branch labels Apr 29, 2023
@cfriedt cfriedt reopened this May 2, 2023
@cfriedt
Copy link
Member

cfriedt commented May 2, 2023

Re-opening because it still is not fixed.
I was able to exacerbate the problem by greatly increasing the number of pthread_create() / pthread_join() requests up to 100 times the maximum number of pthreads.

It should not be necessary to re-initialize the same attribute over and over again as well.

Just to be clear, we are not actually leaking descriptors, but we are seeing the effects of some kind of race condition between pthread_create(), and pthread_join().

ZTEST(posix_apis, test_pthread_descriptor_leak)
{
	pthread_t pthread1;
	pthread_attr_t attr;

	zassert_ok(pthread_attr_init(&attr));
	zassert_ok(pthread_attr_setstack(&attr, &stack_e[0][0], STACKS));

	/* If we are leaking descriptors, then this loop will never complete */
	for (size_t i = 0; i < CONFIG_MAX_PTHREAD_COUNT * 100; ++i) {
		zassert_ok(pthread_create(&pthread1, &attr, create_thread1, NULL),
			   "unable to create thread %zu", i);
		zassert_ok(pthread_join(pthread1, NULL), "unable to join thread %zu", i);
	}
}

The same thing definitely does not occur with k_thread_create() and k_thread_join(), so there is definitely a bug here, and it exists in main as well as v2.7-branch.

It will likely help to create a reliable stress test where we're not executing the entire tests/posix/common suite. Something like the above code snippet, even repeated 1000 times.

@cfriedt cfriedt added priority: medium Medium impact/importance bug and removed priority: low Low impact/importance bug labels May 2, 2023
@cfriedt
Copy link
Member

cfriedt commented May 2, 2023

Setting to priority medium as it affects multiple platforms.

@cfriedt cfriedt added priority: high High impact/importance bug and removed priority: medium Medium impact/importance bug labels May 2, 2023
@cfriedt
Copy link
Member

cfriedt commented May 2, 2023

On second thought - affects multiple platforms and is breaking CI - setting to high priority.

@cfriedt
Copy link
Member

cfriedt commented May 7, 2023

#57637 can be used to more reliably reproduce the race condition. Still working on a fix.

andyross added a commit to andyross/zephyr that referenced this issue May 9, 2023
The pthread_join() implementation was done using a condition variable
and not k_thread_join(), and that's actually fatal.  It's not possible
in Zephyr to guarantee a thread has exited in a race-free way without
calling either k_thread_abort() or k_thread_join().

This opened a hole where an app could call pthread_join(), which would
go to sleep waiting on the condition variable, which would be signaled
from the exit path of the thread, and then return to call
pthread_create() (and thus k_thread_create()) on the same pthread_t
object (and thus the same struct k_thread) BEFORE THE EARLIER THREAD
HAD ACTUALLY EXITED.

This makes the scheduler blow up, at least on SMP (I wasn't personally
able to make it happen on uniprocessor configs, though the race is
present always).

Rework to call k_thread_join().  There's no other correct way to do
this.

Fixes zephyrproject-rtos#56163

Signed-off-by: Andy Ross <andyross@google.com>
@andyross
Copy link
Contributor

andyross commented May 9, 2023

See commit message in linked PR. The root cause was a hand-rolled attempt at pthread_join() when it needed to be using the underlying k_thread_join() instead.

andyross added a commit to andyross/zephyr that referenced this issue May 10, 2023
The pthread_join() implementation was done using a condition variable
and not k_thread_join(), and that's actually fatal.  It's not possible
in Zephyr to guarantee a thread has exited in a race-free way without
calling either k_thread_abort() or k_thread_join().

This opened a hole where an app could call pthread_join(), which would
go to sleep waiting on the condition variable, which would be signaled
from the exit path of the thread, and then return to call
pthread_create() (and thus k_thread_create()) on the same pthread_t
object (and thus the same struct k_thread) BEFORE THE EARLIER THREAD
HAD ACTUALLY EXITED.

This makes the scheduler blow up, at least on SMP (I wasn't personally
able to make it happen on uniprocessor configs, though the race is
present always).

Rework to call k_thread_join().  There's no other correct way to do
this.

Fixes zephyrproject-rtos#56163

Signed-off-by: Andy Ross <andyross@google.com>
cfriedt pushed a commit to cfriedt/zephyr that referenced this issue May 12, 2023
The pthread_join() implementation was done using a condition variable
and not k_thread_join(), and that's actually fatal.  It's not possible
in Zephyr to guarantee a thread has exited in a race-free way without
calling either k_thread_abort() or k_thread_join().

This opened a hole where an app could call pthread_join(), which would
go to sleep waiting on the condition variable, which would be signaled
from the exit path of the thread, and then return to call
pthread_create() (and thus k_thread_create()) on the same pthread_t
object (and thus the same struct k_thread) BEFORE THE EARLIER THREAD
HAD ACTUALLY EXITED.

This makes the scheduler blow up, at least on SMP (I wasn't personally
able to make it happen on uniprocessor configs, though the race is
present always).

Rework to call k_thread_join().  There's no other correct way to do
this.

Fixes zephyrproject-rtos#56163

Signed-off-by: Andy Ross <andyross@google.com>
cfriedt pushed a commit to cfriedt/zephyr that referenced this issue May 12, 2023
The pthread_join() implementation was done using a condition variable
and not k_thread_join(), and that's actually fatal.  It's not possible
in Zephyr to guarantee a thread has exited in a race-free way without
calling either k_thread_abort() or k_thread_join().

This opened a hole where an app could call pthread_join(), which would
go to sleep waiting on the condition variable, which would be signaled
from the exit path of the thread, and then return to call
pthread_create() (and thus k_thread_create()) on the same pthread_t
object (and thus the same struct k_thread) BEFORE THE EARLIER THREAD
HAD ACTUALLY EXITED.

This makes the scheduler blow up, at least on SMP (I wasn't personally
able to make it happen on uniprocessor configs, though the race is
present always).

Rework to call k_thread_join().  There's no other correct way to do
this.

Fixes zephyrproject-rtos#56163

Signed-off-by: Andy Ross <andyross@google.com>
cfriedt pushed a commit to cfriedt/zephyr that referenced this issue May 12, 2023
The pthread_join() implementation was done using a condition variable
and not k_thread_join(), and that's actually fatal.  It's not possible
in Zephyr to guarantee a thread has exited in a race-free way without
calling either k_thread_abort() or k_thread_join().

This opened a hole where an app could call pthread_join(), which would
go to sleep waiting on the condition variable, which would be signaled
from the exit path of the thread, and then return to call
pthread_create() (and thus k_thread_create()) on the same pthread_t
object (and thus the same struct k_thread) BEFORE THE EARLIER THREAD
HAD ACTUALLY EXITED.

This makes the scheduler blow up, at least on SMP (I wasn't personally
able to make it happen on uniprocessor configs, though the race is
present always).

Rework to call k_thread_join().  There's no other correct way to do
this.

Fixes zephyrproject-rtos#56163

Signed-off-by: Andy Ross <andyross@google.com>
cfriedt pushed a commit to cfriedt/zephyr that referenced this issue May 13, 2023
The pthread_join() implementation was done using a condition variable
and not k_thread_join(), and that's actually fatal.  It's not possible
in Zephyr to guarantee a thread has exited in a race-free way without
calling either k_thread_abort() or k_thread_join().

This opened a hole where an app could call pthread_join(), which would
go to sleep waiting on the condition variable, which would be signaled
from the exit path of the thread, and then return to call
pthread_create() (and thus k_thread_create()) on the same pthread_t
object (and thus the same struct k_thread) BEFORE THE EARLIER THREAD
HAD ACTUALLY EXITED.

This makes the scheduler blow up, at least on SMP (I wasn't personally
able to make it happen on uniprocessor configs, though the race is
present always).

Rework to call k_thread_join().  There's no other correct way to do
this.

Fixes zephyrproject-rtos#56163

Signed-off-by: Andy Ross <andyross@google.com>
cfriedt pushed a commit to cfriedt/zephyr that referenced this issue May 13, 2023
The pthread_join() implementation was done using a condition variable
and not k_thread_join(), and that's actually fatal.  It's not possible
in Zephyr to guarantee a thread has exited in a race-free way without
calling either k_thread_abort() or k_thread_join().

This opened a hole where an app could call pthread_join(), which would
go to sleep waiting on the condition variable, which would be signaled
from the exit path of the thread, and then return to call
pthread_create() (and thus k_thread_create()) on the same pthread_t
object (and thus the same struct k_thread) BEFORE THE EARLIER THREAD
HAD ACTUALLY EXITED.

This makes the scheduler blow up, at least on SMP (I wasn't personally
able to make it happen on uniprocessor configs, though the race is
present always).

Rework to call k_thread_join().  There's no other correct way to do
this.

Fixes zephyrproject-rtos#56163

Signed-off-by: Andy Ross <andyross@google.com>
cfriedt pushed a commit to cfriedt/zephyr that referenced this issue May 14, 2023
The pthread_join() implementation was done using a condition variable
and not k_thread_join(), and that's actually fatal.  It's not possible
in Zephyr to guarantee a thread has exited in a race-free way without
calling either k_thread_abort() or k_thread_join().

This opened a hole where an app could call pthread_join(), which would
go to sleep waiting on the condition variable, which would be signaled
from the exit path of the thread, and then return to call
pthread_create() (and thus k_thread_create()) on the same pthread_t
object (and thus the same struct k_thread) BEFORE THE EARLIER THREAD
HAD ACTUALLY EXITED.

This makes the scheduler blow up, at least on SMP (I wasn't personally
able to make it happen on uniprocessor configs, though the race is
present always).

Rework to call k_thread_join().  There's no other correct way to do
this.

Fixes zephyrproject-rtos#56163

Signed-off-by: Andy Ross <andyross@google.com>
cfriedt pushed a commit to cfriedt/zephyr that referenced this issue May 14, 2023
The pthread_join() implementation was done using a condition variable
and not k_thread_join(), and that's actually fatal.  It's not possible
in Zephyr to guarantee a thread has exited in a race-free way without
calling either k_thread_abort() or k_thread_join().

This opened a hole where an app could call pthread_join(), which would
go to sleep waiting on the condition variable, which would be signaled
from the exit path of the thread, and then return to call
pthread_create() (and thus k_thread_create()) on the same pthread_t
object (and thus the same struct k_thread) BEFORE THE EARLIER THREAD
HAD ACTUALLY EXITED.

This makes the scheduler blow up, at least on SMP (I wasn't personally
able to make it happen on uniprocessor configs, though the race is
present always).

Rework to call k_thread_join().  There's no other correct way to do
this.

Fixes zephyrproject-rtos#56163

Signed-off-by: Andy Ross <andyross@google.com>
cfriedt pushed a commit to cfriedt/zephyr that referenced this issue May 21, 2023
The pthread_join() implementation was done using a condition variable
and not k_thread_join(), and that's actually fatal.  It's not possible
in Zephyr to guarantee a thread has exited in a race-free way without
calling either k_thread_abort() or k_thread_join().

This opened a hole where an app could call pthread_join(), which would
go to sleep waiting on the condition variable, which would be signaled
from the exit path of the thread, and then return to call
pthread_create() (and thus k_thread_create()) on the same pthread_t
object (and thus the same struct k_thread) BEFORE THE EARLIER THREAD
HAD ACTUALLY EXITED.

This makes the scheduler blow up, at least on SMP (I wasn't personally
able to make it happen on uniprocessor configs, though the race is
present always).

Rework to call k_thread_join().  There's no other correct way to do
this.

Fixes zephyrproject-rtos#56163

Signed-off-by: Andy Ross <andyross@google.com>
@cfriedt
Copy link
Member

cfriedt commented May 21, 2023

As it turns out, there is also a race between k_thread_create() and k_thread_join()

@andyross
Copy link
Contributor

Can you file that as a bug? There really shouldn't be as I'm reading the code.

@cfriedt
Copy link
Member

cfriedt commented May 22, 2023

Can you file that as a bug? There really shouldn't be as I'm reading the code.

@andyross - created #58116 for you

@andyross
Copy link
Contributor

Comment left in #58116 for anyone interested. FWIW: it seems not to be a kernel race in thread lifecycle, more like a arm64-specific race in context handling (and maybe x86_64, but with 1000x lower frequency).

FWIW: I'd treat this as two separate issues and merge the pthread fixes. If you need to tune the test[1] on arm64 for CI, that seems acceptable as long as we have a bug tracking the instability.

[1] SMP tests are already retried in CI because of unavoidable timer skew problems with scheduling "CPUs" onto host threads that can be preempted for 10ms+ in busy CI runs. (Which isn't what we're seeing here -- no timer interaction in this test!). The frequency I'm seeing probably wouldn't even show up above the noise floor.

@cfriedt cfriedt added priority: medium Medium impact/importance bug and removed priority: high High impact/importance bug labels May 27, 2023
@cfriedt cfriedt changed the title posix_common derived test fails on some qemu platforms when the CONFIG_TICKLESS_KERNEL=n is added posix: pthread: race condition between pthread_create() and pthread_join() Jun 5, 2023
@cfriedt
Copy link
Member

cfriedt commented Jun 5, 2023

Just wanted to note here that although this issue has highlighted an underlying issue with k_thread_create() and k_thread_join() which has been mitigated (at least made so that the probabilities are quite negligible in parts per billion), the race still very much exists in pthreads.

Working on a fix for this which I hope will be accepted into v3.4.0

@cfriedt
Copy link
Member

cfriedt commented Jun 6, 2023

Probably should not have been closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment