problem with CONFIG_STACK_SENTINEL #32261
Comments
@StefJar isn’t this just a stack overflow? Have you tried increasing stack sizes? In particular the battery_thread stack |
That is what the error says, so I increased the thread's stack from 1024 B to 2048 B. The exception still occurs. |
hey ho my code works like this
Guess something gets messed up at the thread creation |
Looks like this is related to ISRs and thread switching. in this case
I altered the adc isr functions and the thread functions. The thread function now takes 3 arguments and the adc isr is coded like the Zephyr documentation says:
|
I removed all of my own driver-related ISRs and replaced them with the current Zephyr APIs. I am now getting a "HARD FAULT". It happens when the main thread switches to another thread. Changing the stack size does not solve the problem.
|
After pulling the last commit 1ce264e my system does start up directly with
I put a breakpoint directly at the start of the main function. When I start the debugger and reset the MCU, this breakpoint is not hit. That means the kernel starts somehow, but fails with a stack overflow exception when switching to the main thread. |
I set up some breakpoints. The hard fault occurs at https://github.com/zephyrproject-rtos/zephyr/blob/master/kernel/init.c#L410
|
@dcpleung can you please take a look? Not sure why it was assigned to me. |
I don't have a STM32 board with me right now so I cannot debug on hardware. Here are the stack size kconfigs after build
Could you try doubling these? That is how I usually check whether the problem is a stack overflow. Also, do you remember the commit on master at which your code last worked? If not, a good starting point would be the v2.5 release. It helps to narrow down the range of changes. |
I doubled the stack sizes but still get the error. I think I found where the error happens: it occurs when the system work queue is initialized. https://github.com/zephyrproject-rtos/zephyr/blob/master/kernel/system_work_q.c#L30 The work queue thread is started as soon as it is created. During z_swap, z_check_stack_sentinel is called and fails. I attached a screenshot showing the call stack. @dcpleung hope that helps |
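For readers following along, the sentinel check described above can be sketched on a host machine. This is only an illustration under stated assumptions: `sim_stack`, `sim_stack_init`, `sim_sentinel_ok`, `sim_use_stack`, and the magic value are hypothetical names and values chosen for this sketch, not Zephyr's actual implementation. The idea is that a known word sits at the lowest address of a downward-growing stack and is compared at each context switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative magic value (assumption); the kernel's constant may differ. */
#define SENTINEL_MAGIC 0xF0F0F0F0u

/* Hypothetical stand-in for a thread's stack region (stack grows downward). */
struct sim_stack {
    uint8_t *start; /* lowest address of the stack buffer */
    size_t size;    /* total size in bytes */
};

/* Write the sentinel into the lowest word when the "thread" is created. */
static void sim_stack_init(struct sim_stack *s, uint8_t *buf, size_t size)
{
    uint32_t magic = SENTINEL_MAGIC;

    assert(size >= sizeof(magic));
    s->start = buf;
    s->size = size;
    memset(buf, 0, size);
    memcpy(s->start, &magic, sizeof(magic));
}

/* Run at every simulated context switch: true if the sentinel is intact. */
static bool sim_sentinel_ok(const struct sim_stack *s)
{
    uint32_t word;

    memcpy(&word, s->start, sizeof(word));
    return word == SENTINEL_MAGIC;
}

/* Simulate a thread that dirties `depth` bytes from the top of its stack. */
static void sim_use_stack(struct sim_stack *s, size_t depth)
{
    if (depth > s->size) {
        depth = s->size;
    }
    memset(s->start + (s->size - depth), 0xCC, depth);
}
```

Note that a check like this only notices the damage at the next switch, and it also fires if anything else (an ISR, a stray pointer, DMA) writes to that word, so a failing sentinel does not by itself prove that the thread's own call depth was too large.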
I got the
Is this issue reproducible with any samples or tests apps? |
I didn't expect any failures with the CI tests. Maybe the combination of
causes the problems. Just in case I am quickly attaching my board defconfig file
|
Building with One thing I can think of is that interrupts are enabled on your board and fire before the switch to the main thread has completed, thus corrupting the interrupt stack. However, without a way to reproduce it using the apps available in the tree, it is impossible to figure out what is wrong. |
Thx @dcpleung for testing. |
@StefJar any updates on this. |
not yet |
Update: I managed to get that stack overflow when using a constant string at a My suspicion is now shifting towards the logger. In my project I have my own logging back end. |
Looks like it's not logger related. I can now reliably trigger a stack overflow by just setting |
@dcpleung any updates on this? |
Tried again but still cannot reproduce the issue:
|
lowering priority given that it can't be reproduced |
I'd like to give a short update on this issue
This happens before my main function is called. Any idea how I can get more information about what is corrupting the stack sentinel? |
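One generic way to see which stack is running out is to pre-fill each stack with a known byte and later count how many bytes are still untouched. The sketch below shows the idea in plain host C. `stack_paint`, `stack_unused`, and the fill byte are hypothetical names and values for illustration only and it uses no Zephyr API. Zephyr can pre-fill stacks itself when CONFIG_INIT_STACKS is enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative fill pattern (assumption for this sketch). */
#define FILL_BYTE 0xAAu

/* Paint the whole stack buffer before the thread first runs. */
static void stack_paint(uint8_t *buf, size_t size)
{
    assert(buf != NULL);
    memset(buf, FILL_BYTE, size);
}

/*
 * Count bytes that still hold the fill pattern, scanning upward from the
 * lowest address. With a downward-growing stack this is the headroom left.
 */
static size_t stack_unused(const uint8_t *buf, size_t size)
{
    size_t n = 0;

    while (n < size && buf[n] == FILL_BYTE) {
        n++;
    }
    return n;
}
```

A result of zero for one stack points at that stack as the one that overflowed. A healthy amount of headroom on every stack suggests that the sentinel word was overwritten by something other than normal stack growth.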
If the toolchain emits symbol for a particular thread, you can find it through |
In theory, stack sentinel should not be used together with HW_STACK_PROTECTION on Cortex-M. It's not needed. I've not checked, though, if these can co-exist smoothly. |
Yeap, the line of code could help here. |
Btw, I could not reproduce this either, using the same twister call as @dcpleung on nRF52840. I noticed @StefJar that you're using FPU - that might bring additional stack requirements. I wonder if what you see is an expected stack overflow. |
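To put a rough number on the extra stack an active FPU context can cost on a Cortex-M4F class core: the hardware pushes 8 words on exception entry without floating-point context, and 18 more (S0-S15, FPSCR, one reserved word) with it. The helper names below are hypothetical, and the sketch ignores alignment padding and anything the kernel saves in software.

```c
#include <assert.h>   /* static_assert */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BYTES        4u
#define BASIC_FRAME_WORDS 8u   /* R0-R3, R12, LR, PC, xPSR */
#define FP_EXTRA_WORDS    18u  /* S0-S15, FPSCR, one reserved word */

static_assert(sizeof(uint32_t) == WORD_BYTES, "frame words are 32-bit");

/* Bytes pushed by hardware on exception entry. */
static size_t exc_frame_bytes(bool fp_active)
{
    size_t words = BASIC_FRAME_WORDS;

    if (fp_active) {
        words += FP_EXTRA_WORDS;
    }
    return words * WORD_BYTES;
}

/* Additional bytes per stacked frame when FP context is active. */
static size_t fp_frame_overhead(void)
{
    return exc_frame_bytes(true) - exc_frame_bytes(false);
}
```

So each interrupt taken while a thread has FP context active costs 104 bytes of that thread's stack instead of 32, which can tip over a stack that was sized without the FPU in mind.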
@ioannisg thanks for testing and explaining. My exceptions point to the stack sentinel code, which is why I ended up opening this issue. It's good to know that HW_STACK_PROTECTION and CONFIG_STACK_SENTINEL conflict. Maybe it would be beneficial to disable CONFIG_STACK_SENTINEL if HW_STACK_PROTECTION is enabled.
The FPU handling is very confusing. At the application level you can pass K_FP_REGS at thread creation, but this macro only takes effect if CONFIG_FPU_SHARING is set. That makes detecting a stack overflow at a thread switch tricky, because the sentinel/HW stack protection needs to know the thread's configuration. For now I am going with the standard setup.
With CONFIG_STACK_SENTINEL disabled, my code passes my long-run test (12 h with continuous sensor I/O and BT data streaming). The comment from @ioannisg gave me a good explanation. I will leave this issue open for comments for 2 days and close it afterwards. |
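Until Kconfig enforces this, an application could guard against the combination itself at build time. The following is a sketch of such a project-level policy, not something Zephyr ships. The `APP_*` macros and `app_overflow_detection_enabled` are hypothetical names. It relies only on the fact that the generated configuration header defines a `CONFIG_*` symbol when the option is set to `y` and leaves it undefined otherwise.

```c
#include <assert.h>   /* static_assert */
#include <stdbool.h>

/* Map the Kconfig symbols to 0/1 so they can be used in expressions. */
#ifdef CONFIG_STACK_SENTINEL
#define APP_STACK_SENTINEL 1
#else
#define APP_STACK_SENTINEL 0
#endif

#ifdef CONFIG_HW_STACK_PROTECTION
#define APP_HW_STACK_PROTECTION 1
#else
#define APP_HW_STACK_PROTECTION 0
#endif

/* Project policy (assumption): allow at most one of the two mechanisms. */
static_assert(!(APP_STACK_SENTINEL && APP_HW_STACK_PROTECTION),
              "enable CONFIG_STACK_SENTINEL or CONFIG_HW_STACK_PROTECTION, "
              "not both");

/* True if either overflow-detection mechanism is compiled in. */
static bool app_overflow_detection_enabled(void)
{
    return APP_STACK_SENTINEL || APP_HW_STACK_PROTECTION;
}
```

On a plain host build neither symbol is defined, so the assertion passes and the helper returns false. In a Zephyr build with both options set to `y`, compilation would stop with the message above.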
I'm able to reproduce it pretty consistently with one of the Zephyr Bluetooth samples on an nRF52840 DK. See https://devzone.nordicsemi.com/f/nordic-q-a/97002/enabling-config_fpu-causes-stack-overflow It would have been nice if Kconfig disallowed CONFIG_STACK_SENTINEL=y and CONFIG_FPU=y in the presence of hardware stack protection.
After pulling the latest Zephyr commit, my firmware breaks fairly reliably whenever a task switch or ISR occurs
checking the map file:
reveals that the z_check_stack_sentinel function fails.
Inside the battery thread, a DMA-driven ADC conversion is triggered.
The ISR gives a semaphore that the battery thread is waiting on. I guess that mechanism breaks the sentinel.
When I disable CONFIG_STACK_SENTINEL, my firmware works perfectly.
Does anyone else have the same problem?
My setup: STM32F412 + custom board + custom shield