Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arch: xtensa: Fix xtensa error handler #56587

Merged
merged 1 commit into from
Apr 11, 2023

Conversation

aasthagr
Copy link
Collaborator

@aasthagr aasthagr commented Apr 5, 2023

Force an early return from exception when a stack check failure is detected at interrupt/exception level. This helps to ensure that ARCH_EXCEPT() does not return.

@aasthagr aasthagr requested review from ceolin and peter-mitsis April 5, 2023 20:44
peter-mitsis
peter-mitsis previously approved these changes Apr 5, 2023
@aasthagr aasthagr marked this pull request as ready for review April 5, 2023 23:06
@zephyrbot zephyrbot added the area: Xtensa Xtensa Architecture label Apr 5, 2023
@@ -255,6 +255,10 @@ void *xtensa_excint1_c(int *interrupted_stack)
reason = bsa[BSA_A2_OFF/4];
/* Skip ILL to RETW */
bsa[BSA_PC_OFF/4] += 3;
/* Prevents xtensa_arch_except from returning in ISR context. */
if ((arch_curr_cpu()->nested > 1) && (reason == K_ERR_STACK_CHK_FAIL)) {
arch_curr_cpu()->nested = 1U;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs more explaining on why this is here, how this works, and what kind of issues we need to be aware of. Next time someone else looking at this would be confused.

@dcpleung
Copy link
Member

dcpleung commented Apr 7, 2023

I am still hesitant on this. We would have seen the issue on other Xtensa platforms already if the condition has been handled incorrectly. That we haven't seen the problem on other platforms with this common code suggests the issue might be somewhere else.

@edersondisouza
Copy link
Collaborator

I am still hesitant on this. We would have seen the issue on other Xtensa platforms already if the condition has been handled incorrectly. That we haven't seen the problem on other platforms with this common code suggests the issue might be somewhere else.

The thing is what does CODE_UNREACHABLE do on different platforms - it may be 'do nothing' vs 'break stuff'. It seems it does nothing on like, qemu_xtensa built with gcc, but may behave differently on other platforms/toolchains.

@aasthagr
Copy link
Collaborator Author

aasthagr commented Apr 7, 2023

I am still hesitant on this. We would have seen the issue on other Xtensa platforms already if the condition has been handled incorrectly. That we haven't seen the problem on other platforms with this common code suggests the issue might be somewhere else.

The thing is what does CODE_UNREACHABLE do on different platforms - it may be 'do nothing' vs 'break stuff'. It seems it does nothing on like, qemu_xtensa built with gcc, but may behave differently on other platforms/toolchains.

Agree. Different toolchains are behaving differently to what CODE_UNREACHABLE does. But irrespective the xtensa_arch_except should not return which is what this PR would fix. If that returns, having CODE_UNREACHABLE macro when built using xcc identifies the issue.

@aasthagr aasthagr requested a review from dcpleung April 7, 2023 20:05
Copy link
Member

@dcpleung dcpleung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not remove my change request.

@dcpleung
Copy link
Member

dcpleung commented Apr 7, 2023

I prefer manipulating nested as close to the end of that function as possible:

diff --git a/arch/xtensa/core/xtensa-asm2.c b/arch/xtensa/core/xtensa-asm2.c
index 3e31dcaa93..14c68eab6f 100644
--- a/arch/xtensa/core/xtensa-asm2.c
+++ b/arch/xtensa/core/xtensa-asm2.c
@@ -215,6 +215,7 @@ static inline DEF_INT_C_HANDLER(1)
 void *xtensa_excint1_c(int *interrupted_stack)
 {
        int cause, vaddr, *bsa = *(int **)interrupted_stack;
+       bool is_fatal_error = false;
 
        __asm__ volatile("rsr.exccause %0" : "=r"(cause));
 
@@ -240,6 +241,8 @@ void *xtensa_excint1_c(int *interrupted_stack)
                /* Default for exception */
                int reason = K_ERR_CPU_EXCEPTION;
 
+               is_fatal_error = true;
+
                /* We need to distinguish between an ill in xtensa_arch_except,
                 * e.g for k_panic, and any other ill. For exceptions caused by
                 * xtensa_arch_except calls, we also need to pass the reason_p
@@ -253,9 +256,6 @@ void *xtensa_excint1_c(int *interrupted_stack)
                        cause = 63;
                        __asm__ volatile("wsr.exccause %0" : : "r"(cause));
                        reason = bsa[BSA_A2_OFF/4];
-                       /* Skip ILL to RETW */
-                       bsa[BSA_PC_OFF/4] += 3;
-                       pc = (void *)bsa[BSA_PC_OFF/4];
                }
 
                LOG_ERR(" ** FATAL EXCEPTION");
@@ -280,6 +280,36 @@ void *xtensa_excint1_c(int *interrupted_stack)
                                     (void *)interrupted_stack);
        }
 
+
+       if (is_fatal_error) {
+               uint32_t ignore;
+
+               /* We are going to manipulate _current_cpu->nested manually.
+                * Since the error is fatal, for recoverable errors, code
+                * execution must not return back to the current thread as
+                * it is being terminated (via above z_xtensa_fatal_error()).
+                * So we need to prevent more interrupts coming in which
+                * will affect the nested value as we are going outside of
+                * normal interrupt handling procedure.
+                *
+                * Setting nested to 1 has two effects:
+                * 1. Force return_to() to choose a new thread.
+                *    Since the current thread is being terminated, it will
+                *    not be chosen again.
+                * 2. When context switches to the newly chosen thread,
+                *    nested must be zero for normal code execution,
+                *    as that is not in interrupt context at all.
+                *    After returning from this function, the rest of
+                *    interrupt handling code will decrement nested,
+                *    resulting it being zero before switching to another
+                *    thread.
+                */
+               __asm__ volatile("rsil %0, " STRINGIFY(XCHAL_NMILEVEL)
+                                : "=r" (ignore) : : );
+
+               _current_cpu->nested = 1;
+       }
+
        return return_to(interrupted_stack);
 }

ceolin
ceolin previously approved these changes Apr 8, 2023
@ceolin
Copy link
Member

ceolin commented Apr 8, 2023

I am still hesitant on this. We would have seen the issue on other Xtensa platforms already if the condition has been handled incorrectly. That we haven't seen the problem on other platforms with this common code suggests the issue might be somewhere else.

The thing is what does CODE_UNREACHABLE do on different platforms - it may be 'do nothing' vs 'break stuff'. It seems it does nothing on like, qemu_xtensa built with gcc, but may behave differently on other platforms/toolchains.

The problem was just noticed when `CODE_UNREACHABLE`` was added, much likely before that the exception happened and the handler returned to thread passing that instruction and the thread just finished further.

@aasthagr
Copy link
Collaborator Author

Please do not remove my change request.

Sorry just re-requested your review. Didnt remove the change request.

@aasthagr aasthagr force-pushed the fix_xtensa_error_handler branch from 0a4f12a to c114a16 Compare April 11, 2023 00:17
In case of recoverable fatal errors the execution should
switch to another thread. This will ensure the current_cpu nested
count is reset  when there is a context switch.

Signed-off-by: Aastha Grover <aastha.grover@intel.com>
@aasthagr aasthagr force-pushed the fix_xtensa_error_handler branch from c114a16 to e8c293a Compare April 11, 2023 00:43
@nashif nashif merged commit 4084726 into zephyrproject-rtos:main Apr 11, 2023
@marc-hb
Copy link
Collaborator

marc-hb commented Apr 12, 2023

This commit breaks compilation with RG-2017.8-linux/XtensaTools/bin/xt-xcc used for TGL/ADL/RPL:

zephyr/arch/xtensa/core/xtensa-asm2.c: In function ‘xtensa_excint1_c’:
zephyr/arch/xtensa/core/xtensa-asm2.c:306: error: expected string literal before ‘)’ token

I wish we could upgrade away from this toolchain but it's unfortunately not going to happen:

cc: @kv2019i , @lgirdwood

@dcpleung
Copy link
Member

@marc-hb see if #56790 fixes that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Xtensa Xtensa Architecture
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants