Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle breakpoints via postmortem when GDB is not running. #7704

Closed
wants to merge 23 commits into from

Conversation

mhightower83
Copy link
Contributor

@mhightower83 mhightower83 commented Nov 12, 2020

Objective: Add support for Debug, Kernel, and Double Exception reporting with minimum impact on IRAM usage. Additionally, improve reporting of Unhandled Exception causes for the disabled interrupt case.

Overview: In the absence of GDB, install a small breakpoint handler that clears EXCM and executes an illegal instruction. This leverages the existing exception handling code in the SDK for the ill instruction. At postmortem processing, identify the event and report. This allows for the alerting of embedded breakpoints. In the stack trace, EPC1 is updated with the value of EPC2. This helps "Exception Decoder" identify the line where the BP occurred. By using the SDK's handling for ill, we get support for handling the process of getting out of the "Cache_Read_Disable state" free, without consuming IRAM.

For breakpoints (BP), the SDK's default behavior is to loop, waiting for an interrupt, or until the HWDT kicks in. Note the current "C++" compiler will embed breakpoints when it detects a divide by zero at compile time.

Expand technique to preserve more of the current context, and report Kernel and Double exceptions through postmortem. As well as enhanced reporting for XTOS Unhandled Exceptions. This allows reporting of EXCCAUSE value 20, InstFetchProhibitedCause, this is commonly caused by calling address 0x00000000. For example, this can happen from a defined weak function definition without supplying the function, or a failure to check for NULL before calling a callback function. Without this enhancement for XTOS Unhandled Exceptions an HWDT reboot will occur.

To assist the "ESP Exception Decoder", a stack frame is created to store various special registers for inspection. The expectation is that some of these values may point to code. This can give insight into what was running at the time of the crash.

This enhancement is only built when DEBUG_ESP_PORT (Debug selection from the Arduino IDE) or DEBUG_ESP_EXCEPTIONS is defined. This enhancement by itself (built with DEBUG_ESP_EXCEPTIONS defined) adds about 620 bytes of IROM, 20 bytes of IRAM, and 0 bytes of DRAM to a sketch.

Edited: expanded description

In the absence of GDB, install a small breakpoint handler that clears EXCM and
executes an illegal instruction. This leverages the existing exception
handling code in the SDK for `ill` instruction. At postmortem processing,
the real event is identified and reported. This allows for alerting of embedded
breakpoints. In the stack trace, `epc1` is updated with the value of `epc2` so
that the "ESP Exception Decoder" will identify the line where the BP occurred.

The SDKs default behavior was to loop on breakpoints waiting for interrupts
>2 until the HWDT kicks in. The current "C++" compiler will embed breakpoints
when it detects a divide by zero at compile time.

Expanded technique to preserve more of the current state and to also report
Kernel and Double exceptions through postmortem.
Copy link
Collaborator

@earlephilhower earlephilhower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, as always!

Have you seen that G++ behavior (inserting BP instead of letting the FP div-0 fault) on other platforms? It may be something we can fix in the machine definition, but I haven't the stomach to look..

@mhightower83
Copy link
Contributor Author

@earlephilhower Thanks!

Have you seen that G++ behavior ... on other platforms?

This is the only platform that I have significantly used G++. For me, the issue surfaced with the toolchain 10.x update.

I would expect this to be a build controllable option. I looked briefly and didn't find anything that worked in changing the outcome. At the time, it felt like the time investment for me vs. the gain was too great. GCC is a relatively new compiler for me and we do not work well together.

Copy link
Collaborator

@devyte devyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not aware of the exception handler mechanism details or postmortem, so I can't offer a review based on functionality. But @earlephilhower already approved, and he's aware of that :)
Code-wise, the one thing that bothers me is all the literal magic numbers inserted throughout the code. I suggest moving all literals to constexpr or defines, and having each one with its own comment explaining what it means. ROM function addresses, if any, should be moved to a function-address map file (there's already one with the known ROM functions, you could just add to it).

*/
if (rst_info.reason == REASON_EXCEPTION_RST && rst_info.exccause == 0) {
#if defined(DEBUG_ESP_PORT) || defined(DEBUG_ESP_EXCEPTIONS)
if ((rst_info.epc1 - 11u) == (uint32_t)_DoubleExceptionVector) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 11? what does that magic number mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the offset into the _DoubleExceptionVector where the ill instruction was executed. I found a better way to deal with that. I'll push later,

}
// BP - Debug Exception
else if (rst_info.epc2) {
if (rst_info.epc2 == 0x4000dc4bu) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this magic address? What is there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the address in the Boot ROM for the breakpoint instruction that executes for an unhandled exception. I'll replace with a constexpr.

else if (rst_info.epc2) {
if (rst_info.epc2 == 0x4000dc4bu) {
// Unhandled Exceptions land here. 'break 1, 1' in _xtos_unhandled_exception
if (20u == exc_context->exccause) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 20? what does the magic number mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EXCCAUSE value 20, InstFetchProhibitedCause - An instruction fetch referenced a page mapped with an attribute that does not permit instruction fetch.

I'll add a constexpr for this in the next push.

#endif
// The GCC divide routine in ROM jumps to the address below and executes ILL (00 00 00) on div-by-zero
// In that case, print the exception as (6) which is IntegerDivZero
if (rst_info.epc1 == 0x4000dce5u) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the comment above, this magic number seema to be a ROM address. What is there? a 0x0? Can the value be moved into a constexpr and commented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that address has the ill (illegal) instruction. Added comments and constexpr.

*/
typedef void (* _xtos_handler)(void);
static _xtos_handler * const _xtos_exc_handler_table = (_xtos_handler *)0x3FFFC000u;
#define ROM_xtos_unhandled_exception (reinterpret_cast<_xtos_handler>(0x4000dc44))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this magic address? Is it a ROM function? Does it make sense to move to the file where the ROM functions are mapped to addresses?
If not, does it make sense to change this define into a constexpr with a type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While _xtos_unhandled_exception is in the linker .ld file, it may have been overridden by the sketch or future SDK(not likely). We require the original Boot ROM function address to limit our override to those old values in the table.

With the new toolchain change constexpr doesn't work for this. It does not like an integer being cast to a function:

constexpr _xtos_handler ROM_xtos_unhandled_exception = (reinterpret_cast<_xtos_handler>(0x4000dc44));

error: 'reinterpret_cast' from integer to pointer

It works if I replace constexpr with const but I was concerned it would take up DRAM. I just tried it and in this case, it doesn't appear to change DRAM usage.

I'll change it to const for now.

reset. If interrupts are at INTLEVEL 2 or higher (a BP instruction becomes a
NOP), you still see a HWDT reset regardless of running GDB.
*/
typedef void (* _xtos_handler)(void);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer using= over typedef for type definitions. In the case of a function ptr:

using _xtos_handler = void (*)(void);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I had this covered, but I don't think so anymore. I think my problem stems from trying to reference xtensa sources that I can find on the Internet to fill in some of these missing function types. In the case of _xtos_handler, I am having trouble. They applied it to two very different function tables and it matches neither exactly.

  1. The first table is a set of ASM functions that do not comply with "C" conventions.
    • For .cpp files the typedef boils down to typedef void (*_xtos_handler)(...);
    • extern _xtos_handler _xtos_exc_handler_table[];
    • Of the set of ASM functions that are used in this table, there is a "C" wrapper function (_xtos_c_wrapper_handler) that references the 2nd table.
  2. The 2nd table of "C" styled exception handling functions, defines its entries with the same typedef.
    • extern _xtos_handler _xtos_c_handler_table[];
    • for the functions in this table I think this would better match the function call:
      • typedef void (_xtos_c_exception_handler_func)(struct __exception_frame *ef, int cause);
      • typedef _xtos_c_exception_handler_func *_xtos_c_handler;
      • extern _xtos_c_handler _xtos_c_handler_table[];

My quandary is:

  1. should I keep the typedef's I found and cast everything to fit? or
  2. Ignore the existing ones and hope those files never need to be added to this code base and define new ones?
    • for the first table I don't see a perfect fit for a non-"C" ASM. I think the current typedef would be fine.

Also affected: #7060 (comment)

NOP), you still see a HWDT reset regardless of running GDB.
*/
typedef void (* _xtos_handler)(void);
static _xtos_handler * const _xtos_exc_handler_table = (_xtos_handler *)0x3FFFC000u;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this magic address? What is there? Can it be moved out to a define or constexpr with comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dispatch table for EXCCAUSE values. Added constexpr and comments.

// Only replace Exception Table entries still using the orignal Boot ROM
// _xtos_unhandled_exception handler.
for (size_t i = 0; i < 64; i++) {
replace_exception_handler_on_match(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this 64 is the number of entries in the exception handler? Can the value be moved out to a constexpr or define?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay

// Install all three (interdependent) exception handler stubs.
// Length of copy must be rounded up to copy memory in aligned 4 byte
// increments to comply with IRAM memory access requirements.
uint32_t save_ps = xt_rsil(15);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do 16 and 32 mean? doea it make sense to move to a define. constexpr and comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially wasn't sure how to generate values at compile time from the ASM blocks sizes. I think I have that resolved.

Additional - moved postmortem_init to run after preinit. This allows a user replacement
exception handler to be installed before postmortem_init which will replace the old
exception handlers.
_xtos_exc_handler_table is referenced.
At postmortem disable breakpoints.

For ASMs let the compiler pick tmp register.

Changed _xtos_handler to conform with xtensa/xtruntime.h definition.
I am trying to not conflick with their usage in case the file someday
finds its way into this code base; however, unless I am confused, I
think I see issues with its reuse of _xtos_handler, where it does not
strickly fit. At this time, these concerns are outside the scope
of this PR.
Added some externs and comments for reference.
For esp8266_undocumented.cpp, in the __cplusplus path replace `typedef` with `using`.
@mhightower83
Copy link
Contributor Author

@devyte I think this last push, resolves your remaining issues? Let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants