Error: Core 1 panic'ed (Unhandled debug exception) #6010

Closed
dsyleixa opened this issue Dec 11, 2021 · 37 comments · Fixed by #6025

Comments

@dsyleixa

dsyleixa commented Dec 11, 2021

Arduino IDE 1.8.9
ESP32 board 1.0.6 (edit; meanwhile updated to 2.0.1)
default settings

generally my program runs fine, but sometimes, unexpectedly, I get this error -
but why and what does that mean....?

Guru Meditation Error: Core 1 panic'ed (Unhandled debug exception)
Debug exception reason: Stack canary watchpoint triggered (loopTask)
Core 1 register dump:
PC : 0x400d3254 PS : 0x00060636 A0 : 0x800d3824 A1 : 0x3ffb0030
A2 : 0x0000000e A3 : 0x3ffcc4cc A4 : 0xfffffe77 A5 : 0x00000080
A6 : 0x00000000 A7 : 0x00000001 A8 : 0x3ffc1c24 A9 : 0x00000008
A10 : 0xfffffce1 A11 : 0x00000002 A12 : 0x00000002 A13 : 0x00000353
A14 : 0x0000000a A15 : 0x3ffb08d0 SAR : 0x00000011 EXCCAUSE: 0x00000001
EXCVADDR: 0x00000000 LBEG : 0x400014fd LEND : 0x4000150d LCOUNT : 0xfffffff8

ELF file SHA256: 0000000000000000

Backtrace: 0x400d3254:0x3ffb0030 0x400d3821:0x3ffb0170 0x400d3821:0x3ffb02b0 0x400d3821:0x3ffb03f0 0x400d3821:0x3ffb0530 0x400d3821:0x3ffb0670 0x400d3821:0x3ffb07b0 0x400d3821:0x3ffb08f0 0x400d3821:0x3ffb0a30 0x400d3821:0x3ffb0b70 0x400d3821:0x3ffb0cb0 0x400d3821:0x3ffb0df0 0x400d3821:0x3ffb0f30 0x400d3821:0x3ffb1070 0x400d3821:0x3ffb11b0 0x400d3821:0x3ffb12f0 0x400d3821:0x3ffb1430 0x400d3821:0x3ffb1570 0x400d3821:0x3ffb16b0 0x400d3821:0x3ffb17f0 0x400d3821:0x3ffb1930 0x400d3821:0x3ffb1a70 0x400d3821:0x3ffb1bb0 0x400d493c:0x3ffb1cf0 0x400d683b:0x3ffb1e30 0x400d71a4:0x3ffb1ef0 0x400e352d:0x3ffb1fb0 0x4008a1fe:0x3ffb1fd0

Rebooting...

the program is this one:
https://github.com/dsyleixa/Arduino/tree/master/ESP32_GBox/ESP32_Box023
the error happens sometimes when running the "chess" subroutine.

just to mention:
the chess program (i.e., the move generator) is the same as for my Arduino Due and for my Raspberry Pi, and there it works absolutely fine without any problem ever. So IMO the issue here on my ESP32 is probably not related to the chess algorithm itself as far as I can see.

PS, to clarify:
sometimes the move generator crashes already at the 2nd or 5th recursive ply after ~50000 move computations or even less,
sometimes it runs fine through the 7th recursive ply with more than 1 or 2 million move computations and returns a valid and smart move, e.g.:

2 ply, searched:         9 
 3 ply, searched:       164 
 4 ply, searched:      1018 .
 5 ply, searched:     10045 ..........
 6 ply, searched:    138116 .........................................................................................................
 7 ply, searched:   1099725 
n b8-c6

0 ply, searched:         1 
 1 ply, searched:         2 
 2 ply, searched:        29 
 3 ply, searched:       466 
 4 ply, searched:      3262 ..
 5 ply, searched:     32422 ................
 6 ply, searched:    251225 
B c1-g5  

0 ply, searched:         1 
 1 ply, searched:         2 
 2 ply, searched:        63 
 3 ply, searched:       741 
 4 ply, searched:      2559 ....
 5 ply, searched:     31058 ..Guru Meditation Error: Core  1 panic'ed (Unhandled debug exception)
Debug exception reason: Stack canary watchpoint triggered (loopTask) 

@dsyleixa
Author

update:
different game by different moves, but similar error after a while - still no clue what's happening here:

Guru Meditation Error: Core 1 panic'ed (Unhandled debug exception)
Debug exception reason: Stack canary watchpoint triggered (loopTask)
Core 1 register dump:
PC : 0x400d3254 PS : 0x00060836 A0 : 0x800d3824 A1 : 0x3ffb0030
A2 : 0x00000007 A3 : 0x3ffcc4cc A4 : 0xfffffcec A5 : 0x00000080
A6 : 0x00000000 A7 : 0x00000001 A8 : 0x3ffc1c8f A9 : 0x00000008
A10 : 0x00000008 A11 : 0x00000001 A12 : 0x00000005 A13 : 0x00000020
A14 : 0x00000020 A15 : 0x3ffb08d0 SAR : 0x00000011 EXCCAUSE: 0x00000001
EXCVADDR: 0x00000000 LBEG : 0x400014fd LEND : 0x4000150d LCOUNT : 0xfffffff8

ELF file SHA256: 0000000000000000

Backtrace: 0x400d3254:0x3ffb0030 0x400d3821:0x3ffb0170 0x400d3821:0x3ffb02b0 0x400d3821:0x3ffb03f0 0x400d3821:0x3ffb0530 0x400d3821:0x3ffb0670 0x400d3821:0x3ffb07b0 0x400d3821:0x3ffb08f0 0x400d3821:0x3ffb0a30 0x400d3821:0x3ffb0b70 0x400d3821:0x3ffb0cb0 0x400d3821:0x3ffb0df0 0x400d3821:0x3ffb0f30 0x400d3821:0x3ffb1070 0x400d3821:0x3ffb11b0 0x400d3821:0x3ffb12f0 0x400d3821:0x3ffb1430 0x400d3821:0x3ffb1570 0x400d3821:0x3ffb16b0 0x400d3821:0x3ffb17f0 0x400d3821:0x3ffb1930 0x400d3821:0x3ffb1a70 0x400d3821:0x3ffb1bb0 0x400d4968:0x3ffb1cf0 0x400d6867:0x3ffb1e30 0x400d71d0:0x3ffb1ef0 0x400e3559:0x3ffb1fb0 0x4008a1fe:0x3ffb1fd0

Rebooting...
ets Jun 8 2016 00:22:57

@SuGlider
Collaborator

@dsyleixa
Debug exception reason: Stack canary watchpoint triggered (loopTask)
This message points to stack related issues in loop() or in any other function that is called from it.

Some nice explanation about possible reasons for this error can be found at this link:
https://arduino.stackexchange.com/questions/80729/esp32-stack-canary-watchpoint-triggered

Good Luck!

@SuGlider
Collaborator

This tool will help you in debugging this issue:
https://github.com/me-no-dev/EspExceptionDecoder

You can decode the backtrace message and find out where the exception was thrown.

@dsyleixa
Author

dsyleixa commented Dec 11, 2021

I did this, but I don't understand what to c+p into the field and how to proceed
if I paste the Backtrace

0x400d3254:0x3ffb0030 0x400d3821:0x3ffb0170 0x400d3821:0x3ffb02b0 0x400d3821:0x3ffb03f0 0x400d3821:0x3ffb0530 0x400d3821:0x3ffb0670 0x400d3821:0x3ffb07b0 0x400d3821:0x3ffb08f0 0x400d3821:0x3ffb0a30 0x400d3821:0x3ffb0b70 0x400d3821:0x3ffb0cb0 0x400d3821:0x3ffb0df0 0x400d3821:0x3ffb0f30 0x400d3821:0x3ffb1070 0x400d3821:0x3ffb11b0 0x400d3821:0x3ffb12f0 0x400d3821:0x3ffb1430 0x400d3821:0x3ffb1570 0x400d3821:0x3ffb16b0 0x400d3821:0x3ffb17f0 0x400d3821:0x3ffb1930 0x400d3821:0x3ffb1a70 0x400d3821:0x3ffb1bb0 0x400d4968:0x3ffb1cf0 0x400d6867:0x3ffb1e30 0x400d71d0:0x3ffb1ef0 0x400e3559:0x3ffb1fb0 0x4008a1fe:0x3ffb1fd0

then nothing happens.
no button to press and no action follows

@dsyleixa
Author

dsyleixa commented Dec 12, 2021

PS, just to mention,
I also had added lots of delay(1) in between my for() and while() loops not to block the scheduler.
I also added Serial.print('.') debug outputs to indicate incidental blocking,
but when the Panic Error happens then it's in a <500ms interval since the last Serial.print('.')
so it clearly does not block the scheduler watchdog.

@atanisoft
Collaborator

@dsyleixa This has nothing to do with the task scheduler; it is entirely about using more than 8 KB of stack space in setup()/loop(). If you have large local arrays (such as uint8_t something[5000]), the sketch may crash on entry to the function. That is the most common reason for stack canary crashes like the one you posted.
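
To make the point about large local arrays concrete, here is a minimal host-side C++ sketch. It is not taken from the chess program; the function names and the fill pattern are made up for illustration. The first variant keeps a 5000-byte buffer in its own stack frame, which is the pattern that can eat most of an 8 KB task stack. The other two keep the same buffer in static storage or on the heap, so each call needs only a few bytes of stack.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical helper: fill a buffer with a repeating 0..255 pattern and sum it.
static uint32_t fillAndSum(uint8_t* buf, size_t n) {
  uint32_t sum = 0;
  for (size_t i = 0; i < n; ++i) {
    buf[i] = static_cast<uint8_t>(i & 0xFF);
    sum += buf[i];
  }
  return sum;
}

constexpr size_t kBufSize = 5000;

// Risky on a small task stack: all 5000 bytes live in this function's frame.
uint32_t sumWithStackBuffer() {
  uint8_t something[kBufSize];
  return fillAndSum(something, kBufSize);
}

// Static storage: the buffer is allocated once, outside any task stack.
uint32_t sumWithStaticBuffer() {
  static uint8_t something[kBufSize];
  return fillAndSum(something, kBufSize);
}

// Heap storage: only the smart pointer itself sits on the stack.
uint32_t sumWithHeapBuffer() {
  std::unique_ptr<uint8_t[]> something(new uint8_t[kBufSize]);
  return fillAndSum(something.get(), kBufSize);
}
```

All three variants compute the same result; they differ only in where the 5000 bytes live. Note that a static buffer is shared between calls, so it does not suit a recursive function that needs a private copy per recursion level.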

@dsyleixa
Author

dsyleixa commented Dec 12, 2021

Hmmmh...
let me elaborate on it...:

Apart from Chess(), the entire program runs fine on the ESP32 through setup() and loop() and also may call other subprograms such as Paint() or Pong() without any issues.
The error never occurs with different sub-programs, just with the subprogram "chess".
All subprograms are using the same global variables which are used by setup() and loop().

So all runs fine >>>>>>>> until Chess() is run.

But the first couple of chess moves are always fine too, so Chess() does not violate the RAM size limit when starting.
Manual moves are also applied correctly, calling the auto move generator just for move legality checks, and so are the first couple of auto moves.

Furthermore for Chess(), the error happens not always and not reproducibly,
sometimes after the 7th auto move generation ply, sometimes even after the 3rd, or perhaps after the 8th or 9th,
always with identical board settings and identical move series.
That actually makes me doubt that it's a RAM size issue.

OTOH, I meanwhile tested the Chess subprogram also on my Mega2560 too (because of smaller RAM than on Due or Raspi) , and also over there it always runs fine,
so IMO it probably cannot happen because of the Chess() variables on the stack (CMIIW).

when compiling, the IDE says:

The sketch uses 854266 bytes (65%) of the program memory. The maximum is 1310720 bytes.
Global variables use 48692 bytes (14%) of dynamic memory, leaving 278988 bytes for local variables. The maximum is 327680 bytes. 

I am completely at a loss, tbh...

@atanisoft
Collaborator

IDE output of memory usage is not applicable to task stack sizing. It is only applicable to global variable allocations (ones that you don't create via new/malloc/etc.) and to the overall size of the program with respect to the partition size.

Comparing ESP32 to an AVR Mega2560 is not a good argument for "it works": they are entirely different architectures AND the Mega2560 does not use task stacks but instead allocates on heap directly, which is not applicable in an RTOS environment.

Since you have not shared much in the way of code nobody will be able to point out where your program is going awry other than general ideas like @SuGlider and I've posted.

@dsyleixa
Author

I actually already shared the code above, in the TOP:
https://github.com/dsyleixa/Arduino/tree/master/ESP32_GBox/ESP32_Box023

@SuGlider
Collaborator

SuGlider commented Dec 12, 2021

My general guess is a stack overflow caused by the chess recursion.

The main difference between ESP32 Arduino and Arduino on other chips is that on ESP32 everything runs under FreeRTOS. Thus, as @atanisoft said, setup() and loop() run inside a task whose stack is limited to 8 KB. You can create a separate task for specific routines (such as Chess) and give that task a larger stack size.

For the other Chips, Arduino is built as a pure Bare Metal application and Stack can possibly reach higher limits in available RAM, depending on the way it was built and configured. So it could explain why you don't see any errors with other "Chip Arduinos".

From the link I posted there is a general explanation:
https://arduino.stackexchange.com/questions/80729/esp32-stack-canary-watchpoint-triggered

  • recursive functions - each time a function recurses it uses stack space. If it recurses deeply enough then it will trample the stack guard and cause this exception. For instance:
int count(int i) {
  i--;

  if(i > 0) {
    Serial.println(count(i));
  }

  return i;
}

void loop() {
  count(8000);
}

Each time a function recurses, its return address and its arguments and local variables are all stored on the stack. If it recurses too many times it will use more storage than is allocated to the stack.
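
The backtrace posted at the top of this issue can be used to estimate how much stack each recursion level costs: every entry has the form PC:SP, and the SP values of neighbouring entries differ by roughly one frame size. The following is a small host-side C++ sketch of that arithmetic. The helper names are invented for illustration and it is not part of any ESP32 tool.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct Frame {
  uint32_t pc;  // program counter of the frame
  uint32_t sp;  // stack pointer of the frame
};

// Parse whitespace-separated "0xPC:0xSP" tokens; anything else is skipped.
std::vector<Frame> parseBacktrace(const std::string& text) {
  std::vector<Frame> frames;
  std::istringstream in(text);
  std::string token;
  while (in >> token) {
    const auto colon = token.find(':');
    if (token.compare(0, 2, "0x") != 0 || colon == std::string::npos ||
        colon + 1 >= token.size()) {
      continue;  // not an address pair, e.g. the "Backtrace:" label
    }
    Frame f;
    f.pc = static_cast<uint32_t>(std::stoul(token.substr(0, colon), nullptr, 16));
    f.sp = static_cast<uint32_t>(std::stoul(token.substr(colon + 1), nullptr, 16));
    frames.push_back(f);
  }
  return frames;
}

// Stack distance in bytes between entry i and the next entry (its caller).
uint32_t frameStep(const std::vector<Frame>& frames, std::size_t i) {
  return frames[i + 1].sp - frames[i].sp;
}

// Stack distance in bytes between the innermost and the outermost entry.
uint32_t stackSpan(const std::vector<Frame>& frames) {
  return frames.empty() ? 0 : frames.back().sp - frames.front().sp;
}
```

Applied to the first backtrace above, the repeated 0x400d3821 entries are 0x140 (320) bytes apart, and the span from the first entry (0x3ffb0030) to the last (0x3ffb1fd0) is 0x1fa0 (8096) bytes, which is close to the 8 KB stack limit mentioned earlier in this thread.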

@dsyleixa
Author

dsyleixa commented Dec 12, 2021

well, as already stated, if it was a RAM issue then it's supposed to happen always reproducibly at the same point, but it does not!
See here:
#6010 (comment)

sometimes the move generator crashes already at the 2nd or 5th recursive ply after ~50000 move computations or even less,
sometimes it runs fine through the 7th recursive ply by more than 1 or 2 millions move computations and returns a valid and smart move, e.g.:

2 ply, searched:         9 
 3 ply, searched:       164 
 4 ply, searched:      1018 .
 5 ply, searched:     10045 ..........
 6 ply, searched:    138116 .........................................................................................................
 7 ply, searched:   1099725 
n b8-c6

0 ply, searched:         1 
 1 ply, searched:         2 
 2 ply, searched:        29 
 3 ply, searched:       466 
 4 ply, searched:      3262 ..
 5 ply, searched:     32422 ................
 6 ply, searched:    251225 
B c1-g5  

0 ply, searched:         1 
 1 ply, searched:         2 
 2 ply, searched:        63 
 3 ply, searched:       741 
 4 ply, searched:      2559 ....
 5 ply, searched:     31058 ..Guru Meditation Error: Core  1 panic'ed (Unhandled debug exception)
Debug exception reason: Stack canary watchpoint triggered (loopTask) 

@dsyleixa
Author

dsyleixa commented Dec 13, 2021

update:
I meanwhile even decreased the max deepening and the HashTable size and it sometimes already crashes in the 4th deepening ply whilst earlier it had successfully calculated up to the 7th ply.
So IMO it really can't be a recursive RAM capture thing actually.


  A B C D E F G H 
  --------------- 
8 r . . q k b . r 8 
7 + + + b n + + + 7 
6 . . . . + . . . 6 
5 . . . + . . . . 5 
4 . n . * . . . . 4 
3 . . N . * N . . 3 
2 * * * . B * * * 2 
1 R . B Q K . . R 1 
  --------------- 
  A B C D E F G H 
> WHITE: 
 DEBUG cstring : 
 DEBUG K: 8000  
 DEBUG L: 19 

 0 ply, searched:         1 
 1 ply, searched:         2 
 2 ply, searched:       193 
 3 ply, searched:       542 
 4 ply, searched:      5316 ..........Guru Meditation Error: Core  1 panic'ed (Unhandled debug exception)
Debug exception reason: Stack canary watchpoint triggered (loopTask) 
Core 1 register dump:
PC      : 0x400d3310  PS      : 0x00060036  A0      : 0x800d38da  A1      : 0x3ffb0010  
A2      : 0xffffffe9  A3      : 0x3ffc1d58  A4      : 0x3ffbdc08  A5      : 0x00000080  
A6      : 0x00000000  A7      : 0x00000001  A8      : 0x3ffc1bdc  A9      : 0x00000008  
A10     : 0x00000008  A11     : 0x00000001  A12     : 0x00000005  A13     : 0x00000020  
A14     : 0x00000020  A15     : 0x3ffb08b0  SAR     : 0x00000011  EXCCAUSE: 0x00000001  
EXCVADDR: 0x00000000  LBEG    : 0x400014fd  LEND    : 0x4000150d  LCOUNT  : 0xfffffff8  

ELF file SHA256: 0000000000000000

Backtrace: 0x400d3310:0x3ffb0010 0x400d38d7:0x3ffb0150 0x400d38d7:0x3ffb0290 0x400d38d7:0x3ffb03d0 0x400d38d7:0x3ffb0510 0x400d38d7:0x3ffb0650 0x400d38d7:0x3ffb0790 0x400d38d7:0x3ffb08d0 0x400d38d7:0x3ffb0a10 0x400d38d7:0x3ffb0b50 0x400d38d7:0x3ffb0c90 0x400d38d7:0x3ffb0dd0 0x400d38d7:0x3ffb0f10 0x400d38d7:0x3ffb1050 0x400d38d7:0x3ffb1190 0x400d38d7:0x3ffb12d0 0x400d38d7:0x3ffb1410 0x400d38d7:0x3ffb1550 0x400d38d7:0x3ffb1690 0x400d38d7:0x3ffb17d0 0x400d38d7:0x3ffb1910 0x400d38d7:0x3ffb1a50 0x400d38d7:0x3ffb1b90 0x400d4a65:0x3ffb1cd0 0x400d6993:0x3ffb1e30 0x400d72f4:0x3ffb1ef0 0x400e365d:0x3ffb1fb0 0x4008a1fe:0x3ffb1fd0

Rebooting...

Nonetheless, after pasting the Backtrace into the Exception decoder then still nothing happens at all...

you may check a stripped-down standalone version here (no TFT hardware etc):
https://github.com/dsyleixa/Arduino/blob/master/Chess/chess0048e32/chess0048e32.ino

BTW, is it possible to disable this eff*** "Guru"...?

@atanisoft
Collaborator

Suffice to say, recursion within tasks is not an easy problem to solve. It's not really a task that is designed to run on an embedded RTOS platform entirely.

However, as noted previously, you can create a task with a larger stack size to run your recursion process.

BTW, is it possible to disable this eff*** "Guru"...?

This is coming from the pre-built ESP-IDF code with the default setting of CONFIG_ESP_SYSTEM_PANIC_PRINT_REBOOT. You would need to rebuild ESP-IDF code with CONFIG_ESP_SYSTEM_PANIC_SILENT_REBOOT to not have it print the register dump and backtrace, though it may print other details.

@dsyleixa
Author

dsyleixa commented Dec 13, 2021

I do not run a recursion in a task.
Neither in my GBox program nor in the stripped-down demo version
https://github.com/dsyleixa/Arduino/blob/master/Chess/chess0048e32/chess0048e32.ino

But that eff*** Guru error happens at either program - nonetheless, never any runtime errors e.g. on my MEGA or my DUE.

@atanisoft
Collaborator

I do not run a recursion in a task.

Both setup() and loop() run in a single RTOS task with 8kb stack.

https://github.com/dsyleixa/Arduino/blob/master/Chess/chess0048e32/chess0048e32.ino#L9 describes using recursion as part of its algorithm.

Recursion points:

Each level of recursion depth will use at least 175 bytes of stack, plus any additional stack required for making function calls. At some point in the recursion depth it will fail, as you have found.

never any errors e.g. on my MEGA or my DUE.

AVR doesn't use RTOS and doesn't have the same concept of task stack. It uses all free heap/SRAM for the recursion usage, very likely at a certain depth of recursion it will start randomly overwriting areas of SRAM or perhaps simply crash.

@dsyleixa
Author

dsyleixa commented Dec 13, 2021

oh, I expected both setup() and loop() are just parts of main(), just like in Arduino, for RTOS then running in the same main() task, with access to the entire RAM :

int main() {
  setup();
  while (true) {
     loop();
  }
}

anyway, recursion is mandatory, and as I do this
as to big GBox: in loop()
as to small demo: in setup(),
how shall I enlarge the setup/loop() task sizes accordingly to maximum? in GBox I have only 1 extra small parallel thread, not of course in the demo - but also the demo crashes though.

@atanisoft
Collaborator

https://github.com/espressif/arduino-esp32/blob/master/cores/esp32/main.cpp#L51 is the entrypoint from ESP-IDF (which boots up prior to the app starting).

https://github.com/espressif/arduino-esp32/blob/master/cores/esp32/main.cpp#L67 shows where loopTask is created, you could do something similar in your setup() function where it creates a task for your recursion work with a high stack size. You will need to adjust the stack size for your recursion task until you have the depth of recursion you are after.

You can also call vTaskDelete(NULL) from the end of setup() (or loop()) to dispose of loopTask and reclaim the 8kb of heap (from the task stack).
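
As a rough illustration of that suggestion, the sketch below starts the recursive work in its own FreeRTOS task with a larger stack instead of calling it from loop(). It is a sketch under assumptions, not the poster's code: runChessSearch() is a hypothetical placeholder for the real entry point, and the 32768-byte stack size and core number 1 are arbitrary starting values to tune. xTaskCreatePinnedToCore(), uxTaskGetStackHighWaterMark() and vTaskDelay() are regular FreeRTOS/ESP-IDF calls; on ESP32 the stack size argument is given in bytes.

```cpp
#include <Arduino.h>

// Hypothetical placeholder for the recursive chess search entry point.
static void runChessSearch() {
  // ... call the recursive move generator here ...
}

// Task body: runs the search on its own, larger stack.
static void chessTask(void *param) {
  (void)param;
  for (;;) {
    runChessSearch();
    // Smallest amount of free stack seen so far for this task; a value
    // close to zero means the stack size should be increased.
    Serial.printf("chessTask min free stack: %u bytes\n",
                  (unsigned)uxTaskGetStackHighWaterMark(NULL));
    vTaskDelay(pdMS_TO_TICKS(1000));
  }
}

void setup() {
  Serial.begin(115200);
  // 32768 bytes of stack is an assumed starting point; tune as needed.
  BaseType_t ok = xTaskCreatePinnedToCore(
      chessTask, "chessTask", 32768, NULL, 1, NULL, 1);
  if (ok != pdPASS) {
    Serial.println("could not create chessTask (not enough heap?)");
  }
}

void loop() {
  delay(1000);  // loopTask stays alive but idle
}
```

Printing the high-water mark while the search runs at its deepest setting gives a direct measurement of how much headroom is left, which is more reliable than guessing a stack size from the compile-time memory report.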

@dsyleixa
Author

dsyleixa commented Dec 13, 2021

I would need to code the available memory size in my program, not by patching the ESP API.
Any suggestions?
Most important for the loop().
I will accordingly then apply that to the reworked demo.
I know the size of the common variables from the compile message, so the rest should be assigned to the loop RAM.

@atanisoft
Collaborator

You can call the same APIs used in the links above from your code without altering the arduino-esp32 code.

@dsyleixa
Author

dsyleixa commented Dec 13, 2021

I have no clue how to do that, I am just used to programming by the common original Arduino API methods.

updated demo code, chess() running also in setup() now:

https://github.com/dsyleixa/Arduino/tree/master/Chess/chess0049e32

@dsyleixa
Author

dsyleixa commented Dec 13, 2021

if I put at the end of setup():
vTaskDelete(NULL);
then I get no Serial output anymore.
So how to get the entire RAM for loop()?

or is it better to put all code into setup() and clear all in loop()?
And how to get all entire RAM then for loop()?

@igrr
Member

igrr commented Dec 13, 2021

@me-no-dev @SuGlider It seems like overriding the default value of stack size for the main task (without having to fall back to arduino-as-IDF-component) could be useful.

What do you think about adding a simple way for the user to adjust the main task stack size, something along these lines:

/* in arduino-esp32 main.cpp: */
__attribute__((weak)) size_t getArduinoLoopTaskStackSize(void) {
    return ARDUINO_LOOP_STACK_SIZE;
}

/* later... */
xTaskCreateUniversal(loopTask, "loopTask", getArduinoLoopTaskStackSize(), NULL, 1, &loopTaskHandle, ARDUINO_RUNNING_CORE);

/* in Arduino.h */

#define ESP_LOOP_TASK_STACK_SIZE(sz) \
    size_t getArduinoLoopTaskStackSize(void) { \
        return sz; \
    }
    
/* in sketch code */

#include <Arduino.h>

ESP_LOOP_TASK_STACK_SIZE(16384);

void setup() { }

void loop() { }

Edit: alternatively, as a more general solution, we could consider a user-provided "build options" header file
esp8266/Arduino#8095
stm32duino/Arduino_Core_STM32#1442

@SuGlider SuGlider self-assigned this Dec 13, 2021
@me-no-dev
Member

@SuGlider let's align this with @pedrominatel and have it documented as well

@dsyleixa
Author

dsyleixa commented Dec 14, 2021

thanks guys for your interest in this topic and for finding a fix.
As a 1st step I made some local variables in my recursive functions global and further decreased the max-deepening count, so for now it admittedly plays with a poor skill but at least doesn't crash anymore.

Perhaps allow me to propose a solution: I would tend to define the stack size within the threads myself at the beginning, similar to setting a thread priority, e.g. via
vTaskRamSet (NULL, 4000); // sets RAM size to 4000bytes
(my2ct)
thanks @ALL for your contributions!

We now may close it or keep it open until there is a fix, as you wish.

@atanisoft
Collaborator

Perhaps allow me to propose a solution: I would tend to define the stack size within the threads myself at the beginning, similar to setting a thread priority, e.g. via
vTaskRamSet (NULL, 4000); // sets RAM size to 4000bytes
(my2ct)

Unfortunately there is no such function in FreeRTOS, the only time the stack size can be set is during creation. The solution which @igrr has proposed (weak function you can override) is likely the best option as it will work with both IDF+Arduino and Arduino (standalone).

@dsyleixa
Author

reworked my basic chess code, both by the identical algorithm, 2 UI versions:
a) 1 for my Raspi to control what happens: https://github.com/dsyleixa/RaspberryPi/blob/master/chess/micromax48005.c
b) 1 for my esp32 (2.0.1): https://github.com/dsyleixa/Arduino/blob/master/Chess/chessesp48005/chessesp48005.ino

first observations:
the number of recursive computations on my ESP32 is far larger than on the Pi (8 ply/5.1 Mio vs 7 ply/1.3 Mio), with identical settings and boundary conditions (and besides, a different resulting generated move).

the Raspi Xterminal console output for the 1st move, after WHITE manual move d2d4,
and then BLACK auto reply (just press ENTER), is:

>   BLACK:  

 DEBUG cstring : 
 DEBUG K: 8000  
 DEBUG L: 67 

 2 ply, searched:         9 
 3 ply, searched:       172 
 4 ply, searched:       966 
 5 ply, searched:      8804 .................
 6 ply, searched:    130720 ...............................................................................................................
 7 ply, searched:   1316247 
score=8000


  1.5: n g8-f6  

whilst the Serial console output of the ESP32 is:

>   BLACK:  
 DEBUG cstring : 
 DEBUG K: 8000  
 DEBUG L: 67 

 2 ply, searched:         9 
 3 ply, searched:       172 
 4 ply, searched:       976 
 5 ply, searched:      9364 ..............
 6 ply, searched:    135948 ....................................
                            ....................................
                            .........................
 7 ply, searched:    975694 ....................................
                            ....................................
                            ....................................
                            ....................................
                            ....................................
                            ....................................
                            ....................................
                            ....................................
                            ....................................
                            ....................................
                            ....................................
                            .........................
 8 ply, searched:   5173849 
score=8000


  1.5: n b8-c6  

...that is really puzzling and IMO that might be a reason for massive unexpected RAM consumptions.... :?:

me-no-dev pushed a commit that referenced this issue Dec 20, 2021
## Summary
Arduino ```setup()``` and ```loop()``` run under a Task with a fixed Stack size of 8KB.
Users may want to change this size.

This PR adds this possibility by just adding a line of code, as for example:
``` cpp
ESP_LOOP_TASK_STACK_SIZE(16384);

void setup() { 
}

void loop() { 
}
```
## Impact
None. It adds a new functionality to ESP32 Arduino.
If ```ESP_LOOP_TASK_STACK_SIZE(newSize);``` is not declared/used, it will compile the sketch with the default stack size of 8KB.

## Related links
fix #6010 

#6010 (comment)
Thanks @igrr for the suggestion!
@dsyleixa
Author

dsyleixa commented Dec 20, 2021

I actually doubt that the TOP issue is only caused by too little STACK (edited). If it was, then the program wouldn't behave so extremely different from the same program running on a RaspberryPi.

@atanisoft
Collaborator

I actually doubt that the TOP issue is only caused by too little RAM.

It's caused by stack exhaustion in the loopTask which you can now configure higher than the default 8kb. This is clearly evident from the backtrace you have provided here as:

Debug exception reason: Stack canary watchpoint triggered (loopTask)

It has nothing to do with free RAM and everything to do with stack.

Comparing to a linux host (rPi) is not a fair comparison since they don't operate in the same fashion.

@dsyleixa
Author

dsyleixa commented Dec 20, 2021

I have to disagree as the programs (move generator, Negamax) are totally identical now:
As stated I meanwhile have reworked the code, I get no Core Panics anymore, nonetheless the ESP runs totally different from the Pi, e.g. the ESP calculates 4x as many recursions as the Pi plus 1 extra deepening ply which must not happen.
a) Raspi https://github.com/dsyleixa/RaspberryPi/blob/master/chess/micromax48005.c
b) esp32 (2.0.1): https://github.com/dsyleixa/Arduino/blob/master/Chess/chessesp48005/chessesp48005.ino

@atanisoft
Collaborator

I have to disagree as the programs (move generator, Negamax) are totally identical now:

you are free to disagree but the backtrace does not agree with you.

@dsyleixa
Author

I don't have a backtrace anymore.

@igrr
Member

igrr commented Dec 20, 2021

@dsyleixa I'd recommend checking the code again, it seems that it relies on some undefined behaviors, so its execution is not very predictable.

On Linux (compiled with -fsanitize=undefined option):

(output)
/tmp/chess.c:160:9: runtime error: signed integer overflow: 1688299438 + 2038795618 cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /tmp/chess.c:160:9 in 
/tmp/chess.c:161:9: runtime error: signed integer overflow: 984903368 + 1245823260 cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /tmp/chess.c:161:9 in 

 2 ply, searched:         9 /tmp/chess.c:161:35: runtime error: signed integer overflow: 1532164499 - -635096640 cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /tmp/chess.c:161:35 in 
/tmp/chess.c:160:35: runtime error: signed integer overflow: -1827237850 - 718599499 cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /tmp/chess.c:160:35 in 

 3 ply, searched:       172 /tmp/chess.c:114:9: runtime error: signed integer overflow: 1273382264 - -991284065 cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /tmp/chess.c:114:9 in 

Integer overflow is also reported on ESP32, if we add -fsanitize=undefined compiler option:

(output)
>  WHITE:  DEBUG cstring : d2d4
 DEBUG K: 99  
 DEBUG L: 67 

Undefined behavior of type sub_overflow


Backtrace:0x40081dd9:0x3ffb24d00x40087d6d:0x3ffb24f0 0x40087db7:0x3ffb2510 0x40087ddd:0x3ffb2570 0x400d6aa3:0x3ffb2590 0x400d79d3:0x3ffb26f0 0x400d7cf6:0x3ffb27f0 0x400d8812:0x3ffb2820 
0x40081dd9: panic_abort at /Users/ivan/e/esp-idf/components/esp_system/panic.c:402

0x40087d6d: esp_system_abort at /Users/ivan/e/esp-idf/components/esp_system/esp_system.c:121

0x40087db7: __ubsan_default_handler at /Users/ivan/e/esp-idf/components/esp_system/ubsan.c:166

0x40087ddd: __ubsan_handle_sub_overflow at /Users/ivan/e/esp-idf/components/esp_system/ubsan.c:196

0x400d6aa3: Minimax(int, int, int, int, int, int) at /Users/ivan/e/arduino-esp32/test/build_as_component/build/../main/main.cpp:159

0x400d79d3: chess() at /Users/ivan/e/arduino-esp32/test/build_as_component/build/../main/main.cpp:316

0x400d7cf6: setup() at /Users/ivan/e/arduino-esp32/test/build_as_component/build/../main/main.cpp:381

0x400d8812: loopTask(void*) at /Users/ivan/e/arduino-esp32/cores/esp32/main.cpp:38
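
The sanitizer reports above all concern signed int arithmetic that overflows, which is undefined behavior in C and C++. As a minimal host-side sketch of two common ways to deal with it (the function names are made up, and this is not a patch for the chess code): values that are meant to wrap around, for example hash keys if that is what those lines compute, can be kept in an unsigned type where overflow is well defined, and signed additions whose result must stay meaningful can be range-checked before they are performed.

```cpp
#include <climits>
#include <cstdint>

// Unsigned arithmetic wraps modulo 2^32 by definition, so this never
// triggers undefined behavior.
uint32_t wrappingAdd(uint32_t a, uint32_t b) {
  return a + b;
}

// Signed addition with an explicit range check done before the add.
// Returns false (and leaves out untouched) if a + b would overflow int.
bool checkedAdd(int a, int b, int& out) {
  if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
    return false;
  }
  out = a + b;
  return true;
}
```

With the first pair of operands from the Linux report, checkedAdd(1688299438, 2038795618, r) returns false on a platform with 32-bit int, while wrappingAdd on the same values yields 3727095056 without any undefined behavior.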

Besides, compiling this code on Linux with -Wall -Werror flags reveals a bunch of possible issues related to operator precedence, please check them as well:

(compiler output)
/tmp/chess.c:115:11: error: '&' within '|' [-Werror,-Wbitwise-op-parentheses]
  !(m<=q|X&8&&m>=l|X&S))                        //   or window incompatible  
        ~~^~
/tmp/chess.c:115:11: note: place parentheses around the '&' expression to silence this warning
  !(m<=q|X&8&&m>=l|X&S))                        //   or window incompatible  
          ^
         (  )
/tmp/chess.c:115:21: error: '&' within '|' [-Werror,-Wbitwise-op-parentheses]
  !(m<=q|X&8&&m>=l|X&S))                        //   or window incompatible  
                  ~~^~
/tmp/chess.c:115:21: note: place parentheses around the '&' expression to silence this warning
  !(m<=q|X&8&&m>=l|X&S))                        //   or window incompatible  
                    ^
                   (  )
/tmp/chess.c:120:5: error: & has lower precedence than ==; == will be evaluated first [-Werror,-Wparentheses]
   z&K==I&&(N<1e6&d<98||                        // root: deepen upto time    
    ^~~~~
/tmp/chess.c:120:5: note: place parentheses around the '==' expression to silence this warning
   z&K==I&&(N<1e6&d<98||                        // root: deepen upto time    
    ^
     (   )
/tmp/chess.c:120:5: note: place parentheses around the & expression to evaluate it first
   z&K==I&&(N<1e6&d<98||                        // root: deepen upto time    
    ^
   (  )
/tmp/chess.c:120:10: error: '&&' within '||' [-Werror,-Wlogical-op-parentheses]
   z&K==I&&(N<1e6&d<98||                        // root: deepen upto time    
   ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/chess.c:120:10: note: place parentheses around the '&&' expression to silence this warning
   z&K==I&&(N<1e6&d<98||                        // root: deepen upto time    
         ^
   (
/tmp/chess.c:125:14: error: operator '?:' has lower precedence than '|'; '|' will be evaluated first [-Werror,-Wbitwise-conditional-parentheses]
  m=-P<l|R>35?d>2?-I:e:-P;                      // Prune or stand-pat        
    ~~~~~~~~~^
/tmp/chess.c:125:14: note: place parentheses around the '|' expression to silence this warning
  m=-P<l|R>35?d>2?-I:e:-P;                      // Prune or stand-pat        
             ^
    (        )
/tmp/chess.c:125:14: note: place parentheses around the '?:' expression to evaluate it first
  m=-P<l|R>35?d>2?-I:e:-P;                      // Prune or stand-pat        
             ^
         (               )
/tmp/chess.c:131:20: error: operator '?:' has lower precedence than '&'; '&' will be evaluated first [-Werror,-Wbitwise-conditional-parentheses]
    while(r=p>2&r<0?-r:-o[++j])                     // loop over directions o[]  
            ~~~~~~~^
/tmp/chess.c:131:20: note: place parentheses around the '&' expression to silence this warning
    while(r=p>2&r<0?-r:-o[++j])                     // loop over directions o[]  
                   ^
            (      )
/tmp/chess.c:131:20: note: place parentheses around the '?:' expression to evaluate it first
    while(r=p>2&r<0?-r:-o[++j])                     // loop over directions o[]  
                   ^
                (             )
/tmp/chess.c:131:12: error: using the result of an assignment as a condition without parentheses [-Werror,-Wparentheses]
    while(r=p>2&r<0?-r:-o[++j])                     // loop over directions o[]  
          ~^~~~~~~~~~~~~~~~~~~
/tmp/chess.c:131:12: note: place parentheses around the assignment to silence this warning
    while(r=p>2&r<0?-r:-o[++j])                     // loop over directions o[]  
           ^
          (                   )
/tmp/chess.c:131:12: note: use '==' to turn this assignment into an equality comparison
    while(r=p>2&r<0?-r:-o[++j])                     // loop over directions o[]  
           ^
           ==
/tmp/chess.c:140:31: error: & has lower precedence than <; < will be evaluated first [-Werror,-Wparentheses]
      t=board[H];if(t&turn|p<3&!(y-x&7)-!t)break;      // capt. own, bad pawn mode  
                           ~~~^
/tmp/chess.c:140:31: note: place parentheses around the '<' expression to silence this warning
      t=board[H];if(t&turn|p<3&!(y-x&7)-!t)break;      // capt. own, bad pawn mode  
                              ^
                           (  )
/tmp/chess.c:140:31: note: place parentheses around the & expression to evaluate it first
      t=board[H];if(t&turn|p<3&!(y-x&7)-!t)break;      // capt. own, bad pawn mode  
                              ^
                             (            )
/tmp/chess.c:140:22: error: '&' within '|' [-Werror,-Wbitwise-op-parentheses]
      t=board[H];if(t&turn|p<3&!(y-x&7)-!t)break;      // capt. own, bad pawn mode  
                    ~^~~~~~
/tmp/chess.c:140:22: note: place parentheses around the '&' expression to silence this warning
      t=board[H];if(t&turn|p<3&!(y-x&7)-!t)break;      // capt. own, bad pawn mode  
                     ^
                    (     )
/tmp/chess.c:140:31: error: '&' within '|' [-Werror,-Wbitwise-op-parentheses]
      t=board[H];if(t&turn|p<3&!(y-x&7)-!t)break;      // capt. own, bad pawn mode  
                          ~~~~^~~~~~~~~~~~
/tmp/chess.c:140:31: note: place parentheses around the '&' expression to silence this warning
      t=board[H];if(t&turn|p<3&!(y-x&7)-!t)break;      // capt. own, bad pawn mode  
                              ^
                           (              )
/tmp/chess.c:150:14: error: | has lower precedence than >; > will be evaluated first [-Werror,-Wparentheses]
       v-=p-4|R>29?0:20;                        // penalize mid-game K move  
             ^~~~~
/tmp/chess.c:150:14: note: place parentheses around the '>' expression to silence this warning
       v-=p-4|R>29?0:20;                        // penalize mid-game K move  
             ^
              (   )
/tmp/chess.c:150:14: note: place parentheses around the | expression to evaluate it first
       v-=p-4|R>29?0:20;                        // penalize mid-game K move  
             ^
          (    )
/tmp/chess.c:150:19: error: operator '?:' has lower precedence than '|'; '|' will be evaluated first [-Werror,-Wbitwise-conditional-parentheses]
       v-=p-4|R>29?0:20;                        // penalize mid-game K move  
          ~~~~~~~~^
/tmp/chess.c:150:19: note: place parentheses around the '|' expression to silence this warning
       v-=p-4|R>29?0:20;                        // penalize mid-game K move  
                  ^
          (       )
/tmp/chess.c:150:19: note: place parentheses around the '?:' expression to evaluate it first
       v-=p-4|R>29?0:20;                        // penalize mid-game K move  
                  ^
              (        )
/tmp/chess.c:165:18: error: operator '?:' has lower precedence than '|'; '|' will be evaluated first [-Werror,-Wbitwise-conditional-parentheses]
        s=C>2|v>V?-Minimax(-l,-V,-v,                  // recursive eval. of reply  
          ~~~~~~~^
/tmp/chess.c:165:18: note: place parentheses around the '|' expression to silence this warning
        s=C>2|v>V?-Minimax(-l,-V,-v,                  // recursive eval. of reply  
                 ^
          (      )
/tmp/chess.c:165:18: note: place parentheses around the '?:' expression to evaluate it first
        s=C>2|v>V?-Minimax(-l,-V,-v,                  // recursive eval. of reply  
                 ^
              (
/tmp/chess.c:177:21: error: '&' within '|' [-Werror,-Wbitwise-op-parentheses]
       m=v,X=x,Y=y|S&F;                         // mark double move with S   
                  ~~^~
/tmp/chess.c:177:21: note: place parentheses around the '&' expression to silence this warning
       m=v,X=x,Y=y|S&F;                         // mark double move with S   
                    ^
                   (  )
/tmp/chess.c:179:17: error: '&' within '|' [-Werror,-Wbitwise-op-parentheses]
      if(x+r-y|u&32|                            // not 1st step,moved before 
              ~~^~~
/tmp/chess.c:179:17: note: place parentheses around the '&' expression to silence this warning
      if(x+r-y|u&32|                            // not 1st step,moved before 
                ^
               (   )
/tmp/chess.c:181:26: error: '&' within '^' [-Werror,-Wbitwise-op-parentheses]
         board[G=x+3^r>>1&7]-turn-6                    // no virgin R in corner G,  
                    ~~~~~^~
/tmp/chess.c:181:26: note: place parentheses around the '&' expression to silence this warning
         board[G=x+3^r>>1&7]-turn-6                    // no virgin R in corner G,  
                         ^
                     (     )
/tmp/chess.c:180:13: error: & has lower precedence than >; > will be evaluated first [-Werror,-Wparentheses]
         p>2&(p-4|j-7||                         // no P & no lateral K move, 
         ~~~^
/tmp/chess.c:180:13: note: place parentheses around the '>' expression to silence this warning
         p>2&(p-4|j-7||                         // no P & no lateral K move, 
            ^
         (  )
/tmp/chess.c:180:13: note: place parentheses around the & expression to evaluate it first
         p>2&(p-4|j-7||                         // no P & no lateral K move, 
            ^
           (
/tmp/chess.c:180:13: error: '&' within '|' [-Werror,-Wbitwise-op-parentheses]
         p>2&(p-4|j-7||                         // no P & no lateral K move, 
         ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/chess.c:180:13: note: place parentheses around the '&' expression to silence this warning
         p>2&(p-4|j-7||                         // no P & no lateral K move, 
            ^
         (
/tmp/chess.c:191:8: error: | has lower precedence than ==; == will be evaluated first [-Werror,-Wparentheses]
  m=m+I|P==I?m:0;                               // best loses K: (stale)mate 
       ^~~~~
/tmp/chess.c:191:8: note: place parentheses around the '==' expression to silence this warning
  m=m+I|P==I?m:0;                               // best loses K: (stale)mate 
       ^
        (   )
/tmp/chess.c:191:8: note: place parentheses around the | expression to evaluate it first
  m=m+I|P==I?m:0;                               // best loses K: (stale)mate 
       ^
    (    )
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.

In general, if a piece of non-platform-specific code behaves differently on Linux and on a microcontroller, first make sure it compiles and runs correctly on Linux with the compiler flags -Wall -Werror -fsanitize=address -fsanitize=undefined. If the difference is still present, take a "divide and conquer" approach: bisect the application to narrow down the place where the behavior differs between the two platforms.

If you narrow the issue down to a small fragment of code (MCVE) which still works differently on Linux and ESP while passing compiler and sanitizer checks, please post that fragment of code here and we will try to help you figure out the issue.
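
To illustrate what the precedence notes in the log above are pointing at, here is a minimal sketch. The function names `as_parsed` and `as_regrouped` are made up for this example and are not part of the chess code. It shows that an expression like `z & K == I` is parsed as `z & (K == I)`, which can give a different value than `(z & K) == I`. It makes no claim about which grouping the original code intends; adding the parentheses from the "silence this warning" note keeps the current behaviour unchanged.

```c
/* `==` binds tighter than `&`, so `z & K == I` means `z & (K == I)`.
 * This is the grouping the compiler already uses for the chess code. */
int as_parsed(int z, int K, int I) {
    return z & (K == I);
}

/* The other grouping mentioned in the "evaluate it first" note.
 * It is a different expression and can yield a different value. */
int as_regrouped(int z, int K, int I) {
    return (z & K) == I;
}
```

With `z = 6`, `K = 2`, `I = 2` the first function returns `0` (`6 & 1`) while the second returns `1` (`2 == 2`), which is why the compiler asks for the intent to be spelled out.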

@dsyleixa
Author

dsyleixa commented Dec 20, 2021

On the Raspi, the ESP32 and the original Arduino the code is always compiled by gcc, and operator precedence in C/C++ hasn't changed since C99 or even earlier.
As to the Linux warnings (-Wall): I asked about this in the Raspi forum, and the answer there was, e.g., "The suggestions about using brackets are just that, suggestions. Gcc is just saying that adding brackets will make it easier for anybody reading the code to see what the expressions are and less likely for the programmer (or others trying to modify the code at a later date) to make a mistake." (ref.: https://forums.raspberrypi.com/viewtopic.php?t=325912&p=1951822&sid=b280b91b565b3f9373cfe7556bbf3875#p1950861)
But I agree that some undefined behaviour must be occurring, because even the first code from the top of this thread always worked fine on AVR and ARM Cortex whilst it crashed on ESP32.
And BTW, the same code, ported by another author, even runs on an UNO, and it runs like a charm:
https://create.arduino.cc/projecthub/rom3/arduino-uno-micromax-chess-030d7c

@dsyleixa
Author

as to signed int overflow: I don't see any signed ints in my code, just int and int32_t (CMIIW)
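
For reference, plain `int` and `int32_t` are both signed types in C, so the signed-overflow rules apply to them. The sketch below checks this at compile time and shows one way to add two `int32_t` values without ever overflowing. The helper name `checked_add` is hypothetical and does not come from the chess code.

```c
#include <stdint.h>

/* (T)-1 is negative only if T is a signed type. */
_Static_assert((int)-1 < 0, "int is signed");
_Static_assert((int32_t)-1 < 0, "int32_t is signed");
_Static_assert((uint32_t)-1 > 0, "uint32_t is unsigned");

/* Hypothetical helper: stores a + b in *out and returns 1,
 * or returns 0 (leaving *out untouched) if the sum would overflow. */
int checked_add(int32_t a, int32_t b, int32_t *out) {
    if ((b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b)) {
        return 0; /* the sum does not fit into int32_t */
    }
    *out = a + b;
    return 1;
}
```

Building with -fsanitize=undefined, as suggested earlier in this thread, reports such overflows at run time without needing a helper like this.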

@dsyleixa
Author

dsyleixa commented Dec 20, 2021

As the error happens in code with multiple recursions (which are always computed in the identical sequence, though) and, admittedly, multiple recursive stack allocations (correct idiom?), this code unfortunately cannot be shrunk down.
Nonetheless, it works on Raspi, Arduino AVR and Arduino ARM Cortex, but it works erroneously, fails, or even crashes on ESP32.

@igrr
Member

igrr commented Dec 21, 2021

FWIW, your Raspberry Pi version of the code produces the same result for me on the ESP32 as it does on Linux. I only had to replace the platform-dependent rand() call with a simple LCG (chess_rand). See the updated code and the output I get here. To make testing simpler I've hardcoded the 3 commands into the app ("d2d4\n", "\n", "Q"); see the moves array. The code should run with IDF and Arduino on the ESP32, as well as on Linux.
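
The body of chess_rand is not shown in this thread, so the following is only a sketch of what a simple LCG replacement for rand() could look like. The names `lcg_seed` and `lcg_rand` are hypothetical, and the constants are an assumption: they are the ones from the sample rand() implementation in the C standard, not necessarily the ones used in the linked code. The point is that the same seed yields the same sequence on every platform, unlike the library rand().

```c
#include <stdint.h>

/* Generator state; unsigned arithmetic wraps around, which is well defined. */
static uint32_t lcg_state = 1u;

/* Hypothetical helper: reset the generator to a known starting point. */
void lcg_seed(uint32_t seed) {
    lcg_state = seed;
}

/* Hypothetical helper: return the next value in the range 0..32767.
 * Constants taken from the C standard's sample rand() (an assumption here). */
int lcg_rand(void) {
    lcg_state = lcg_state * 1103515245u + 12345u;
    return (int)((lcg_state >> 16) & 0x7fffu);
}
```

Seeding with the same value before filling the hash table should then give identical table contents on Linux and on the ESP32.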

@dsyleixa
Author

dsyleixa commented Dec 21, 2021

That is amazing, thank you very much for your contributions! These results have finally dispelled all of my concerns. I would never have considered that randomly initializing a hashtable for known positions could lead to these different results.
Again, thank you a lot!
