Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to trace where watchdog reboots on bl602 #1322

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

giedriuslt
Copy link
Contributor

Traces all the main loop to serial console to try to catch why bl602 reboots

@diepeterpan
Copy link

Thank you, I am running it and will give feedback as soon as I catch one.

e.g. example of [TRACE] seen in log

[TRACE] After scheduled driver starts
[TRACE] Before HAL_PrintNetworkInfo
[TRACE] After HAL_PrintNetworkInfo
[TRACE] After watchdog reset
Info:MAIN:Time 760, idle 0/s, free 95336, MQTT 1(2), bWifi 1, second[TRACE] Exiting OnEverySecond
sWithNoPing -1, socks 2/21 
-----------------> AABA Request:
    A-MSDU: Permitted
    Block Ack Policy: Immediate Block Ack
    TID: 0
    Number of Buffers: 64
-----------------> AABA Response:
    A-MSDU: Not Permitted
    Block Ack Policy: Immediate Block Ack
    TID: 0
    Number of Buffers: 8
[TRACE] Entering OnEverySecond
[TRACE] Before reading temp 
[TRACE] After reading temp
[TRACE] After MQTT_RunEverySecondUpdate
[TRACE] After CMD_EVENT_CHANGE_NOMQTTTIME
[TRACE] After MQTT_Dedup_Tick
[TRACE] After LED_RunOnEverySecond
[TRACE] After DRV_OnEverySecond
[TRACE] After UART_RunEverySecond
[TRACE] After CFG_Save_IfThereArePendingChanges
[TRACE] Before scheduled driver starts
[TRACE] After scheduled driver starts
[TRACE] After watchdog reset
Info:MAIN:Time 761, idle 0/s, free 95336, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
[TRACE] Exiting OnEverySecond
[TRACE] Entering OnEverySecond
[TRACE] Before reading temp 
[TRACE] After reading temp
[TRACE] After MQTT_RunEverySecondUpdate

@diepeterpan
Copy link

diepeterpan commented Aug 20, 2024

I captured numerous, see below.

Info:MAIN:Time 3928, idle 0/s, free 95336, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
[TRACE] Entering OnEverySecond
[TRACE] Before reading temp 
Starting bl602 now....
Booting BL602 Chip...
██████╗ ██╗      ██████╗  ██████╗ ██████╗
██╔══██╗██║     ██╔════╝ ██╔═████╗╚════██╗
██████╔╝██║     ███████╗ ██║██╔██║ █████╔╝
██╔══██╗██║     ██╔═══██╗████╔╝██║██╔═══╝
██████╔╝███████╗╚██████╔╝╚██████╔╝███████╗
╚═════╝ ╚══════╝ ╚═════╝  ╚═════╝ ╚══════╝


------------------------------------------------------------
RISC-V Core Feature:RV32-ACFIMX
Build Version:      release_bl_iot_sdk_1.6.22-22-g1d4ff804-dirty
Std Driver Version: 541807d
PHY   Version:      a0_final-73-g62481a0
RF    Version:      79cc6b9
Build Date:         Aug 19 2024
Build Time:         19:03:24
Boot Reason:        BL_RST_SOFTWARE_WATCHDOG
Info:MAIN:Time 5659, idle 0/s, free 95336, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
[TRACE] Exiting OnEverySecond
[TRACE] Entering OnEverySecond
[TRACE] Before reading temp 
bl602 now....
Booting BL602 Chip...
██████╗ ██╗      ██████╗  ██████╗ ██████╗
██╔══██╗██║     ██╔════╝ ██╔═████╗╚════██╗
██████╔╝██║     ███████╗ ██║██╔██║ █████╔╝
██╔══██╗██║     ██╔═══██╗████╔╝██║██╔═══╝
██████╔╝███████╗╚██████╔╝╚██████╔╝███████╗
╚═════╝ ╚══════╝ ╚═════╝  ╚═════╝ ╚══════╝


------------------------------------------------------------
RISC-V Core Feature:RV32-ACFIMX
Build Version:      release_bl_iot_sdk_1.6.22-22-g1d4ff804-dirty
Std Driver Version: 541807d
PHY   Version:      a0_final-73-g62481a0
RF    Version:      79cc6b9
Build Date:         Aug 19 2024
Build Time:         19:03:24
Boot Reason:        BL_RST_SOFTWARE_WATCHDOG
[TRACE] After DRV_OnEverySecond
[TRACE] After UART_RunEverySecond
[TRACE] After CFG_Save_IfThereArePendingChanges
[TRACE] Before scheduled driver starts
[TRACE] After scheduled driver starts
[TRACE] After watchdog reset
[TRACE] Exiting OnEverySecond
Info:MAIN:Time 834, idle 0/s, free 95336, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
[TRACE] Entering OnEverySecond
[TRACE] Before reading temp 
Starting bl602 now....
Booting BL602 Chip...
██████╗ ██╗      ██████╗  ██████╗ ██████╗
██╔══██╗██║     ██╔════╝ ██╔═████╗╚════██╗
██████╔╝██║     ███████╗ ██║██╔██║ █████╔╝
██╔══██╗██║     ██╔═══██╗████╔╝██║██╔═══╝
██████╔╝███████╗╚██████╔╝╚██████╔╝███████╗
╚═════╝ ╚══════╝ ╚═════╝  ╚═════╝ ╚══════╝


------------------------------------------------------------
RISC-V Core Feature:RV32-ACFIMX
Build Version:      release_bl_iot_sdk_1.6.22-22-g1d4ff804-dirty
Std Driver Version: 541807d
PHY   Version:      a0_final-73-g62481a0
RF    Version:      79cc6b9
Build Date:         Aug 19 2024
Build Time:         19:03:24
Boot Reason:        BL_RST_SOFTWARE_WATCHDOG
[TRACE] Entering OnEverySecond
[TRACE] Before reading temp 
[TRACE] After reading temp
[TRACE] After MQTT_RunEverySecondUpdate
[TRACE] After CMD_EVENT_CHANGE_NOMQTTTIME
[TRACE] After MQTT_Dedup_Tick
[TRACE] After LED_RunOnEverySecond
[TRACE] After DRV_OnEverySecond
[TRACE] After UART_RunEverySecond
[TRACE] After CFG_Save_IfThereArePendingChanges
[TRACE] Before scheduled driver starts
[TRACE] After scheduled driver starts
[TRACE] After watchdog reset
Info:MQTT:Publishing val FFFFFF to obl94F28B35/led_basecolor_rgb/get retain=0
Info:MAIN:Time 445, idle 0/s, free 95336, MQTT 1[TRACE] Exiting OnEverySecond
(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MQTT:MQTT client in mqtt_incoming_publish_cb topic obl94F28B35/led_basecolor_rgb/get
[TRACE] Entering OnEverySecond
[TRACE] Before reading temp 
[TRACE] After reading temp
[TRACE] After MQTT_RunEverySecondUpdate
[TRACE] After CMD_EVENT_CHANGE_NOMQTTTIME
[TRACE] After MQTT_Dedup_Tick
[TRACE] After LED_RunOnEverySecond
[TRACE] After DRV_OnEverySecond
[TRACE] After UART_RunEverySecond
[TRACE] After CFG_Save_IfThereArePendingChanges
[TRACE] Before scheduled driver starts
[TRACE] After scheduled driver starts
[TRACE] After watchdog reset
Info:MQTT:Publishing val 100 to obl94F28B35/led_dimmer/get retain=0
Info:MA[TRACE] Exiting OnEverySecond
IN:Time 446, idle 0/s, free 95336, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MQTT:MQTT client in mqtt_incoming_publish_cb topic obl94F28B35/led_dimmer/get
[TRACE] Entering OnEverySecond
[TRACE] Before reading temp 
Starting bl602 now....
Booting BL602 Chip...
██████╗ ██╗      ██████╗  ██████╗ ██████╗
██╔══██╗██║     ██╔════╝ ██╔═████╗╚════██╗
██████╔╝██║     ███████╗ ██║██╔██║ █████╔╝
██╔══██╗██║     ██╔═══██╗████╔╝██║██╔═══╝
██████╔╝███████╗╚██████╔╝╚██████╔╝███████╗
╚═════╝ ╚══════╝ ╚═════╝  ╚═════╝ ╚══════╝


------------------------------------------------------------
RISC-V Core Feature:RV32-ACFIMX
Build Version:      release_bl_iot_sdk_1.6.22-22-g1d4ff804-dirty
Std Driver Version: 541807d
PHY   Version:      a0_final-73-g62481a0
RF    Version:      79cc6b9
Build Date:         Aug 19 2024
Build Time:         19:03:24
Boot Reason:        BL_RST_SOFTWARE_WATCHDOG

@giedriuslt
Copy link
Contributor Author

giedriuslt commented Aug 20, 2024

Okay, so reading temperature causes this. There is apparently a possibility to wait indefinitely during temp read here https://github.com/openshwprojects/OpenBL602/blob/e5769160cfc91a5fe36f040b3d5314e51eda3a28/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c#L1199 . That causes the issue. I guess we should not read temperature of bl602 by default.

@giedriuslt giedriuslt changed the title Try to trace where wathdog reboots on bl602 Try to trace where watchdog reboots on bl602 Aug 20, 2024
@MaxineMuster
Copy link
Contributor

MaxineMuster commented Aug 20, 2024

Couldn't we just limit the number of tries for the loop and skip else?

Like (not tested, just wrote down and the number should be adjusted)

    ADC_Start();
    int t=100;
    while (ADC_Get_FIFO_Count() == 0 && t-- > 0)
          ;
     if ( t <= 0 ) return -999;
    regVal = ADC_Read_FIFO();

You probably saw it: this "while" loop fragment is called twice, for low and high...

@giedriuslt
Copy link
Contributor Author

Could be done, but that requires modifying sdk. For now I disabled temp reading, lets see if it helps.

@diepeterpan
Copy link

Could be done, but that requires modifying sdk. For now I disabled temp reading, lets see if it helps.

Installed the build, can see 0.0 temp, so def. not reading the temp anymore, will report back on stability and any reboots.

@diepeterpan
Copy link

12 hours and counting, and thus far no reboot problems.

Observation: The device feels much more responsive, the temp. reading code in the SDK is prob. CPU intensive and it is called frequently.

Proposal: Copy the SDK code to your code and don't change the SDK, thus creating a similar, more efficient function without the endless loop. And secondly, don't update the temp. that frequently, say once every 10 sec should be efficient.

@MaxineMuster
Copy link
Contributor

There is another implementation of a "TSEN_Get_Temp" (with different arguments) here:

https://github.com/bouffalolab/bouffalo_sdk/blob/master/drivers/lhal/src/bflb_adc.c#L744

Its waiting up to 100 ms for temperature data:

    bflb_adc_start_conversion(dev);
    start_time = bflb_mtimer_get_time_ms();
    while (bflb_adc_get_count(dev) == 0) {
        if ((bflb_mtimer_get_time_ms() - start_time) > 100) {
            return -ETIMEDOUT;
        }
    }

Note: e.g. inside "void bflb_update_adc_trim(struct bflb_device_s *dev, const struct bflb_adc_config_s *config" there is also an unlimited loop

@MaxineMuster
Copy link
Contributor

MaxineMuster commented Aug 21, 2024

I tried to make a version with the changes below:
dev_20240821_115931.zip

(since I don't have a BL602, I can't test and there is a slight danger, this version won't work at all)

In SDK

diff --git a/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c b/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c
index 23bd5145..60ab59a6 100644
--- a/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c
+++ b/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c
@@ -1187,6 +1187,7 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
     ADC_Result_Type result;
     uint32_t tmpVal;
     uint8_t gainCalEnabled=0;
+    uint64_t start_time;
 
     /* clear fifo by SET GPIP_GPADC_FIFO_CLR bit*/
     tmpVal = BL_RD_REG(GPIP_BASE, GPIP_GPADC_CONFIG);
@@ -1196,8 +1197,13 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
     ADC_SET_TSVBE_LOW();
 
     ADC_Start();
-    while (ADC_Get_FIFO_Count() == 0)
-        ;
+    // let's only try for 100 ms like here: https://github.com/bouffalolab/bouffalo_sdk/blob/9a267ff0f9c40fce3efcd4c92a726349381a9b31/drivers/lhal/src/bflb_adc.c#L767
+    start_time = bflb_platform_get_time_ms();
+    while (ADC_Get_FIFO_Count() == 0){
+        if ((bflb_platform_get_time_ms() - start_time) > 100) {
+            return -999;
+        }
+    }
     regVal = ADC_Read_FIFO();
     
     gainCalEnabled=adcGainCoeffCal.adcGainCoeffEnable;
@@ -1215,8 +1221,13 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
     ADC_SET_TSVBE_HIGH();
 
     ADC_Start();
-    while (ADC_Get_FIFO_Count() == 0)
-        ;
+    // let's only try for 100 ms like here: https://github.com/bouffalolab/bouffalo_sdk/blob/9a267ff0f9c40fce3efcd4c92a726349381a9b31/drivers/lhal/src/bflb_adc.c#L767
+    start_time = bflb_platform_get_time_ms();
+    while (ADC_Get_FIFO_Count() == 0){
+        if ((bflb_platform_get_time_ms() - start_time) > 100) {
+            return -999;
+        }
+    }
     regVal = ADC_Read_FIFO();
     gainCalEnabled=adcGainCoeffCal.adcGainCoeffEnable;
     adcGainCoeffCal.adcGainCoeffEnable=0;

re-enabled temperature reads every five seconds

diff --git a/src/user_main.c b/src/user_main.c
index fdbff487..2a39e65c 100644
--- a/src/user_main.c
+++ b/src/user_main.c
@@ -502,8 +502,8 @@ void Main_OnEverySecond()
                g_wifi_temperature = temperature * 0.128f;
 #endif
 #elif PLATFORM_BL602
-               //do not read temp, sometimes it's to slow and causes reboot
-               //get_tsen_adc(&g_wifi_temperature, 0);
+               // get temp every 5 seconds
+               if (g_secondsElapsed % 5 ==0) get_tsen_adc(&g_wifi_temperature, 0);
 #elif PLATFORM_LN882H
                // this is set externally, I am just leaving comment here
 #endif

@diepeterpan
Copy link

diepeterpan commented Aug 21, 2024

I tried to make a version with the changes below: dev_20240821_115931.zip

(since I don't have a BL602, I can't test and there is a slight danger, this version won't work at all)

In SDK

diff --git a/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c b/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c
index 23bd5145..60ab59a6 100644
--- a/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c
+++ b/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c
@@ -1187,6 +1187,7 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
     ADC_Result_Type result;
     uint32_t tmpVal;
     uint8_t gainCalEnabled=0;
+    uint64_t start_time;
 
     /* clear fifo by SET GPIP_GPADC_FIFO_CLR bit*/
     tmpVal = BL_RD_REG(GPIP_BASE, GPIP_GPADC_CONFIG);
@@ -1196,8 +1197,13 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
     ADC_SET_TSVBE_LOW();
 
     ADC_Start();
-    while (ADC_Get_FIFO_Count() == 0)
-        ;
+    // let's only try for 100 ms like here: https://github.com/bouffalolab/bouffalo_sdk/blob/9a267ff0f9c40fce3efcd4c92a726349381a9b31/drivers/lhal/src/bflb_adc.c#L767
+    start_time = bflb_platform_get_time_ms();
+    while (ADC_Get_FIFO_Count() == 0){
+        if ((bflb_platform_get_time_ms() - start_time) > 100) {
+            return -999;
+        }
+    }
     regVal = ADC_Read_FIFO();
     
     gainCalEnabled=adcGainCoeffCal.adcGainCoeffEnable;
@@ -1215,8 +1221,13 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
     ADC_SET_TSVBE_HIGH();
 
     ADC_Start();
-    while (ADC_Get_FIFO_Count() == 0)
-        ;
+    // let's only try for 100 ms like here: https://github.com/bouffalolab/bouffalo_sdk/blob/9a267ff0f9c40fce3efcd4c92a726349381a9b31/drivers/lhal/src/bflb_adc.c#L767
+    start_time = bflb_platform_get_time_ms();
+    while (ADC_Get_FIFO_Count() == 0){
+        if ((bflb_platform_get_time_ms() - start_time) > 100) {
+            return -999;
+        }
+    }
     regVal = ADC_Read_FIFO();
     gainCalEnabled=adcGainCoeffCal.adcGainCoeffEnable;
     adcGainCoeffCal.adcGainCoeffEnable=0;

re-enabled temperature reads every five seconds

diff --git a/src/user_main.c b/src/user_main.c
index fdbff487..2a39e65c 100644
--- a/src/user_main.c
+++ b/src/user_main.c
@@ -502,8 +502,8 @@ void Main_OnEverySecond()
                g_wifi_temperature = temperature * 0.128f;
 #endif
 #elif PLATFORM_BL602
-               //do not read temp, sometimes it's to slow and causes reboot
-               //get_tsen_adc(&g_wifi_temperature, 0);
+               // get temp every 5 seconds
+               if (g_secondsElapsed % 5 ==0) get_tsen_adc(&g_wifi_temperature, 0);
 #elif PLATFORM_LN882H
                // this is set externally, I am just leaving comment here
 #endif

@MaxineMuster does this build also contain the trace statements by @giedriuslt just in case there are reboots to at least see where? I am considering loading it onto my 'bench' BL602 which has serial logging and wires soldered to flash incase it fails, @giedriuslt what do you think? I have also installed @giedriuslt trace version on a live BL602 bulb currently at 18 hours uptime.

PS. Anyone else tracing and testing?

@MaxineMuster
Copy link
Contributor

MaxineMuster commented Aug 21, 2024

@MaxineMuster does this build also contain the trace statements by @giedriuslt just in case there are reboots to at least see where?

Yes, my test-version is based on this PR, so including all the "[TRACE]..." messages

[EDIT]:
@giedriuslt if you replace printf("[TRACE]... with bk_printf("[TRACE]... it should also compile for BEKEN platform (your work might be useful in the future there, too).
Though I don't see at a first glance, why this leads to undefined reference to `addLog'...

@diepeterpan
Copy link

diepeterpan commented Aug 21, 2024

@MaxineMuster does this build also contain the trace statements by @giedriuslt just in case there are reboots to at least see where?

Yes, my test-version is based on this PR, so including all the "[TRACE]..." messages

[EDIT]: @giedriuslt if you replace printf("[TRACE]... with bk_printf("[TRACE]... it should also compile for BEKEN platform (your work might be useful in the future there, too). Though I don't see at a first glance, why this leads to undefined reference to `addLog'...

Ok, installed @MaxineMuster version on my 'bench' BL602. Let's see how it goes. @giedriuslt Version still running on live bulb with 21 hours uptime.

EDIT: 24 hours and counting on @giedriuslt build without temperature measurement.

@restoration2699
Copy link

I can test dev firmware, but I can't get log by serial from device.
If it will be usefull, I can tell if will get reboot.

@diepeterpan
Copy link

BENCH BL602 with serial logging using @MaxineMuster version: 12 hours no BL_RST and counting
LIVE BL602 without serial logging using @giedriuslt version: 36 hours no BL_RST and counting

@MaxineMuster
Copy link
Contributor

BENCH BL602 with serial logging using @MaxineMuster version: 12 hours no BL_RST and counting

I'm curious: Is temperature reading working? Are there some -999° readings (in this simple test they are not disgarded as invalid)?

@restoration2699
Copy link

I'm curious: Is temperature reading working?

in web gui reading works. and it looks like reading correct value.

@diepeterpan
Copy link

BENCH BL602 with serial logging using @MaxineMuster version: 12 hours no BL_RST and counting

I'm curious: Is temperature reading working? Are there some -999° readings (in this simple test they are not disgarded as invalid)?

Have not yet seen a -999, otherwise it's working fine.

@diepeterpan
Copy link

BENCH BL602 with serial logging using @MaxineMuster version: 12 hours no BL_RST and counting
LIVE BL602 without serial logging using @giedriuslt version: 36 hours no BL_RST and counting

BENCH BL602 with serial logging using @MaxineMuster version: 34 hours no BL_RST and counting
LIVE BL602 without serial logging using @giedriuslt version: 58 hours no BL_RST and counting

@diepeterpan
Copy link

BENCH BL602 with serial logging using @MaxineMuster version: 12 hours no BL_RST and counting
LIVE BL602 without serial logging using @giedriuslt version: 36 hours no BL_RST and counting

BENCH BL602 with serial logging using @MaxineMuster version: 34 hours no BL_RST and counting LIVE BL602 without serial logging using @giedriuslt version: 58 hours no BL_RST and counting

BENCH BL602 with serial logging using @MaxineMuster version: 2.5 days no BL_RST and counting
LIVE BL602 without serial logging using @giedriuslt version: 3.5 days no BL_RST and counting

Seems like this fix the BL602 stability, at least for my devices (Light bulbs).

@restoration2699
Copy link

on version 655, I got a reboot on day 5. So it's too early to draw conclusions.

@MaxineMuster
Copy link
Contributor

on version 655

Sorry to ask, but, did you try the "release" version or the firmware offered in this PR?

@restoration2699
Copy link

Sorry to ask, but, did you try the "release" version or the firmware offered in this PR?

yep.
I mean release version.

@MaxineMuster
Copy link
Contributor

O.k., the fix tested here is not yet in release version, but only in the versions offered here.
So your watchdog reset could still be caused by the infinite loop in temperature reading ...

@restoration2699
Copy link

So your watchdog reset could still be caused by the infinite loop in temperature reading

for now I'm running your version posted here.

I mean on 655 release version first watchdog reboot took a lot of time. about 6 days.

@giedriuslt
Copy link
Contributor Author

Never had this issue with temperature readings on my devices, but recently got reboot after 5 days, unfortunately logs do not help.

[TRACE] Before scheduled driver starts
[TRACE] After scheduled driver starts
[TRACE] After watchdog reset
[TRACE] Exiting OnEverySecond
Info:MAIN:Time 393588, idle 0/s, free 91848, MQTT 1(10), bWifi 1, secondsWithNoPing 393517, socks 4/21 
Starting bl602 now....
Booting BL602 Chip...

@restoration2699
Copy link

Never had this issue with temperature readings on my devices, but recently got reboot after 5 days, unfortunately logs do not help.

temp. reading works on release versions, but as I understood it may cause watchdog reboot.

@diepeterpan
Copy link

BENCH BL602 with serial logging using @MaxineMuster version: 4 days no BL_RST and counting
LIVE BL602 without serial logging using @giedriuslt version: 5 days no BL_RST and counting

It's the most stable these light bulbs have been on OpenBeken firmware. But let's see how far I can push my luck, def. an improvement so far.

@diepeterpan
Copy link

diepeterpan commented Aug 25, 2024

Never had this issue with temperature readings on my devices, but recently got reboot after 5 days, unfortunately logs do not help.

[TRACE] Before scheduled driver starts
[TRACE] After scheduled driver starts
[TRACE] After watchdog reset
[TRACE] Exiting OnEverySecond
Info:MAIN:Time 393588, idle 0/s, free 91848, MQTT 1(10), bWifi 1, secondsWithNoPing 393517, socks 4/21 
Starting bl602 now....
Booting BL602 Chip...

What is the reason for the reboot? Watchdog? I'm curious if there's some other adc call been made that uses one of those funky endless loops.

image

@giedriuslt
Copy link
Contributor Author

You might be on to something here. WiFi driver is potentially reading temp for calibration purposes the same way. https://github.com/openshwprojects/OpenBL602/blob/e5769160cfc91a5fe36f040b3d5314e51eda3a28/components/bl602/bl602_wifidrv/bl60x_wifi_driver/wifi_mgmr.c#L997
That could cause the same issue. Although I never got reboots from normal temp reading.... I'll try to log this

@giedriuslt
Copy link
Contributor Author

Well, no, this temperature calibration is disabled by default. So something else caused my reboot

@restoration2699
Copy link

8.5 days online on this firmware

I tried to make a version with the changes below:
dev_20240821_115931.zip

@diepeterpan
Copy link

BENCH BL602 with serial logging using @MaxineMuster version: 4 days no BL_RST and counting
LIVE BL602 without serial logging using @giedriuslt version: 5 days no BL_RST and counting

BENCH BL602 with serial logging using @MaxineMuster version: 9.5 days no BL_RST_SOFTWARE_WATCHDOG and counting
LIVE BL602 without serial logging using @giedriuslt version: 10.5 days no BL_RST_SOFTWARE_WATCHDOG and counting

@giedriuslt
Copy link
Contributor Author

Great, I think we should merge some version of this fix.
One of my device has now 13 days uptime (without this fix, but it never suffered).
Other has 4.5 days, and previously got some different non-temp related watchdog reboot

@giedriuslt
Copy link
Contributor Author

Got another reboot, again outside of main loop. Other socket has 14 days uptime now 🤔

@diepeterpan
Copy link

Great, I think we should merge some version of this fix. One of my device has now 13 days uptime (without this fix, but it never suffered). Other has 4.5 days, and previously got some different non-temp related watchdog reboot

I agree that some version of this fix can be merged, on my LIVE BL602 light bulb without serial logging using @giedriuslt version without temperature reporting I now have 16 days and no BL_RST_SOFTWARE_WATCHDOG and counting.

I am considering installing @MaxineMuster version on the live BL602 light bulb and see if I also get 16+ days.

@restoration2699
Copy link

16.5 days online on @MaxineMuster firmware

@MaxineMuster
Copy link
Contributor

MaxineMuster commented Sep 7, 2024

Maybe a bit late, but I would like to offer another version.
The idea/reason: Even if temperature is requested only every 5 seconds, inside the loop we will call bflb_platform_get_time_ms() on every iteration (and I don't know the impact). If this is not critical, just ignore this new approach.

This version will just count the number of loops checking the FIFO, stopping after 1000.

dev_20240907_135546.zip

The (not relevant) changes: in user_main I replaced printf() with bk_printf(), so it should compile for other platforms, too:

diff_20240907_1600.txt

The relevant change for BL602 in sdk:

diff --git a/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c b/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c
index 23bd5145..ff2b079a 100644
--- a/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c
+++ b/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c
@@ -1187,6 +1187,8 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
     ADC_Result_Type result;
     uint32_t tmpVal;
     uint8_t gainCalEnabled=0;
+//    uint64_t start_time;
+    uint16_t counter;
 
     /* clear fifo by SET GPIP_GPADC_FIFO_CLR bit*/
     tmpVal = BL_RD_REG(GPIP_BASE, GPIP_GPADC_CONFIG);
@@ -1196,8 +1198,21 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
     ADC_SET_TSVBE_LOW();
 
     ADC_Start();
-    while (ADC_Get_FIFO_Count() == 0)
-        ;
+/*    // let's only try for 100 ms like here: https://github.com/bouffalolab/bouffalo_sdk/blob/9a267ff0f9c40fce3efcd4c92a726349381a9b31/drivers/lhal/src/bflb_adc.c#L767
+    start_time = bflb_platform_get_time_ms();
+    while (ADC_Get_FIFO_Count() == 0){
+        if ((bflb_platform_get_time_ms() - start_time) > 100) {
+            return -999;
+        }
+    }
+*/
+    // another try w/o calling "bflb_platform_get_time_ms()" every time checking FIFO  
+    counter = 0;
+    while (ADC_Get_FIFO_Count() == 0){
+        if (counter++ >  1000) {
+            return -999;
+        }
+    }
     regVal = ADC_Read_FIFO();
     
     gainCalEnabled=adcGainCoeffCal.adcGainCoeffEnable;
@@ -1215,8 +1230,21 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
     ADC_SET_TSVBE_HIGH();
 
     ADC_Start();
-    while (ADC_Get_FIFO_Count() == 0)
-        ;
+/*    // let's only try for 100 ms like here: https://github.com/bouffalolab/bouffalo_sdk/blob/9a267ff0f9c40fce3efcd4c92a726349381a9b31/drivers/lhal/src/bflb_adc.c#L767
+    start_time = bflb_platform_get_time_ms();
+    while (ADC_Get_FIFO_Count() == 0){
+        if ((bflb_platform_get_time_ms() - start_time) > 100) {
+            return -999;
+        }
+    }
+*/
+    // another try w/o calling "bflb_platform_get_time_ms()" every time checking FIFO  
+    counter = 0;
+    while (ADC_Get_FIFO_Count() == 0){
+        if (counter++ > 1000) {
+            return -999;
+        }
+    }
     regVal = ADC_Read_FIFO();
     gainCalEnabled=adcGainCoeffCal.adcGainCoeffEnable;
     adcGainCoeffCal.adcGainCoeffEnable=0;
@@ -1232,6 +1260,8 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
     {
         temp = (((float)v1 - (float)v0) - (float)tsen_offset) / 7.753;
     }
+    // print value of counter to see, if loop limit of 1000 is a reasonalbe choice 
+    printf("DEBUG: inside 'TSEN_Get_Temp()' - counter=%i\r\n",counter);
 
     return temp;
 }

@MaxineMuster
Copy link
Contributor

Got feedback from @diepeterpan (thanks for giving it a try), that getting temperature fails.
I changed the counter from 1.000 to 100.000 and added some printf in case of quitting the loop:

dev_20240908_122354.zip

diff --git a/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c b/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c
index 23bd5145..64775693 100644
--- a/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c
+++ b/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c
@@ -1187,6 +1187,8 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
     ADC_Result_Type result;
     uint32_t tmpVal;
     uint8_t gainCalEnabled=0;
+//    uint64_t start_time;
+    uint32_t counter;
 
     /* clear fifo by SET GPIP_GPADC_FIFO_CLR bit*/
     tmpVal = BL_RD_REG(GPIP_BASE, GPIP_GPADC_CONFIG);
@@ -1196,8 +1198,22 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
     ADC_SET_TSVBE_LOW();
 
     ADC_Start();
-    while (ADC_Get_FIFO_Count() == 0)
-        ;
+/*    // let's only try for 100 ms like here: https://github.com/bouffalolab/bouffalo_sdk/blob/9a267ff0f9c40fce3efcd4c92a726349381a9b31/drivers/lhal/src/bflb_adc.c#L767
+    start_time = bflb_platform_get_time_ms();
+    while (ADC_Get_FIFO_Count() == 0){
+        if ((bflb_platform_get_time_ms() - start_time) > 100) {
+            return -999;
+        }
+    }
+*/
+    // another try w/o calling "bflb_platform_get_time_ms()" every time checking FIFO  
+    counter = 0;
+    while (ADC_Get_FIFO_Count() == 0){
+        if (counter++ >  100000) {
+            printf("DEBUG: inside 'TSEN_Get_Temp()' - TSVBE_LOW loop failed: counter=%li\r\n",counter);
+            return -999;
+        }
+    }
     regVal = ADC_Read_FIFO();
     
     gainCalEnabled=adcGainCoeffCal.adcGainCoeffEnable;
@@ -1215,8 +1231,22 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
     ADC_SET_TSVBE_HIGH();
 
     ADC_Start();
-    while (ADC_Get_FIFO_Count() == 0)
-        ;
+/*    // let's only try for 100 ms like here: https://github.com/bouffalolab/bouffalo_sdk/blob/9a267ff0f9c40fce3efcd4c92a726349381a9b31/drivers/lhal/src/bflb_adc.c#L767
+    start_time = bflb_platform_get_time_ms();
+    while (ADC_Get_FIFO_Count() == 0){
+        if ((bflb_platform_get_time_ms() - start_time) > 100) {
+            return -999;
+        }
+    }
+*/
+    // another try w/o calling "bflb_platform_get_time_ms()" every time checking FIFO  
+    counter = 0;
+    while (ADC_Get_FIFO_Count() == 0){
+        if (counter++ > 100000) {
+            printf("DEBUG: inside 'TSEN_Get_Temp()' - TSVBE_HIGH loop failed: counter=%li\r\n",counter);
+            return -999;
+        }
+    }
     regVal = ADC_Read_FIFO();
     gainCalEnabled=adcGainCoeffCal.adcGainCoeffEnable;
     adcGainCoeffCal.adcGainCoeffEnable=0;
@@ -1232,6 +1262,8 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
     {
         temp = (((float)v1 - (float)v0) - (float)tsen_offset) / 7.753;
     }
+    // print value of counter to see, if loop limit of 100000 is a reasonalbe choice 
+    printf("DEBUG: inside 'TSEN_Get_Temp()' - counter=%li\r\n",counter);
 
     return temp;
 }

@diepeterpan
Copy link

diepeterpan commented Sep 8, 2024

Maybe a bit late, but I would like to offer another version. The idea/reason: Even if temperature is requested only every 5 seconds, inside the loop we will call bflb_platform_get_time_ms() on every iteration (and I don't know the impact). If this is not critical, just ignore this new approach.

This version will just count the number of loops checking the FIFO, stopping after 1000.

dev_20240907_135546.zip

The (not relevant) changes: in user_main I replaced printf() with bk_printf(), so it should compile for other platforms, too:

diff_20240907_1600.txt

The relevant change for BL602 in sdk:

diff --git a/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c b/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c
index 23bd5145..ff2b079a 100644
--- a/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c
+++ b/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c
@@ -1187,6 +1187,8 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
     ADC_Result_Type result;
     uint32_t tmpVal;
     uint8_t gainCalEnabled=0;
+//    uint64_t start_time;
+    uint16_t counter;
 
     /* clear fifo by SET GPIP_GPADC_FIFO_CLR bit*/
     tmpVal = BL_RD_REG(GPIP_BASE, GPIP_GPADC_CONFIG);
@@ -1196,8 +1198,21 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
     ADC_SET_TSVBE_LOW();
 
     ADC_Start();
-    while (ADC_Get_FIFO_Count() == 0)
-        ;
+/*    // let's only try for 100 ms like here: https://github.com/bouffalolab/bouffalo_sdk/blob/9a267ff0f9c40fce3efcd4c92a726349381a9b31/drivers/lhal/src/bflb_adc.c#L767
+    start_time = bflb_platform_get_time_ms();
+    while (ADC_Get_FIFO_Count() == 0){
+        if ((bflb_platform_get_time_ms() - start_time) > 100) {
+            return -999;
+        }
+    }
+*/
+    // another try w/o calling "bflb_platform_get_time_ms()" every time checking FIFO  
+    counter = 0;
+    while (ADC_Get_FIFO_Count() == 0){
+        if (counter++ >  1000) {
+            return -999;
+        }
+    }
     regVal = ADC_Read_FIFO();
     
     gainCalEnabled=adcGainCoeffCal.adcGainCoeffEnable;
@@ -1215,8 +1230,21 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
     ADC_SET_TSVBE_HIGH();
 
     ADC_Start();
-    while (ADC_Get_FIFO_Count() == 0)
-        ;
+/*    // let's only try for 100 ms like here: https://github.com/bouffalolab/bouffalo_sdk/blob/9a267ff0f9c40fce3efcd4c92a726349381a9b31/drivers/lhal/src/bflb_adc.c#L767
+    start_time = bflb_platform_get_time_ms();
+    while (ADC_Get_FIFO_Count() == 0){
+        if ((bflb_platform_get_time_ms() - start_time) > 100) {
+            return -999;
+        }
+    }
+*/
+    // another try w/o calling "bflb_platform_get_time_ms()" every time checking FIFO  
+    counter = 0;
+    while (ADC_Get_FIFO_Count() == 0){
+        if (counter++ > 1000) {
+            return -999;
+        }
+    }
     regVal = ADC_Read_FIFO();
     gainCalEnabled=adcGainCoeffCal.adcGainCoeffEnable;
     adcGainCoeffCal.adcGainCoeffEnable=0;
@@ -1232,6 +1260,8 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
     {
         temp = (((float)v1 - (float)v0) - (float)tsen_offset) / 7.753;
     }
+    // print value of counter to see, if loop limit of 1000 is a reasonalbe choice 
+    printf("DEBUG: inside 'TSEN_Get_Temp()' - counter=%i\r\n",counter);
 
     return temp;
 }

This was very confusing, I struggled to get dev_20240907_135546.zip installed. Are we sure dev_20240907_135546.zip is correct as the build date time once installed identify differently and it only shows -999.9 for the temp.

Starting bl602 now....
Booting BL602 Chip...
██████╗ ██╗      ██████╗  ██████╗ ██████╗
██╔══██╗██║     ██╔════╝ ██╔═████╗╚════██╗
██████╔╝██║     ███████╗ ██║██╔██║ █████╔╝
██╔══██╗██║     ██╔═══██╗████╔╝██║██╔═══╝
██████╔╝███████╗╚██████╔╝╚██████╔╝███████╗
╚═════╝ ╚══════╝ ╚═════╝  ╚═════╝ ╚══════╝


------------------------------------------------------------
RISC-V Core Feature:RV32-ACFIMX
Build Version:      release_bl_iot_sdk_1.6.22-22-g1d4ff804-dirty
Std Driver Version: 541807d
PHY   Version:      a0_final-73-g62481a0
RF    Version:      79cc6b9
Build Date:         Aug 21 2024
Build Time:         11:38:42
Boot Reason:        BL_RST_POWER_OFF
------------------------------------------------------------

image

7 Sept file contains a date for 21 Aug, so I did install the correct version. Anyway, let me know if you need me to test anything else.

image

@MaxineMuster
Copy link
Contributor

MaxineMuster commented Sep 8, 2024

I just did a minor change to "user_main.c" (deleted an unused blank). Maybe this will give the correct date, for the last versions didn't change OpenBeken code but just SDK code?

dev_20240908_124416.zip

@@ -488,7 +488,7 @@ void Main_OnEverySecond()
 #ifdef WINDOWS
        g_bHasWiFiConnected = 1;
 #endif
-       bk_printf("[TRACE] Before reading temp \n");
+       bk_printf("[TRACE] Before reading temp\n");
        // display temperature - thanks to giedriuslt
 // only in Normal mode, and if boot is not failing
        if (!bSafeMode && g_bootFailures <= 1)

EDIT: If I got it right, this date refers not to the actual date, where the image was built, but to the GIT macros __DATE__ and __TIME__, which are related to the status of the git (and I just compiled the image and didn't commit anything)

const char* g_build_str = "Built on " __DATE__ " " __TIME__ " version " USER_SW_VER; // Show GIT version at Build line;

@diepeterpan
Copy link

I just did a minor change to "user_main.c" (deleted an unused blank). Maybe this will give the correct date, for the last versions didn't change OpenBeken code but just SDK code?

dev_20240908_124416.zip

@@ -488,7 +488,7 @@ void Main_OnEverySecond()
 #ifdef WINDOWS
        g_bHasWiFiConnected = 1;
 #endif
-       bk_printf("[TRACE] Before reading temp \n");
+       bk_printf("[TRACE] Before reading temp\n");
        // display temperature - thanks to giedriuslt
 // only in Normal mode, and if boot is not failing
        if (!bSafeMode && g_bootFailures <= 1)

OK, I installed the latest version and see the temp, we just looking for stability or anything specific from the serial log?

image

@MaxineMuster
Copy link
Contributor

MaxineMuster commented Sep 8, 2024

Thanks, yes, I'm curious about number of times the loop was called.
Could you please look after the value of "counter" in the lines DEBUG: inside 'TSEN_Get_Temp()' - counter= (in the serial log)?

@diepeterpan
Copy link

diepeterpan commented Sep 8, 2024

Some output containing DEBUG, seems to be on avg. about 5300 times.

DEBUG: inside 'TSEN_Get_Temp()' - counter=5409
DEBUG: inside 'TSEN_Get_Temp()' - counter=5454
DEBUG: inside 'TSEN_Get_Temp()' - counter=5424
DEBUG: inside 'TSEN_Get_Temp()' - counter=4399
DEBUG: inside 'TSEN_Get_Temp()' - counter=5451
DEBUG: inside 'TSEN_Get_Temp()' - counter=5347
DEBUG: inside 'TSEN_Get_Temp()' - counter=5392
DEBUG: inside 'TSEN_Get_Temp()' - counter=5452
DEBUG: inside 'TSEN_Get_Temp()' - counter=5463
DEBUG: inside 'TSEN_Get_Temp()' - counter=5468
DEBUG: inside 'TSEN_Get_Temp()' - counter=5466
DEBUG: inside 'TSEN_Get_Temp()' - counter=5360
DEBUG: inside 'TSEN_Get_Temp()' - counter=5457
DEBUG: inside 'TSEN_Get_Temp()' - counter=5455
DEBUG: inside 'TSEN_Get_Temp()' - counter=5466
DEBUG: inside 'TSEN_Get_Temp()' - counter=5377
DEBUG: inside 'TSEN_Get_Temp()' - counter=5453
DEBUG: inside 'TSEN_Get_Temp()' - counter=5322
DEBUG: inside 'TSEN_Get_Temp()' - counter=5284
DEBUG: inside 'TSEN_Get_Temp()' - counter=5460
DEBUG: inside 'TSEN_Get_Temp()' - counter=5360
DEBUG: inside 'TSEN_Get_Temp()' - counter=4859
DEBUG: inside 'TSEN_Get_Temp()' - counter=5455
DEBUG: inside 'TSEN_Get_Temp()' - counter=5278
DEBUG: inside 'TSEN_Get_Temp()' - counter=5467
DEBUG: inside 'TSEN_Get_Temp()' - counter=5460
DEBUG: inside 'TSEN_Get_Temp()' - counter=5463
DEBUG: inside 'TSEN_Get_Temp()' - counter=5467
DEBUG: inside 'TSEN_Get_Temp()' - counter=5465
DEBUG: inside 'TSEN_Get_Temp()' - counter=5441
DEBUG: inside 'TSEN_Get_Temp()' - counter=5407
DEBUG: inside 'TSEN_Get_Temp()' - counter=5454
DEBUG: inside 'TSEN_Get_Temp()' - counter=5302
DEBUG: inside 'TSEN_Get_Temp()' - counter=5389
DEBUG: inside 'TSEN_Get_Temp()' - counter=5466
DEBUG: inside 'TSEN_Get_Temp()' - counter=5387
DEBUG: inside 'TSEN_Get_Temp()' - counter=5331
DEBUG: inside 'TSEN_Get_Temp()' - counter=5456
DEBUG: inside 'TSEN_Get_Temp()' - counter=5163
DEBUG: inside 'TSEN_Get_Temp()' - counter=5360
DEBUG: inside 'TSEN_Get_Temp()' - counter=5456
DEBUG: inside 'TSEN_Get_Temp()' - counter=5343
DEBUG: inside 'TSEN_Get_Temp()' - counter=5266
DEBUG: inside 'TSEN_Get_Temp()' - counter=5457
DEBUG: inside 'TSEN_Get_Temp()' - counter=5458
DEBUG: inside 'TSEN_Get_Temp()' - counter=5462
DEBUG: inside 'TSEN_Get_Temp()' - counter=5305
DEBUG: inside 'TSEN_Get_Temp()' - counter=5402
DEBUG: inside 'TSEN_Get_Temp()' - counter=5376
DEBUG: inside 'TSEN_Get_Temp()' - counter=5455
DEBUG: inside 'TSEN_Get_Temp()' - counter=5381
DEBUG: inside 'TSEN_Get_Temp()' - counter=5455
DEBUG: inside 'TSEN_Get_Temp()' - counter=5460
DEBUG: inside 'TSEN_Get_Temp()' - counter=5362
DEBUG: inside 'TSEN_Get_Temp()' - counter=5396
DEBUG: inside 'TSEN_Get_Temp()' - counter=5458
DEBUG: inside 'TSEN_Get_Temp()' - counter=5375
DEBUG: inside 'TSEN_Get_Temp()' - counter=5360
DEBUG: inside 'TSEN_Get_Temp()' - counter=5374
DEBUG: inside 'TSEN_Get_Temp()' - counter=5408
DEBUG: inside 'TSEN_Get_Temp()' - counter=5358
DEBUG: inside 'TSEN_Get_Temp()' - counter=5468
DEBUG: inside 'TSEN_Get_Temp()' - counter=5363
DEBUG: inside 'TSEN_Get_Temp()' - counter=5348
DEBUG: inside 'TSEN_Get_Temp()' - counter=5363
DEBUG: inside 'TSEN_Get_Temp()' - counter=3819
DEBUG: inside 'TSEN_Get_Temp()' - counter=5379
DEBUG: inside 'TSEN_Get_Temp()' - counter=5472
DEBUG: inside 'TSEN_Get_Temp()' - counter=5463
DEBUG: inside 'TSEN_Get_Temp()' - counter=3282
DEBUG: inside 'TSEN_Get_Temp()' - counter=4524
DEBUG: inside 'TSEN_Get_Temp()' - counter=3867
DEBUG: inside 'TSEN_Get_Temp()' - counter=5448
DEBUG: inside 'TSEN_Get_Temp()' - counter=5452
DEBUG: inside 'TSEN_Get_Temp()' - counter=5244
DEBUG: inside 'TSEN_Get_Temp()' - counter=5411
DEBUG: inside 'TSEN_Get_Temp()' - counter=5459
DEBUG: inside 'TSEN_Get_Temp()' - counter=5277
DEBUG: inside 'TSEN_Get_Temp()' - counter=5363
DEBUG: inside 'TSEN_Get_Temp()' - counter=5460
DEBUG: inside 'TSEN_Get_Temp()' - counter=5458
DEBUG: inside 'TSEN_Get_Temp()' - counter=5395
DEBUG: inside 'TSEN_Get_Temp()' - counter=5465
DEBUG: inside 'TSEN_Get_Temp()' - counter=5387
DEBUG: inside 'TSEN_Get_Temp()' - counter=5454
DEBUG: inside 'TSEN_Get_Temp()' - counter=5454
DEBUG: inside 'TSEN_Get_Temp()' - counter=5358
DEBUG: inside 'TSEN_Get_Temp()' - counter=5468
DEBUG: inside 'TSEN_Get_Temp()' - counter=5458
DEBUG: inside 'TSEN_Get_Temp()' - counter=5453
DEBUG: inside 'TSEN_Get_Temp()' - counter=5387
DEBUG: inside 'TSEN_Get_Temp()' - counter=5463
DEBUG: inside 'TSEN_Get_Temp()' - counter=5454
DEBUG: inside 'TSEN_Get_Temp()' - counter=5399
DEBUG: inside 'TSEN_Get_Temp()' - counter=5451
DEBUG: inside 'TSEN_Get_Temp()' - counter=5460
DEBUG: inside 'TSEN_Get_Temp()' - counter=5458
DEBUG: inside 'TSEN_Get_Temp()' - counter=5471
DEBUG: inside 'TSEN_Get_Temp()' - counter=5470
DEBUG: inside 'TSEN_Get_Temp()' - counter=5351
DEBUG: inside 'TSEN_Get_Temp()' - counter=5455
DEBUG: inside 'TSEN_Get_Temp()' - counter=5354
DEBUG: inside 'TSEN_Get_Temp()' - counter=5233
DEBUG: inside 'TSEN_Get_Temp()' - counter=5461
DEBUG: inside 'TSEN_Get_Temp()' - counter=5295
DEBUG: inside 'TSEN_Get_Temp()' - counter=5430
DEBUG: inside 'TSEN_Get_Temp()' - counter=5455
DEBUG: inside 'TSEN_Get_Temp()' - counter=5383
DEBUG: inside 'TSEN_Get_Temp()' - counter=5355
DEBUG: inside 'TSEN_Get_Temp()' - counter=5465
DEBUG: inside 'TSEN_Get_Temp()' - counter=5453
DEBUG: inside 'TSEN_Get_Temp()' - counter=5392
DEBUG: inside 'TSEN_Get_Temp()' - counter=5469
DEBUG: inside 'TSEN_Get_Temp()' - counter=5472
DEBUG: inside 'TSEN_Get_Temp()' - counter=5362
DEBUG: inside 'TSEN_Get_Temp()' - counter=5471
DEBUG: inside 'TSEN_Get_Temp()' - counter=5412
DEBUG: inside 'TSEN_Get_Temp()' - counter=5450
DEBUG: inside 'TSEN_Get_Temp()' - counter=5456
DEBUG: inside 'TSEN_Get_Temp()' - counter=5366
DEBUG: inside 'TSEN_Get_Temp()' - counter=5366
DEBUG: inside 'TSEN_Get_Temp()' - counter=5463
DEBUG: inside 'TSEN_Get_Temp()' - counter=5459
DEBUG: inside 'TSEN_Get_Temp()' - counter=5341
DEBUG: inside 'TSEN_Get_Temp()' - counter=5471
DEBUG: inside 'TSEN_Get_Temp()' - counter=5113
DEBUG: inside 'TSEN_Get_Temp()' - counter=5311
DEBUG: inside 'TSEN_Get_Temp()' - counter=2670
DEBUG: inside 'TSEN_Get_Temp()' - counter=5433
DEBUG: inside 'TSEN_Get_Temp()' - counter=5347
DEBUG: inside 'TSEN_Get_Temp()' - counter=5454
DEBUG: inside 'TSEN_Get_Temp()' - counter=5372
DEBUG: inside 'TSEN_Get_Temp()' - counter=5428
DEBUG: inside 'TSEN_Get_Temp()' - counter=5452
DEBUG: inside 'TSEN_Get_Temp()' - counter=5403
DEBUG: inside 'TSEN_Get_Temp()' - counter=5395
DEBUG: inside 'TSEN_Get_Temp()' - counter=5455
DEBUG: inside 'TSEN_Get_Temp()' - counter=5359
DEBUG: inside 'TSEN_Get_Temp()' - counter=5368
DEBUG: inside 'TSEN_Get_Temp()' - counter=5457
DEBUG: inside 'TSEN_Get_Temp()' - counter=5415
DEBUG: inside 'TSEN_Get_Temp()' - counter=5382
DEBUG: inside 'TSEN_Get_Temp()' - counter=5454
DEBUG: inside 'TSEN_Get_Temp()' - counter=5342
DEBUG: inside 'TSEN_Get_Temp()' - counter=5378
DEBUG: inside 'TSEN_Get_Temp()' - counter=1837
DEBUG: inside 'TSEN_Get_Temp()' - counter=5343
DEBUG: inside 'TSEN_Get_Temp()' - counter=5428
DEBUG: inside 'TSEN_Get_Temp()' - counter=5468
DEBUG: inside 'TSEN_Get_Temp()' - counter=5392
DEBUG: inside 'TSEN_Get_Temp()' - counter=5345
DEBUG: inside 'TSEN_Get_Temp()' - counter=5462
DEBUG: inside 'TSEN_Get_Temp()' - counter=5351
DEBUG: inside 'TSEN_Get_Temp()' - counter=5450
DEBUG: inside 'TSEN_Get_Temp()' - counter=5459
DEBUG: inside 'TSEN_Get_Temp()' - counter=5413
DEBUG: inside 'TSEN_Get_Temp()' - counter=5366
DEBUG: inside 'TSEN_Get_Temp()' - counter=5461
DEBUG: inside 'TSEN_Get_Temp()' - counter=5455
DEBUG: inside 'TSEN_Get_Temp()' - counter=5380
DEBUG: inside 'TSEN_Get_Temp()' - counter=5465
DEBUG: inside 'TSEN_Get_Temp()' - counter=5349
DEBUG: inside 'TSEN_Get_Temp()' - counter=5367
DEBUG: inside 'TSEN_Get_Temp()' - counter=5461
DEBUG: inside 'TSEN_Get_Temp()' - counter=5463
DEBUG: inside 'TSEN_Get_Temp()' - counter=5424
DEBUG: inside 'TSEN_Get_Temp()' - counter=5455
DEBUG: inside 'TSEN_Get_Temp()' - counter=5299
DEBUG: inside 'TSEN_Get_Temp()' - counter=5301
DEBUG: inside 'TSEN_Get_Temp()' - counter=5454
DEBUG: inside 'TSEN_Get_Temp()' - counter=5363
DEBUG: inside 'TSEN_Get_Temp()' - counter=5451
DEBUG: inside 'TSEN_Get_Temp()' - counter=5396
DEBUG: inside 'TSEN_Get_Temp()' - counter=5400
DEBUG: inside 'TSEN_Get_Temp()' - counter=5358
DEBUG: inside 'TSEN_Get_Temp()' - counter=5298
DEBUG: inside 'TSEN_Get_Temp()' - counter=5346
DEBUG: inside 'TSEN_Get_Temp()' - counter=5377
DEBUG: inside 'TSEN_Get_Temp()' - counter=5467
DEBUG: inside 'TSEN_Get_Temp()' - counter=5378
DEBUG: inside 'TSEN_Get_Temp()' - counter=5343
DEBUG: inside 'TSEN_Get_Temp()' - counter=5469
DEBUG: inside 'TSEN_Get_Temp()' - counter=5459
DEBUG: inside 'TSEN_Get_Temp()' - counter=4647
DEBUG: inside 'TSEN_Get_Temp()' - counter=5451
DEBUG: inside 'TSEN_Get_Temp()' - counter=5462
DEBUG: inside 'TSEN_Get_Temp()' - counter=5367
DEBUG: inside 'TSEN_Get_Temp()' - counter=5457
DEBUG: inside 'TSEN_Get_Temp()' - counter=5468
DEBUG: inside 'TSEN_Get_Temp()' - counter=5325
DEBUG: inside 'TSEN_Get_Temp()' - counter=5463
DEBUG: inside 'TSEN_Get_Temp()' - counter=5391
DEBUG: inside 'TSEN_Get_Temp()' - counter=5384
DEBUG: inside 'TSEN_Get_Temp()' - counter=5459
DEBUG: inside 'TSEN_Get_Temp()' - counter=5167
DEBUG: inside 'TSEN_Get_Temp()' - counter=5328
DEBUG: inside 'TSEN_Get_Temp()' - counter=5466
DEBUG: inside 'TSEN_Get_Temp()' - counter=5457
DEBUG: inside 'TSEN_Get_Temp()' - counter=5337
DEBUG: inside 'TSEN_Get_Temp()' - counter=5461
DEBUG: inside 'TSEN_Get_Temp()' - counter=5329
DEBUG: inside 'TSEN_Get_Temp()' - counter=5341
DEBUG: inside 'TSEN_Get_Temp()' - counter=5467
DEBUG: inside 'TSEN_Get_Temp()' - counter=5345
DEBUG: inside 'TSEN_Get_Temp()' - counter=5295
DEBUG: inside 'TSEN_Get_Temp()' - counter=5465
DEBUG: inside 'TSEN_Get_Temp()' - counter=5412
DEBUG: inside 'TSEN_Get_Temp()' - counter=5470
DEBUG: inside 'TSEN_Get_Temp()' - counter=5465
DEBUG: inside 'TSEN_Get_Temp()' - counter=5412
DEBUG: inside 'TSEN_Get_Temp()' - counter=5469
DEBUG: inside 'TSEN_Get_Temp()' - counter=5363
DEBUG: inside 'TSEN_Get_Temp()' - counter=3799
DEBUG: inside 'TSEN_Get_Temp()' - counter=5458
DEBUG: inside 'TSEN_Get_Temp()' - counter=5460
DEBUG: inside 'TSEN_Get_Temp()' - counter=5368
DEBUG: inside 'TSEN_Get_Temp()' - counter=5471
DEBUG: inside 'TSEN_Get_Temp()' - counter=5378
DEBUG: inside 'TSEN_Get_Temp()' - counter=5349
DEBUG: inside 'TSEN_Get_Temp()' - counter=5469
DEBUG: inside 'TSEN_Get_Temp()' - counter=5390
DEBUG: inside 'TSEN_Get_Temp()' - counter=5395
DEBUG: inside 'TSEN_Get_Temp()' - counter=1308
DEBUG: inside 'TSEN_Get_Temp()' - counter=5459
DEBUG: inside 'TSEN_Get_Temp()' - counter=5361
DEBUG: inside 'TSEN_Get_Temp()' - counter=5468
DEBUG: inside 'TSEN_Get_Temp()' - counter=5382
DEBUG: inside 'TSEN_Get_Temp()' - counter=5356
DEBUG: inside 'TSEN_Get_Temp()' - counter=5467
DEBUG: inside 'TSEN_Get_Temp()' - counter=5316
DEBUG: inside 'TSEN_Get_Temp()' - counter=5431
DEBUG: inside 'TSEN_Get_Temp()' - counter=5450
DEBUG: inside 'TSEN_Get_Temp()' - counter=5387
DEBUG: inside 'TSEN_Get_Temp()' - counter=5302
DEBUG: inside 'TSEN_Get_Temp()' - counter=5464
DEBUG: inside 'TSEN_Get_Temp()' - counter=5467
DEBUG: inside 'TSEN_Get_Temp()' - counter=5391
DEBUG: inside 'TSEN_Get_Temp()' - counter=5454
DEBUG: inside 'TSEN_Get_Temp()' - counter=5340
DEBUG: inside 'TSEN_Get_Temp()' - counter=5367
DEBUG: inside 'TSEN_Get_Temp()' - counter=5145
DEBUG: inside 'TSEN_Get_Temp()' - counter=5355
DEBUG: inside 'TSEN_Get_Temp()' - counter=5456
DEBUG: inside 'TSEN_Get_Temp()' - counter=5299
DEBUG: inside 'TSEN_Get_Temp()' - counter=5405
DEBUG: inside 'TSEN_Get_Temp()' - counter=5458
DEBUG: inside 'TSEN_Get_Temp()' - counter=5342
DEBUG: inside 'TSEN_Get_Temp()' - counter=5471
DEBUG: inside 'TSEN_Get_Temp()' - counter=5469
DEBUG: inside 'TSEN_Get_Temp()' - counter=5423
DEBUG: inside 'TSEN_Get_Temp()' - counter=5458
DEBUG: inside 'TSEN_Get_Temp()' - counter=5452
DEBUG: inside 'TSEN_Get_Temp()' - counter=5333
DEBUG: inside 'TSEN_Get_Temp()' - counter=5415
DEBUG: inside 'TSEN_Get_Temp()' - counter=5469
DEBUG: inside 'TSEN_Get_Temp()' - counter=5331
DEBUG: inside 'TSEN_Get_Temp()' - counter=5384
DEBUG: inside 'TSEN_Get_Temp()' - counter=2231
DEBUG: inside 'TSEN_Get_Temp()' - counter=5337
DEBUG: inside 'TSEN_Get_Temp()' - counter=5398
DEBUG: inside 'TSEN_Get_Temp()' - counter=5453
DEBUG: inside 'TSEN_Get_Temp()' - counter=5337
DEBUG: inside 'TSEN_Get_Temp()' - counter=5338
DEBUG: inside 'TSEN_Get_Temp()' - counter=5452
DEBUG: inside 'TSEN_Get_Temp()' - counter=5342
DEBUG: inside 'TSEN_Get_Temp()' - counter=5376
DEBUG: inside 'TSEN_Get_Temp()' - counter=5461
DEBUG: inside 'TSEN_Get_Temp()' - counter=5463
DEBUG: inside 'TSEN_Get_Temp()' - counter=5316
DEBUG: inside 'TSEN_Get_Temp()' - counter=5468
DEBUG: inside 'TSEN_Get_Temp()' - counter=5359
DEBUG: inside 'TSEN_Get_Temp()' - counter=5313
DEBUG: inside 'TSEN_Get_Temp()' - counter=5463
DEBUG: inside 'TSEN_Get_Temp()' - counter=5390
DEBUG: inside 'TSEN_Get_Temp()' - counter=4938
DEBUG: inside 'TSEN_Get_Temp()' - counter=5471
DEBUG: inside 'TSEN_Get_Temp()' - counter=5363
DEBUG: inside 'TSEN_Get_Temp()' - counter=5367
DEBUG: inside 'TSEN_Get_Temp()' - counter=5456
DEBUG: inside 'TSEN_Get_Temp()' - counter=5312
DEBUG: inside 'TSEN_Get_Temp()' - counter=5351
DEBUG: inside 'TSEN_Get_Temp()' - counter=5464
DEBUG: inside 'TSEN_Get_Temp()' - counter=5122
DEBUG: inside 'TSEN_Get_Temp()' - counter=5348
DEBUG: inside 'TSEN_Get_Temp()' - counter=5458
DEBUG: inside 'TSEN_Get_Temp()' - counter=5368
DEBUG: inside 'TSEN_Get_Temp()' - counter=5390
DEBUG: inside 'TSEN_Get_Temp()' - counter=5456
DEBUG: inside 'TSEN_Get_Temp()' - counter=5316
DEBUG: inside 'TSEN_Get_Temp()' - counter=5376
DEBUG: inside 'TSEN_Get_Temp()' - counter=5466
DEBUG: inside 'TSEN_Get_Temp()' - counter=5412
DEBUG: inside 'TSEN_Get_Temp()' - counter=5407
DEBUG: inside 'TSEN_Get_Temp()' - counter=5455
DEBUG: inside 'TSEN_Get_Temp()' - counter=5390
DEBUG: inside 'TSEN_Get_Temp()' - counter=5333
DEBUG: inside 'TSEN_Get_Temp()' - counter=5453
DEBUG: inside 'TSEN_Get_Temp()' - counter=4496
DEBUG: inside 'TSEN_Get_Temp()' - counter=5468
DEBUG: inside 'TSEN_Get_Temp()' - counter=5464
DEBUG: inside 'TSEN_Get_Temp()' - counter=5333
DEBUG: inside 'TSEN_Get_Temp()' - counter=5313

@MaxineMuster
Copy link
Contributor

Thanks, that's good news!
If the "regular" reading takes less than 5500 loops, maybe we should quit the loop after 15.000 or 20.000 tries?
I'll make a new version, for now with 100.000 we allow almost 20 times the "estimated" number of loops.

O.k., here we go with a 20.000 loops version:
dev_20240908_135831.zip

@diepeterpan
Copy link

Thanks, that's good news! If the "regular" reading takes less than 5500 loops, maybe we should quit the loop after 15.000 or 20.000 tries? I'll make a new version, for now with 100.000 we allow almost 20 times the "estimated" number of loops.

O.k., here we go with a 20.000 loops version: dev_20240908_135831.zip

Ok, I installed it, anything I must look for, or just report stability?

@MaxineMuster
Copy link
Contributor

Just see if it's stable now (and maybe, if there are any events, where we exited the loop (when temp is set to -999)).

There will be some line in the log:

DEBUG: inside 'TSEN_Get_Temp()' - TSVBE_LOW loop failed ... (or "TSVBE_HIGH")

@restoration2699
Copy link

O.k., here we go with a 20.000 loops version:
dev_20240908_135831.zip

also installed this version. will notify if something wrong.

@diepeterpan
Copy link

Just see if it's stable now (and maybe, if there are any events, where we exited the loop (when temp is set to -999)).

There will be some line in the log:

DEBUG: inside 'TSEN_Get_Temp()' - TSVBE_LOW loop failed ... (or "TSVBE_HIGH")

Ok, got a TSVBE_HIGH at +- 3.5 hours, see serial extract below.

Info:MAIN:Time 11705, idle 0/s, free 95352, MQTT 1(2), bWifi 1, se[TRACE] Exiting OnEverySecond
condsWithNoPing -1, socks 2/21 
[TRACE] Entering OnEverySecond
[TRACE] Before reading temp
DEBUG: inside 'TSEN_Get_Temp()' - counter=5500
DEBUG: inside 'TSEN_Get_Temp()' - TSVBE_HIGH loop failed: counter=20002
[TRACE] After reading temp
[TRACE] After MQTT_RunEverySecondUpdate
[TRACE] After CMD_EVENT_CHANGE_NOMQTTTIME
[TRACE] After MQTT_Dedup_Tick
[TRACE] After LED_RunOnEverySecond
[TRACE] After DRV_OnEverySecond
[TRACE] After UART_RunEverySecond
[TRACE] After CFG_Save_IfThereArePendingChanges
[TRACE] Before scheduled driver starts
[TRACE] After scheduled driver starts
[TRACE] After watchdog reset
Info:MAIN:Time 11706, idle 0/s, free 95352, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 

@MaxineMuster
Copy link
Contributor

Ok, got a TSVBE_HIGH at +- 3.5 hours, see serial extract below.

That's what I expected, we should see this from time to time if we prevent the device from rebooting...
Let's look for stability of the devices, and there's still an error handling missing ( not to display "-999" if reading of the temperature failed ).

@diepeterpan
Copy link

Ok, got a TSVBE_HIGH at +- 3.5 hours, see serial extract below.

That's what I expected, we should see this from time to time if we prevent the device from rebooting... Let's look for stability of the devices, and there's still an error handling missing ( not to display "-999" if reading of the temperature failed ).

So far, there are a couple of HIGH and LOW but no WATCHDOG crashes, it seems stable re: this specific problem of temperature reading.

DEBUG: inside 'TSEN_Get_Temp()' - TSVBE_HIGH loop failed: counter=20002
DEBUG: inside 'TSEN_Get_Temp()' - TSVBE_HIGH loop failed: counter=20002
DEBUG: inside 'TSEN_Get_Temp()' - TSVBE_HIGH loop failed: counter=20002
DEBUG: inside 'TSEN_Get_Temp()' - TSVBE_HIGH loop failed: counter=20002
DEBUG: inside 'TSEN_Get_Temp()' - TSVBE_HIGH loop failed: counter=20002

DEBUG: inside 'TSEN_Get_Temp()' - TSVBE_LOW loop failed: counter=20002
DEBUG: inside 'TSEN_Get_Temp()' - TSVBE_LOW loop failed: counter=20002

@restoration2699
Copy link

untitled
got it on this firmware:

O.k., here we go with a 20.000 loops version:
dev_20240908_135831.zip

@diepeterpan
Copy link

untitled got it on this firmware:

O.k., here we go with a 20.000 loops version:
dev_20240908_135831.zip

Without a serial trace, it would be difficult to say whether it is related to the temperature reading addressed by the firmware/build you refer to. @giedriuslt reported another reboot due to an error outside of the main loop which is not addressed by this, maybe you experienced something similar. As soon as I get a WATCHDOG reset I will report, but so far all good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants