Try to trace where watchdog reboots on bl602 #1322
Thank you, I am running it and will give feedback as soon as I catch one. e.g. example of [TRACE] seen in log
|
I captured numerous, see below.
|
Okay, so reading temperature causes this. There is apparently a possibility to wait indefinitely during temp read here https://github.com/openshwprojects/OpenBL602/blob/e5769160cfc91a5fe36f040b3d5314e51eda3a28/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c#L1199 . That causes the issue. I guess we should not read temperature of bl602 by default. |
Couldn't we just limit the number of tries for the loop and skip the reading otherwise? Something like this (not tested, just written down, and the number should be adjusted)
You probably saw it: this "while" loop fragment is called twice, for low and high... |
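A minimal sketch of the "limit the number of tries" idea, written so it compiles off-target. This is not the SDK code: `fake_adc_t`, `fake_adc_fifo_has_data`, `tsen_read_bounded`, `TSEN_MAX_POLLS` and `TSEN_INVALID_RAW` are hypothetical names, and the limit of 1000 is a placeholder that would need tuning on real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TSEN_MAX_POLLS   1000u      /* placeholder budget; tune on hardware */
#define TSEN_INVALID_RAW UINT32_MAX /* hypothetical "no data" marker */

/* Hypothetical stand-in for the ADC FIFO so the sketch runs on a PC.
 * On a BL602 the SDK's own FIFO accessors would be used instead. */
typedef struct {
    uint32_t polls_until_ready; /* polls before a sample shows up */
    uint32_t sample;            /* raw value delivered once ready */
} fake_adc_t;

static bool fake_adc_fifo_has_data(fake_adc_t *adc)
{
    if (adc->polls_until_ready == 0) {
        return true;
    }
    adc->polls_until_ready--;
    return false;
}

/* Poll for one sample, giving up after TSEN_MAX_POLLS attempts instead of
 * spinning forever. Returns true and stores the sample on success. */
static bool tsen_read_bounded(fake_adc_t *adc, uint32_t *raw_out)
{
    assert(adc != NULL && raw_out != NULL);

    for (uint32_t tries = 0; tries < TSEN_MAX_POLLS; tries++) {
        if (fake_adc_fifo_has_data(adc)) {
            *raw_out = adc->sample;
            return true;
        }
    }
    *raw_out = TSEN_INVALID_RAW;
    return false; /* caller skips this reading instead of hanging */
}
```

Since the loop fragment is used twice (low and high), both waits would need the same guard, and the caller has to treat a `false` return as "no temperature this time".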
Could be done, but that requires modifying the SDK. For now I disabled temp reading, let's see if it helps. |
Installed the build, can see 0.0 temp, so def. not reading the temp anymore, will report back on stability and any reboots. |
12 hours and counting, and thus far no reboot problems. Observation: The device feels much more responsive; the temp. reading code in the SDK is prob. CPU intensive and it is called frequently. Proposal: Copy the SDK code into your own code instead of changing the SDK, creating a similar, more efficient function without the endless loop. And secondly, don't update the temp. that frequently; once every 10 sec should be sufficient. |
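The "once every 10 sec" proposal could look roughly like the sketch below. It assumes a seconds counter is available from the main loop; `temp_cache_t`, `temp_get_throttled`, `fake_sensor` and `TEMP_INTERVAL_SEC` are hypothetical names for illustration, not existing OpenBeken or SDK functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TEMP_INTERVAL_SEC 10u /* minimum spacing between real sensor reads */

typedef struct {
    uint32_t last_read_sec; /* time of the last real sensor read */
    bool     has_read;      /* false until the first read happened */
    float    cached_temp;   /* value handed out between reads */
    unsigned sensor_reads;  /* how often the sensor was actually read */
} temp_cache_t;

typedef float (*temp_read_fn)(void);

/* Return the cached temperature, refreshing it from the sensor only when
 * at least TEMP_INTERVAL_SEC seconds have passed since the last read. */
static float temp_get_throttled(temp_cache_t *c, uint32_t now_sec,
                                temp_read_fn read_sensor)
{
    assert(c != NULL && read_sensor != NULL);

    /* Unsigned subtraction stays correct if the counter wraps around. */
    if (!c->has_read ||
        (uint32_t)(now_sec - c->last_read_sec) >= TEMP_INTERVAL_SEC) {
        c->cached_temp = read_sensor();
        c->last_read_sec = now_sec;
        c->has_read = true;
        c->sensor_reads++;
    }
    return c->cached_temp;
}

/* Hypothetical sensor used only to exercise the sketch off-target. */
static float fake_sensor(void)
{
    return 42.5f;
}
```

Called once per second for 30 seconds, this reads the sensor three times (at 0, 10 and 20 s) and serves the cached value in between.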
There is another implementation of a "TSEN_Get_Temp" (with different arguments) here: https://github.com/bouffalolab/bouffalo_sdk/blob/master/drivers/lhal/src/bflb_adc.c#L744 It waits up to 100 ms for temperature data:
Note: e.g. inside "void bflb_update_adc_trim(struct bflb_device_s *dev, const struct bflb_adc_config_s *config)" there is also an unlimited loop |
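For comparison with a plain loop counter, here is a generic sketch of a time-bounded wait like the 100 ms one described above. It is not the bouffalo_sdk implementation; `wait_ready_ms`, the callback types and the `fake_*` helpers are hypothetical, and a real build would pass in whatever millisecond tick source the platform provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*now_ms_fn)(void); /* returns a millisecond tick count */
typedef bool (*ready_fn)(void);      /* returns true once data is available */

/* Wait until ready() reports data or timeout_ms has elapsed.
 * Returns true if data arrived, false on timeout. */
static bool wait_ready_ms(ready_fn ready, now_ms_fn now_ms, uint32_t timeout_ms)
{
    assert(ready != NULL && now_ms != NULL);

    uint32_t start = now_ms();
    while (!ready()) {
        /* Unsigned difference handles tick counter wrap-around. */
        if ((uint32_t)(now_ms() - start) >= timeout_ms) {
            return false;
        }
    }
    return true;
}

/* Hypothetical test doubles: a clock that advances 1 ms per call and a
 * data source that becomes ready after a configurable number of polls. */
static uint32_t fake_clock_ms;
static uint32_t fake_ready_after;

static uint32_t fake_now_ms(void)
{
    return fake_clock_ms++;
}

static bool fake_ready(void)
{
    if (fake_ready_after == 0) {
        return true;
    }
    fake_ready_after--;
    return false;
}
```

A time limit does not depend on how fast the loop spins, whereas an iteration limit has to be tuned for the CPU clock; either one removes the unbounded wait.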
I tried to make a version with the changes below (since I don't have a BL602, I can't test it, and there is a slight risk that this version won't work at all): In SDK
re-enabled temperature reads every five seconds
|
@MaxineMuster does this build also contain the trace statements by @giedriuslt, so that if there are reboots we can at least see where? I am considering loading it onto my 'bench' BL602, which has serial logging and wires soldered to flash in case it fails. @giedriuslt what do you think? I have also installed @giedriuslt's trace version on a live BL602 bulb, currently at 18 hours uptime. PS. Anyone else tracing and testing? |
Yes, my test-version is based on this PR, so including all the "[TRACE]..." messages [EDIT]: |
Ok, installed @MaxineMuster version on my 'bench' BL602. Let's see how it goes. @giedriuslt Version still running on live bulb with 21 hours uptime. EDIT: 24 hours and counting on @giedriuslt build without temperature measurement. |
I can test dev firmware, but I can't get log by serial from device. |
BENCH BL602 with serial logging using @MaxineMuster version: 12 hours no BL_RST and counting |
I'm curious: Is temperature reading working? Are there some -999° readings (in this simple test they are not discarded as invalid)? |
In the web GUI the reading works, and it looks like it reads the correct value. |
Have not yet seen a -999, otherwise it's working fine. |
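If the -999 value should not reach the GUI, the caller could discard it and keep the last good reading. The sketch below only illustrates that idea: `temp_filter_t`, `temp_filter_update` and the threshold are hypothetical, and it assumes any value at or below -999 marks a failed read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: readings at or below this value mean "read failed". */
#define TEMP_INVALID_THRESHOLD (-999.0f)

typedef struct {
    float    last_good; /* most recent valid temperature */
    bool     has_good;  /* false until one valid reading was seen */
    unsigned rejected;  /* number of discarded sentinel readings */
} temp_filter_t;

/* Feed one raw reading into the filter. Returns true and writes the
 * temperature to report into *out if a valid value is available;
 * returns false if no valid reading has been seen yet. */
static bool temp_filter_update(temp_filter_t *f, float reading, float *out)
{
    assert(f != NULL && out != NULL);

    if (reading <= TEMP_INVALID_THRESHOLD) {
        f->rejected++; /* failed read: keep the previous value */
    } else {
        f->last_good = reading;
        f->has_good = true;
    }

    if (!f->has_good) {
        return false;
    }
    *out = f->last_good;
    return true;
}
```

The `rejected` counter gives a cheap way to see how often the read gave up without scanning the serial log.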
BENCH BL602 with serial logging using @MaxineMuster version: 34 hours no BL_RST and counting |
BENCH BL602 with serial logging using @MaxineMuster version: 2.5 days no BL_RST and counting. Seems like this fixes the BL602 stability, at least for my devices (light bulbs). |
on version 655, I got a reboot on day 5. So it's too early to draw conclusions. |
Sorry to ask, but, did you try the "release" version or the firmware offered in this PR? |
yep. |
O.k., the fix tested here is not yet in release version, but only in the versions offered here. |
For now I'm running your version posted here. I mean that on the 655 release version the first watchdog reboot took a long time, about 6 days. |
Never had this issue with temperature readings on my devices, but recently got reboot after 5 days, unfortunately logs do not help.
|
temp. reading works on release versions, but as I understood it may cause watchdog reboot. |
BENCH BL602 with serial logging using @MaxineMuster version: 4 days no BL_RST and counting. It's the most stable these light bulbs have been on OpenBeken firmware. But let's see how far I can push my luck, def. an improvement so far. |
What is the reason for the reboot? Watchdog? I'm curious if there's some other ADC call being made that uses one of those funky endless loops. |
You might be on to something here. WiFi driver is potentially reading temp for calibration purposes the same way. https://github.com/openshwprojects/OpenBL602/blob/e5769160cfc91a5fe36f040b3d5314e51eda3a28/components/bl602/bl602_wifidrv/bl60x_wifi_driver/wifi_mgmr.c#L997 |
Well, no, this temperature calibration is disabled by default. So something else caused my reboot |
8.5 days online on this firmware
|
BENCH BL602 with serial logging using @MaxineMuster version: 9.5 days no BL_RST_SOFTWARE_WATCHDOG and counting |
Great, I think we should merge some version of this fix. |
Got another reboot, again outside of main loop. Other socket has 14 days uptime now 🤔 |
I agree that some version of this fix can be merged, on my LIVE BL602 light bulb without serial logging using @giedriuslt version without temperature reporting I now have 16 days and no BL_RST_SOFTWARE_WATCHDOG and counting. I am considering installing @MaxineMuster version on the live BL602 light bulb and see if I also get 16+ days. |
16.5 days online on @MaxineMuster firmware |
Maybe a bit late, but I would like to offer another version. This version will just count the number of loops checking the FIFO, stopping after 1000. The (not relevant) changes: in user_main I replaced The relevant change for BL602 in sdk:
|
Got feedback from @diepeterpan (thanks for giving it a try), that getting temperature fails.
|
This was very confusing; I struggled to get dev_20240907_135546.zip installed. Are we sure dev_20240907_135546.zip is correct? Once installed, the build date/time it reports is different, and it only shows -999.9 for the temp.
7 Sept file contains a date for 21 Aug, so I did install the correct version. Anyway, let me know if you need me to test anything else. |
I just made a minor change to "user_main.c" (deleted an unused blank). Maybe this will give the correct date, since the last versions didn't change OpenBeken code but only SDK code?
EDIT: If I got it right, this date is not the actual date on which the image was built, but comes from the GIT macros (OpenBK7231T_App/src/httpserver/new_http.c, line 62 in 9becf09)
|
OK, I installed the latest version and see the temp. Are we just looking for stability, or anything specific from the serial log? |
Thanks, yes, I'm curious about number of times the loop was called. |
Some output containing DEBUG, seems to be on avg. about 5300 times.
|
Thanks, that's good news! O.k., here we go with a 20,000-loop version: |
Ok, I installed it, anything I must look for, or just report stability? |
Just see if it's stable now (and maybe, if there are any events, where we exited the loop (when temp is set to -999)). There will be some line in the log:
|
also installed this version. will notify if something wrong. |
Ok, got a TSVBE_HIGH at +- 3.5 hours, see serial extract below.
|
That's what I expected, we should see this from time to time if we prevent the device from rebooting... |
So far, there are a couple of HIGH and LOW but no WATCHDOG crashes, it seems stable re: this specific problem of temperature reading.
|
|
Without a serial trace, it would be difficult to say whether it is related to the temperature reading addressed by the firmware/build you refer to. @giedriuslt reported another reboot due to an error outside of the main loop which is not addressed by this, maybe you experienced something similar. As soon as I get a WATCHDOG reset I will report, but so far all good. |
Traces all the main loop to serial console to try to catch why bl602 reboots