Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpu/nrf5x_common: fix ztimer issue on warm-boot #20665

Merged
merged 1 commit into from
May 13, 2024

Conversation

mguetschow
Copy link
Contributor

Contribution description

We have been experiencing problems when using ztimer on the feather-nrf52840-sense boards under certain compiler versions (see inetrg/exercises#5 (comment) and following) after warm-boot (directly after flashing or after pressing the reset button on the board), while after a cold-boot (un- and re-plugging) the board would behave normally.

I've spent quite some time debugging this issue and finally found the culprit: the NRF_CLOCK->LFCLKSTAT is apparently latched for some short time after a softreset (see this comment by Nordic, sadly it is not reported as an erratum) and wrongly reports the low-frequency clock to be running just after a reboot. In clock_start_lf() of cpu/nrf5x_common/clock.c, starting the clock is skipped iff the register reports the clock to be running. Apparently different compiler versions (reproducibly) lead to slightly different startup timings, which would explain why the issue only happens with some toolchain versions.

The linked discussion suggests to add a check for the reset cause and ignoring the (wrong) state of the LFCLK after a soft reset. To avoid the interdependence with the RESETREAS register (which might need to be read and cleared in a future version of RIOT or be expected to be untouched by some application code), I propose a different approach with this PR: Adding a static variable that remembers whether the LFCLK has been started (by RIOT) or not. If anyone has a better suggestion for a work-around, please shout out :)

The above linked discussion is about nRF52832 (locally confirmed with an nrf52dk), but I can confirm this issue exists on nRF52840-based boards (I have tested feather-nrf52840-sense, feather-nrf52840-express, nrf52840dk and nrf52840dongle) as well. I have also tested calliope-mini (with nRF51) where I didn't see the problem, but the suggested workaround should not harm unless the user application code tinkers with LFCLK itself.

While the issue exists on all tested nRF52840-based boards, it has become especially apparent on feather-nrf52840-sense since the reset button triggers the bootloader, which itself triggers a soft-reset into the application. But with the following diff, it is reproducible on the other nRF52840-boards as well.

Testing procedure

Apply the following diff to current master:

diff --git a/examples/hello-world/Makefile b/examples/hello-world/Makefile
index 258d8e9baf..f1164cd53d 100644
--- a/examples/hello-world/Makefile
+++ b/examples/hello-world/Makefile
@@ -7,6 +7,9 @@ BOARD ?= native
 # This has to be the absolute path to the RIOT base directory:
 RIOTBASE ?= $(CURDIR)/../..
 
+USEMODULE += ztimer
+USEMODULE += ztimer_msec
+
 # Comment this out to disable code in RIOT that does safety checking
 # which is not needed in a production environment but helps in the
 # development process:
diff --git a/examples/hello-world/main.c b/examples/hello-world/main.c
index 213128ac64..5b0ca9072a 100644
--- a/examples/hello-world/main.c
+++ b/examples/hello-world/main.c
@@ -20,13 +20,31 @@
  */
 
 #include <stdio.h>
+#include "ztimer.h"
+#include "busy_wait.h"
 
 int main(void)
 {
+    /* busy wait 5s to allow for UART to connect */
+    busy_wait_us(5000000);
+
     puts("Hello World!");
 
+    /* test ztimer time difference using busy loop for 10ms */
+    ztimer_acquire(ZTIMER_MSEC);
+    ztimer_now_t start = ztimer_now(ZTIMER_MSEC);
+    busy_wait_us(10000);
+    printf("difference (exp. 10): %ld\n", ztimer_now(ZTIMER_MSEC) - start);
+    ztimer_release(ZTIMER_MSEC);
+
+    /* sleep for 1s - never returns iff time difference was 0 (ztimer not running) */
+    ztimer_sleep(ZTIMER_MSEC, 1000);
+
     printf("You are running RIOT on a(n) %s board.\n", RIOT_BOARD);
     printf("This board features a(n) %s CPU.\n", RIOT_CPU);
 
+    /* trigger soft-reset */
+    NVIC_SystemReset();
+
     return 0;
 }

Compile with an affected compiler version (debian stable arm-gcc-none-eabi 12.2.1 is known to have the problem) and flash to some nRF52840 or nRF52832 based board: After the first soft-reset (either triggered from the bootloader in case of the Adafruit boards or triggered by the application code), the reported ztimer difference will be zero and the application will hang during ztimer_sleep as time is not advancing.

With the proposed workaround added, the code should behave as expected.

Issues/PRs references

inetrg/exercises#5

@mguetschow mguetschow requested a review from aabadie as a code owner May 13, 2024 12:06
@github-actions github-actions bot added Platform: ARM Platform: This PR/issue effects ARM-based platforms Area: cpu Area: CPU/MCU ports labels May 13, 2024
@mguetschow mguetschow added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label May 13, 2024
@mguetschow mguetschow force-pushed the nrf-softreset-clock-static branch from 325d202 to a49543b Compare May 13, 2024 12:09
cpu/nrf5x_common/clock.c Outdated Show resolved Hide resolved
void clock_start_lf(void)
{
/* abort if LF clock is already running */
if (NRF_CLOCK->LFCLKSTAT & CLOCK_LFCLKSTAT_STATE_Msk) {
if (clock_lf_running) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a case where this function would be called more than once anyway?
I think we could easily drop that check.

It's only called in rtt_init() which is run once, the call in nimble_riot.c could be guarded with a if (!IS_USED(MODULE_PERIPH_RTT))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this check to a guard in nimble_riot.c moves logic to somewhere where it doesn't belong to. And the product specification explicitly mentions that changing clock parameters while the clock is running is not allowed.

@riot-ci
Copy link

riot-ci commented May 13, 2024

Murdock results

✔️ PASSED

350399d cpu/nrf5x_common: do not rely on latched register for LFCLK state

Success Failures Total Runtime
10082 0 10083 12m:21s

Artifacts

@mguetschow mguetschow force-pushed the nrf-softreset-clock-static branch from 174bb1e to 350399d Compare May 13, 2024 12:35
@benpicco benpicco added this pull request to the merge queue May 13, 2024
Merged via the queue into RIOT-OS:master with commit bea466f May 13, 2024
25 checks passed
@mguetschow mguetschow deleted the nrf-softreset-clock-static branch May 14, 2024 14:47
@mguetschow
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: cpu Area: CPU/MCU ports CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Platform: ARM Platform: This PR/issue effects ARM-based platforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants