Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nRF: Busy wait clock is skewed vs. timer clock #24784

Closed
andyross opened this issue Apr 28, 2020 · 17 comments · Fixed by #25539
Closed

nRF: Busy wait clock is skewed vs. timer clock #24784

andyross opened this issue Apr 28, 2020 · 17 comments · Fixed by #25539
Assignees
Labels
area: Tests Issues related to a particular existing or missing test bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug

Comments

@andyross
Copy link
Contributor

On nRF5 SoCs, the integration uses CONFIG_ARCH_HAS_CUSTOM_BUSY_WAIT=y to forward k_busy_wait() calls to a HAL-defined nrfx_coredep_delay_us() function (which presumably works at CPU cycle precision) instead of using the default Zephyr one (which is limited to the 32kHz cycle rate).

Unfortunately, these clocks don't match. The HAL timer is consistently about .5-.6% too fast, meaning that k_busy_wait() will return earlier than users expect.

This is causing a test failure on at least nrf52840 devices, where tests/kernel/timer/timer_api has a few long delays that it expects to be correct relative to the tick/cycle timer. This got unmasked by the timer API changes (which fixed a double-round-up in the k_timer_remaining_*() paths), but it seems like the problem is definitely at the platform layer.

The code below does a bunch of busy waits at different delays and compares them to the times reported by the existing timer driver. It produces the following log on my nrf52840dk_nrf52840. Note that as the delays increase the error settles down to a consistent asymptote, implying that it's the clock rates that are skewed and not just an offset or aliased conversion somewhere:

(Note also that as I iterate the test, the deltas are jumping around quite a bit to the tune of 5-10 32kHz cycles per half-second busy wait. 300us of jitter seems like way too much to me given the clock rates involved.)

*** Booting Zephyr OS build zephyr-v2.2.0-2044-g315cba1c40fc  ***
                eff   err
   us    cyc     us   ppm
 ===== ====== ====== =====
     1      0      0 1000000
     2      0      0 1000000
     4      0      0 1000000
     8      0      0 1000000
    16      1     30     0
    32      1     30 500000
    64      2     61 333333
   128      5    152     0
   256      8    244 111111
   512     16    488 58823
  1024     33   1007 29411
  2048     67   2044 14705
  4096    134   4089  7407
  8192    267   8148  7434
 16384    535  16326  3724
 32768   1070  32653  3724
 65536   2141  65338  3258
131072   4275 130462  4656
262144   8542 260681  5587
524288  17080 521240  5820
#include <zephyr.h>
#include <stdlib.h>

void main(void)
{
	printk("                eff   err\n");
	printk("   us    cyc     us   ppm\n");
	printk(" ===== ====== ====== =====\n");
	for (int us = 1; us < 1000000; us *= 2) {

		/* Busy Wait */
		int t0 = k_cycle_get_32();
		k_busy_wait(us);
		int t1 = k_cycle_get_32();

		/* Effective us waited: floor, because we want to wait
		 * at least this long and this is the longest duration
		 * we can prove.
		 */
		int eff_us = k_cyc_to_us_floor32(t1 - t0);

		/* Error, in parts per million */
		int exp_cyc = k_us_to_cyc_ceil32(us);
		int err_ppm = (int)(1000000ull
				    * abs((t1 - t0) - exp_cyc) / exp_cyc);

		printk("%6d %6d %6d %5d\n", us, t1-t0, eff_us, err_ppm);
	}
}
@andyross andyross added the bug The issue is a bug, or the PR is fixing a bug label Apr 28, 2020
@andyross
Copy link
Contributor Author

FWIW: if I apply this, then the test seems to run reliably for me and the deltas reported by the code above go down to the level of the measured jitter:

diff --git a/soc/arm/nordic_nrf/nrf52/soc.c b/soc/arm/nordic_nrf/nrf52/soc.c
index 9a59e72d96..4b7e78e4f8 100644
--- a/soc/arm/nordic_nrf/nrf52/soc.c
+++ b/soc/arm/nordic_nrf/nrf52/soc.c
@@ -80,7 +80,7 @@ static int nordicsemi_nrf52_init(struct device *arg)
 
 void arch_busy_wait(u32_t time_us)
 {
-       nrfx_coredep_delay_us(time_us);
+       nrfx_coredep_delay_us(time_us + time_us / 160);
 }
 
 void z_platform_init(void)

@nashif
Copy link
Member

nashif commented Apr 28, 2020

@pabigot fyi

@pabigot pabigot self-assigned this Apr 28, 2020
@pabigot
Copy link
Collaborator

pabigot commented Apr 28, 2020

This isn't news, it's been a potential issue since we had to switch to a custom busywait to get the resolution necessary for microsecond-resolution delays.

IIRC the crystals often used for nRF5 LFCLK (32 KiHz) generally have around 25 ppm accuracy, while the calibrated RC oscillator is intended to be 250 ppm (uncalibrated can be 500 ppm). The HFCLK is supposed to be around 40 ppm.

I suspect the tuning factor is going to be device- and temperature-dependent.

The DS3231 RTC I've been working on has a 2 ppm TCXO with a 1 pps signal that can be used to trigger hardware captures of both the HFCLK and LFCLK values to measure absolute accuracy rather than relative accuracy. I'll work up an application and try it on a variety of Nordic boards to see if there's enough consistency to warrant scaling the times.

@andyross
Copy link
Contributor Author

FWIW, the measured error is about an order of magnitude higher than those numbers on my board (and also presumably a Reel board on our test rack that I didn't directly test with this code). This feels a lot more like a miscalibrated or incorrectly transcribed clock rate somewhere than a bum crystal.

@andyross
Copy link
Contributor Author

And the multi-cycle jitter worries me too. If the HAL was slower, I'd say that this is absolutely an unmasked interrupt firing (this test runs a busy wait with interrupts enabled) and messing up a hand-calibrated CPU loop that didn't know it was interrupted. But the HAL is faster, and the Zephyr timer is an RTC that isn't supposed to be subject to jitter, so I have no good ideas.

@andyross
Copy link
Contributor Author

Spent some more time looking at this, becuase it's fun.

First, the HAL delay loop is choosing the "DWT Based" timer, which is as guessed a tight CPU loop of 3 instructions. (See nrfx_coredep.h:157). That's good, because now we know the clock in use, which can be directly read via SysTick. It's configured as 64 MHz, and I know the HAL assumes that because NRFX_DELAY_CPU_FREQ_MHZ is hard-selected in the HAL headers based on the SoC type.

So I wrote a quick rig that compares the RTC clock used by the timer driver directly against SysTick, running loops for 5 seconds at a time computing total clock advance on both devices and computing a ratio. It does this with interrupts disabled and direct register access, so it's not polluted by anything crazy in Zephyr. Code is below.

Now... if the clock rates are 6400000 and 32768, we'd expect to get exactly 1953.125 SysTick cycles per RTC cycles. But this is what the output looks like:

*** Booting Zephyr OS build zephyr-v2.2.0-2044-g315cba1c40fc  ***
RTC: 163840 SysTick: 321715523 R%: 196359
RTC: 163840 SysTick: 321770078 R%: 196392
RTC: 163840 SysTick: 321591198 R%: 196283
RTC: 163840 SysTick: 321458798 R%: 196202
RTC: 163840 SysTick: 321525718 R%: 196243
RTC: 163840 SysTick: 321535038 R%: 196249
RTC: 163840 SysTick: 321648078 R%: 196318
RTC: 163840 SysTick: 321741478 R%: 196375
RTC: 163840 SysTick: 321819798 R%: 196423
...

That ratio of about 1964 is just plain wrong. One or both of these clocks is lying about its frequency.

Also, one of these clocks looks pretty badly unstable. The ratio is jumping around between 1962 and 1965, it seems like, with occasional excursions out beyond 1967. What I thought earlier was jitter seems more likely to be an unstable PLL?

I don't have access to a high quality external time source to figure out which one is wrong, nor the time to rig up the I/O required to measure it. But at least for now, and assuming this is consistent between boards, I'd suggest adding a scaling factor to arch_busy_wait() to fix the known skew. It's definitely causing test failures.

#include <zephyr.h>
#include <hal/nrf_rtc.h>
#include <arch/arm/aarch32/cortex_m/cmsis.h>

void clock_check(void)
{
	/* Spin, checking both clocks as close to simultaneously as
	 * possible.  We need to do this in a tight loop without
	 * logging because the SysTick clock wraps faster than you
	 * think
	 */
	u64_t rtc_tot = 0, systick_tot = 0;
	u32_t systick_last = SysTick->VAL;
	u32_t rtc_last = nrf_rtc_counter_get(NRF_RTC1);

	while(rtc_tot / 32768 < 5) {
		u32_t systick_now = SysTick->VAL;
		u32_t rtc_now = nrf_rtc_counter_get(NRF_RTC1);

		/* Update totals, note that SysTick counts DOWN */
		systick_tot += (systick_last - systick_now) & 0xffffff;
		rtc_tot += rtc_now - rtc_last;

		systick_last = systick_now;
		rtc_last = rtc_now;
	}

	/* Dump the total cycle counts for both clocks, and a ratio in
	 * percent units. */
	printk("RTC: %d SysTick: %d R%%: %d\n",
	       (int)rtc_tot, (int)systick_tot,
	       (int)(100 * systick_tot / rtc_tot));
}

void main(void)
{
	/* Mask interrupts so nothing interferes */
	arch_irq_lock();

	/* Initialize SysTick so it counts through the full 24 bit
	 * range
	 */
	SysTick->LOAD = 0xffffff;
	SysTick->CTRL |= (SysTick_CTRL_ENABLE_Msk |
			  SysTick_CTRL_CLKSOURCE_Msk);

	while (true) {
		clock_check();
	}
}

@pabigot
Copy link
Collaborator

pabigot commented Apr 29, 2020

Easily explained. For reasons too subtle for me to understand the HFCLK on Nordic is left at its power-up source HFINT, which is an internal oscillator with a typical tolerance of +/- 1.5% but maximum of +/- 8%.

If you apply the following patch to your example things should stabilize nicely:

diff --git a/samples/hello_world/src/main.c b/samples/hello_world/src/main.c
index 05fbccc964..95d9a08e27 100644
--- a/samples/hello_world/src/main.c
+++ b/samples/hello_world/src/main.c
@@ -1,6 +1,8 @@
 #include <zephyr.h>
 #include <hal/nrf_rtc.h>
 #include <arch/arm/aarch32/cortex_m/cmsis.h>
+#include <drivers/clock_control.h>
+#include <drivers/clock_control/nrf_clock_control.h>
 
 void clock_check(void)
 {
@@ -44,6 +46,10 @@ void main(void)
 	SysTick->CTRL |= (SysTick_CTRL_ENABLE_Msk |
 			  SysTick_CTRL_CLKSOURCE_Msk);
 
+	struct device *clock = device_get_binding(DT_LABEL(DT_NODELABEL(clock)));
+	int rc = clock_control_on(clock, CLOCK_CONTROL_NRF_SUBSYS_HF);
+	printk("Clock on %d\n", rc);
+
 	while (true) {
 		clock_check();
 	}

I get:

*** Booting Zephyr OS build zephyr-v2.2.0-1932-g38669ba62e46  ***
Clock on 0
RTC: 163840 SysTick: 320089322 R%: 195367
RTC: 163840 SysTick: 319990156 R%: 195306
RTC: 163840 SysTick: 319990636 R%: 195306
RTC: 163840 SysTick: 319990036 R%: 195306
RTC: 163840 SysTick: 319990516 R%: 195306
RTC: 163840 SysTick: 319990396 R%: 195306
RTC: 163840 SysTick: 319990356 R%: 195306
RTC: 163840 SysTick: 319989596 R%: 195306
RTC: 163840 SysTick: 319990636 R%: 195306
RTC: 163840 SysTick: 319990516 R%: 195306
RTC: 163840 SysTick: 319990436 R%: 195306
RTC: 163840 SysTick: 319989796 R%: 195306
RTC: 163840 SysTick: 319990236 R%: 195306
RTC: 163840 SysTick: 319989636 R%: 195306
RTC: 163840 SysTick: 319990596 R%: 195306
RTC: 163840 SysTick: 319990516 R%: 195306
RTC: 163840 SysTick: 319990436 R%: 195306
RTC: 163840 SysTick: 319990316 R%: 195306

We really need:

@andyross
Copy link
Contributor Author

So the busy wait clock is a silicon oscillator and not a crystal at all? Yeah, that doesn't seem kosher, though maybe it's a configuration that should be supported via configuration. From a Zephyr perspective, the requirement is that k_busy_wait() and the timer driver need to agree on time, and this configuration just can't do that.

From a board perspective, while I can see the value in having the built-in oscillator as a fallback (surely there are devices in the world that skip the crystal), every board with a crystal surely need to boot using that by default, no?

@nordic-krch
Copy link
Contributor

Busy wait is using high frequency clock and source of HF clock can be internal oscillator (low precision but also less power, on nrf52840dk it's 230uA less) or HFXO (some peripherals like RADIO requires it). LF clock (used in general for system clock) can use internal RC if XTAL not present on the board (less accurate, requires periodic calibration) or external LFXO (precise). Turning on HFXO takes time so it cannot be applied to busy waiting. If it is for the test to pass then HFXO can be started for the test. If we consider this an issue in the application then we can think about using combination - coarse delay on system clock and remainder using NOP'ing (cpu clock).

@pabigot
Copy link
Collaborator

pabigot commented Apr 29, 2020

From a Zephyr perspective, the requirement is that k_busy_wait() and the timer driver need to agree on time, and this configuration just can't do that.

Nordic hardware can't do that and maintain both a reasonable power consumption and time accuracy. The preferred system clock (accurate and low power) is limited to 32 KiHz, and a higher priority for applications is that k_busy_wait() not be limited to multiples of 30.6 us. The majority of use cases for k_busy_wait() outside tests do not involve times in excess of 1 ms.

I do think we need to clarify the expectations on boot stability of important clocks (#22758 handles half of that), defaulting to "boot when stable" and easy to override (or reconfigure) to "unstable" for applications more interested in power consumption than time accuracy.

@nordic-krch
Copy link
Contributor

See #24817 proposal with mixed busy waiting. Please let me know what do you think about it.

@andyross
Copy link
Contributor Author

Again, isn't the easier solution to make the crystal the default and leave the free-running silicon oscillator available as an optional configuration (at the expense of precision, with warnings, etc...). The development boards that are failing in testing are all powered via USB anyway.

Are you guys not seeing the failure in timer_api in your rig?

@carlescufi carlescufi added the area: Tests Issues related to a particular existing or missing test label Apr 30, 2020
@nordic-krch
Copy link
Contributor

nordic-krch commented May 5, 2020

Lets bring the discussion back to the issue. It is now happening in PRs.

Let me summarize how i see the problem:

  • cpu clock or HF clock can have 2 sources:
    • RC (default) - inaccurate (up to 8%)
    • XTAL - accurate, draws more power
  • system clock also referred as LF clock also can have 2 sources:
    • RC - inaccurate
    • XTAL accurate, always used if present on the board but not all boards/products has it.

Due to system clock frequency, k_busy_wait() is using HF clock. There are 2 potential issues with k_busy_wait:

  • it is not aligned with system clock if at least one clock is based on RC. Not aligned means that k_busy_wait(n) will give different results than k_msleep(1000*n), error accumulates so it becomes significant when n is high (milliseconds range)
  • it is not accurate if RC is used, since RC accuracy is +/-8%

Alignment can be achieved in 2 configurations:

  • HF XTAL and LF XTAL are used
    • requires LF XTAL which is not always present
    • requires HF XTAL which is normally not required and adds unnecessary ~230uA, power consumption degrades
  • LF SYNTH is used - HF clock always on
    • not low power thus not suitable for products (nordic targets battery powered applications)

k_busy_wait accuracy:

  • HF XTAL is on:
    • requires HF XTAL which is normally not required and adds unnecessary ~230uA, power

Solutions:

@nordic-krch
Copy link
Contributor

Another thing is that we have one real problem which is clock alignement in tests where misalignment leads to test failures and some theoretical not reported (yet?).

@carlescufi
Copy link
Member

@andyross we need to address this in the context of 2.3. Could you please read through the comments and provide some feedback?

@andyross
Copy link
Contributor Author

If it were my problem, I'd just solve it with default management. A busy wait implementation that completes ~1% faster than the timer clock isn't a disaster, and virtually all production code that is using a busy wait (for device synchronization or whatever) doesn't care -- the wait periods are far shorter than one tick so the delta would never be measured.

My suggestion above was to make the CPU clock (which is the base for busy wait timing) the crystal by default and provide a kconfig like NRF_UNSTABLE_CPU_CLOCK for applications that understood the tradeoff and desired low power. But I'd wouldn't cry if that were the default and we had a NRF_STABLE_CPU_CLOCK tunable for tests like this which do very long busy waits.

The mixed mode trick from @nordic-krch would surely work too, and is really clever, but it strikes me as needlessly complicated given how rare this kind of requirement will be in real applications.

@carlescufi carlescufi added priority: high High impact/importance bug and removed priority: medium Medium impact/importance bug labels May 21, 2020
@carlescufi
Copy link
Member

carlescufi commented May 21, 2020

@andyross turning the crystal on by default would increase the power consumption by 230uA on all applications running on Nordic hardware. Furthermore, that approach will not solve the problem ayway for boards that use an RC low frequency clock (i.e. no 32KHz crystal present on the board).
The proposal that we discussed today is not to assume that there's a coupling between the sources and accuracy of k_busy_wait() and timer. This means that somewhere there needs to be a frequency tolerance parameter somewhere for the clocks.

In Nordic, the system timer (aka LF clock) can be driven by:

  • External 32KHz crystal
  • RC oscillator
  • Synthesized from the HF clock (which in turn can be driven by multiple sources)

and the k_busy_wait() timer (aka HF clock) can be driven by:

  • External 32MHz crystal
  • On-chip 64MHZ RC oscillator

Depending on the combination chose, the tolerance of each the LF and HF clocks will be different, and so it is non-trivial to select one to apply to comparisons. Nordic hardware needs to allow for these different combinations in order to provide with competitive power consumption numbers and flexible configuration options for PCBs.

From our perspective, the correct way of solving this would be to have a max tolerance (since actual tolerance is actually depending on temperature) for each of the sources above, and then combine the tolerances to create a multiplier that you apply to the comparison value in the test.
Since this is likely non-trivial to implement and impractical (because the max combined tolerance is going to end up being a number as high as ~13% hence masking anything else), an alternative, simpler solution would be to have a single "clock tolerance" constant per-board or per-SoC that tests use to multiply the comparison value by, or any other rational, simple modification of the test that enables it to pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Tests Issues related to a particular existing or missing test bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug
Projects
None yet
5 participants