nRF: Busy wait clock is skewed vs. timer clock #24784
FWIW: if I apply this, then the test seems to run reliably for me, and the deltas reported by the code above go down to the level of the measured jitter:
diff --git a/soc/arm/nordic_nrf/nrf52/soc.c b/soc/arm/nordic_nrf/nrf52/soc.c
index 9a59e72d96..4b7e78e4f8 100644
--- a/soc/arm/nordic_nrf/nrf52/soc.c
+++ b/soc/arm/nordic_nrf/nrf52/soc.c
@@ -80,7 +80,7 @@ static int nordicsemi_nrf52_init(struct device *arg)
 void arch_busy_wait(u32_t time_us)
 {
-	nrfx_coredep_delay_us(time_us);
+	nrfx_coredep_delay_us(time_us + time_us / 160);
 }
 void z_platform_init(void)
|
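To make the size of that correction concrete: `time_us / 160` stretches the delay by 0.625%, a little more than the 0.5-0.6% skew reported in the issue description. Below is a minimal standalone sketch of the same integer arithmetic. The helper name `scaled_delay_us` is hypothetical and not part of Zephyr or nrfx.

```c
#include <stdint.h>

/* Hypothetical helper (not a Zephyr or nrfx API): stretch a requested
 * delay by 1/160, i.e. 0.625%, using the same integer arithmetic as
 * the patch above.
 */
static inline uint32_t scaled_delay_us(uint32_t time_us)
{
	/* Integer division truncates, so delays shorter than 160 us
	 * receive no correction at all.
	 */
	return time_us + time_us / 160;
}
```

For a half-second wait, `scaled_delay_us(500000)` yields 503125, i.e. 3125 us of extra delay. Note that the sum can overflow for requests close to `UINT32_MAX`; the sketch does not guard against that.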
@pabigot fyi |
This isn't news, it's been a potential issue since we had to switch to a custom busywait to get the resolution necessary for microsecond-resolution delays. IIRC the crystals often used for nRF5 LFCLK (32 KiHz) generally have around 25 ppm accuracy, while the calibrated RC oscillator is intended to be 250 ppm (uncalibrated can be 500 ppm). The HFCLK is supposed to be around 40 ppm. I suspect the tuning factor is going to be device- and temperature-dependent. The DS3231 RTC I've been working on has a 2 ppm TCXO with a 1 pps signal that can be used to trigger hardware captures of both the HFCLK and LFCLK values to measure absolute accuracy rather than relative accuracy. I'll work up an application and try it on a variety of Nordic boards to see if there's enough consistency to warrant scaling the times. |
FWIW, the measured error is about an order of magnitude higher than those numbers on my board (and also presumably a Reel board on our test rack that I didn't directly test with this code). This feels a lot more like a miscalibrated or incorrectly transcribed clock rate somewhere than a bum crystal. |
And the multi-cycle jitter worries me too. If the HAL was slower, I'd say that this is absolutely an unmasked interrupt firing (this test runs a busy wait with interrupts enabled) and messing up a hand-calibrated CPU loop that didn't know it was interrupted. But the HAL is faster, and the Zephyr timer is an RTC that isn't supposed to be subject to jitter, so I have no good ideas. |
Spent some more time looking at this, because it's fun. First, the HAL delay loop is choosing the "DWT Based" timer, which is, as guessed, a tight CPU loop of 3 instructions (see nrfx_coredep.h:157). That's good, because now we know the clock in use, which can be read directly via SysTick. It's configured as 64 MHz, and I know the HAL assumes that because NRFX_DELAY_CPU_FREQ_MHZ is hard-selected in the HAL headers based on the SoC type. So I wrote a quick rig that compares the RTC clock used by the timer driver directly against SysTick, running loops for 5 seconds at a time, accumulating the total clock advance on both devices and computing a ratio. It does this with interrupts disabled and direct register access, so it's not polluted by anything crazy in Zephyr. Code is below. Now... if the clock rates are 64000000 and 32768, we'd expect to get exactly 1953.125 SysTick cycles per RTC cycle. But this is what the output looks like:
That ratio of about 1964 is just plain wrong. One or both of these clocks is lying about its frequency. Also, one of these clocks looks pretty badly unstable. The ratio is jumping around between 1962 and 1965, it seems like, with occasional excursions out beyond 1967. What I thought earlier was jitter seems more likely to be an unstable PLL? I don't have access to a high quality external time source to figure out which one is wrong, nor the time to rig up the I/O required to measure it. But at least for now, and assuming this is consistent between boards, I'd suggest adding a scaling factor to arch_busy_wait() to fix the known skew. It's definitely causing test failures.
#include <zephyr.h>
#include <hal/nrf_rtc.h>
#include <arch/arm/aarch32/cortex_m/cmsis.h>

void clock_check(void)
{
	/* Spin, checking both clocks as close to simultaneously as
	 * possible. We need to do this in a tight loop without
	 * logging because the SysTick clock wraps faster than you
	 * think
	 */
	u64_t rtc_tot = 0, systick_tot = 0;
	u32_t systick_last = SysTick->VAL;
	u32_t rtc_last = nrf_rtc_counter_get(NRF_RTC1);

	while (rtc_tot / 32768 < 5) {
		u32_t systick_now = SysTick->VAL;
		u32_t rtc_now = nrf_rtc_counter_get(NRF_RTC1);

		/* Update totals, note that SysTick counts DOWN. Both
		 * counters are 24 bits wide, so mask the deltas to
		 * handle wraparound.
		 */
		systick_tot += (systick_last - systick_now) & 0xffffff;
		rtc_tot += (rtc_now - rtc_last) & 0xffffff;
		systick_last = systick_now;
		rtc_last = rtc_now;
	}

	/* Dump the total cycle counts for both clocks, and a ratio in
	 * percent units.
	 */
	printk("RTC: %d SysTick: %d R%%: %d\n",
	       (int)rtc_tot, (int)systick_tot,
	       (int)(100 * systick_tot / rtc_tot));
}

void main(void)
{
	/* Mask interrupts so nothing interferes */
	arch_irq_lock();

	/* Initialize SysTick so it counts through the full 24 bit
	 * range
	 */
	SysTick->LOAD = 0xffffff;
	SysTick->CTRL |= (SysTick_CTRL_ENABLE_Msk |
			  SysTick_CTRL_CLKSOURCE_Msk);

	while (true) {
		clock_check();
	}
}
|
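For reference, the gap between the nominal ratio (1953.125) and the observed one (about 1964) can be expressed in ppm. The following is a small standalone sketch of that arithmetic, not part of the rig above. The function name `ratio_skew_ppm` and the choice to pass the ratio in thousandths (to stay in integer math) are assumptions made for illustration.

```c
#include <stdint.h>

#define HFCLK_HZ 64000000u /* nominal SysTick/CPU clock */
#define LFCLK_HZ 32768u    /* nominal RTC clock */

/* Hypothetical helper: skew of a measured SysTick-per-RTC ratio
 * relative to the nominal ratio, in parts per million. The measured
 * ratio is given in thousandths (1964000 means 1964.000).
 */
static int32_t ratio_skew_ppm(uint32_t measured_ratio_milli)
{
	/* Nominal ratio in thousandths: 64000000 * 1000 / 32768 = 1953125 */
	int64_t nominal_milli = (int64_t)HFCLK_HZ * 1000 / LFCLK_HZ;
	int64_t diff = (int64_t)measured_ratio_milli - nominal_milli;

	return (int32_t)(diff * 1000000 / nominal_milli);
}
```

With a measured ratio of 1964.000, `ratio_skew_ppm(1964000)` gives 5568 ppm, or roughly 0.56%, which lines up with the 0.5-0.6% error described in the original report.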
Easily explained. For reasons too subtle for me to understand the HFCLK on Nordic is left at its power-up source HFINT, which is an internal oscillator with a typical tolerance of +/- 1.5% but maximum of +/- 8%. If you apply the following patch to your example things should stabilize nicely:
I get:
We really need:
|
So the busy wait clock is a silicon oscillator and not a crystal at all? Yeah, that doesn't seem kosher, though maybe it's a setup that should be supported via configuration. From a Zephyr perspective, the requirement is that k_busy_wait() and the timer driver need to agree on time, and this configuration just can't do that. From a board perspective, while I can see the value in having the built-in oscillator as a fallback (surely there are devices in the world that skip the crystal), every board with a crystal surely needs to boot using that by default, no? |
Busy wait uses the high-frequency clock, whose source can be either the internal oscillator (low precision but lower power; on nrf52840dk it draws 230 uA less) or HFXO (which some peripherals, such as RADIO, require). The LF clock (generally used as the system clock) can use the internal RC oscillator if no crystal is present on the board (less accurate, requires periodic calibration) or an external LFXO (precise). Turning on HFXO takes time, so it cannot be started on demand for busy waiting. If the goal is only to make the test pass, then HFXO can be started for the test. If we consider this an issue for applications, then we can think about a combination: a coarse delay on the system clock, with the remainder handled by NOP'ing (CPU clock). |
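The combined approach could be sketched roughly as follows. This is only the arithmetic for splitting a delay into a coarse part and a remainder, under the assumption of a 32768 Hz system clock. It is not the implementation proposed in #24817, and `struct wait_split` and `split_busy_wait` are hypothetical names.

```c
#include <stdint.h>

#define LFCLK_HZ 32768u /* assumed system clock rate */

struct wait_split {
	uint32_t lf_ticks;     /* whole system clock ticks to wait */
	uint32_t remainder_us; /* leftover time for the CPU-cycle loop */
};

/* Hypothetical helper: split a delay into whole LF clock ticks plus a
 * remainder in microseconds.
 */
static struct wait_split split_busy_wait(uint32_t time_us)
{
	struct wait_split s;
	uint64_t covered_us;

	/* Whole ticks that fit in the requested delay (rounded down). */
	s.lf_ticks = (uint32_t)(((uint64_t)time_us * LFCLK_HZ) / 1000000u);

	/* Time covered by those ticks, rounded down so that the
	 * remainder errs on the long side rather than the short side.
	 */
	covered_us = ((uint64_t)s.lf_ticks * 1000000u) / LFCLK_HZ;
	s.remainder_us = time_us - (uint32_t)covered_us;

	return s;
}
```

For example, a 1000 us request splits into 32 ticks plus a 24 us remainder, so only the short remainder is exposed to the less accurate CPU clock. A real implementation would also have to account for where in the current tick the wait starts, which this sketch ignores.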
Nordic hardware can't do that and maintain both a reasonable power consumption and time accuracy. The preferred system clock (accurate and low power) is limited to 32 KiHz, and a higher priority for applications is that I do think we need to clarify the expectations on boot stability of important clocks (#22758 handles half of that), defaulting to "boot when stable" and easy to override (or reconfigure) to "unstable" for applications more interested in power consumption than time accuracy. |
See the proposal in #24817, which uses mixed busy waiting. Please let me know what you think about it. |
Again, isn't the easier solution to make the crystal the default and leave the free-running silicon oscillator available as an optional configuration (at the expense of precision, with warnings, etc...)? The development boards that are failing in testing are all powered via USB anyway. Are you guys not seeing the failure in timer_api in your rig? |
Let's bring the discussion back to the issue. It is now happening in PRs. Let me summarize how I see the problem:
Due to system clock frequency, k_busy_wait() is using HF clock. There are 2 potential issues with k_busy_wait:
Alignment can be achieved in 2 configurations:
k_busy_wait accuracy:
Solutions:
|
Another thing is that we have one real problem, which is clock alignment in tests, where misalignment leads to test failures, and some theoretical problems that have not been reported (yet?). |
@andyross we need to address this in the context of 2.3. Could you please read through the comments and provide some feedback? |
If it were my problem, I'd just solve it with default management. A busy wait implementation that completes ~1% faster than the timer clock isn't a disaster, and virtually all production code that is using a busy wait (for device synchronization or whatever) doesn't care -- the wait periods are far shorter than one tick so the delta would never be measured. My suggestion above was to make the CPU clock (which is the base for busy wait timing) the crystal by default and provide a kconfig like The mixed mode trick from @nordic-krch would surely work too, and is really clever, but it strikes me as needlessly complicated given how rare this kind of requirement will be in real applications. |
@andyross turning the crystal on by default would increase the power consumption by 230uA on all applications running on Nordic hardware. Furthermore, that approach will not solve the problem anyway for boards that use an RC low frequency clock (i.e. no 32 kHz crystal present on the board). In Nordic, the system timer (aka LF clock) can be driven by:
and the
Depending on the combination chosen, the tolerance of each of the LF and HF clocks will be different, and so it is non-trivial to select one to apply to comparisons. Nordic hardware needs to allow for these different combinations in order to provide competitive power consumption numbers and flexible configuration options for PCBs. From our perspective, the correct way of solving this would be to have a max tolerance (since the actual tolerance depends on temperature) for each of the sources above, and then combine the tolerances to create a multiplier that you apply to the comparison value in the test. |
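One way that combination could look is sketched below. It assumes the two tolerances are simply added, a first-order worst case in which the clocks drift in opposite directions. The function name `allowed_skew_us` is hypothetical, and the ppm figures in the usage note are the ones quoted earlier in this thread rather than datasheet values.

```c
#include <stdint.h>

/* Hypothetical helper: worst-case difference, in microseconds, to
 * allow between a delay timed on one clock and measured on the other.
 * Tolerances are given in ppm and summed, which is a first-order
 * approximation of the combined error.
 */
static uint32_t allowed_skew_us(uint32_t duration_us,
				uint32_t lf_tol_ppm,
				uint32_t hf_tol_ppm)
{
	uint64_t combined_ppm = (uint64_t)lf_tol_ppm + hf_tol_ppm;

	/* Round up so the allowance is never understated. */
	return (uint32_t)(((uint64_t)duration_us * combined_ppm + 999999u) /
			  1000000u);
}
```

With a calibrated RC LF clock (250 ppm) and HFINT at its typical 1.5% (15000 ppm), a 500 ms busy wait would be allowed `allowed_skew_us(500000, 250, 15000)`, which is 7625 us of disagreement. A test could compare its measured delta against that value instead of a fixed threshold.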
On nRF5 SoCs, the integration uses CONFIG_ARCH_HAS_CUSTOM_BUSY_WAIT=y to forward k_busy_wait() calls to a HAL-defined nrfx_coredep_delay_us() function (which presumably works at CPU cycle precision) instead of using the default Zephyr one (which is limited to the 32kHz cycle rate).
Unfortunately, these clocks don't match. The HAL timer is consistently about 0.5-0.6% too fast, meaning that k_busy_wait() will return earlier than users expect.
This is causing a test failure on at least nrf52840 devices, where tests/kernel/timer/timer_api has a few long delays that it expects to be correct relative to the tick/cycle timer. This got unmasked by the timer API changes (which fixed a double-round-up in the k_timer_remaining_*() paths), but it seems like the problem is definitely at the platform layer.
The code below does a bunch of busy waits at different delays and compares them to the times reported by the existing timer driver. It produces the following log on my nrf52840dk_nrf52840. Note that as the delays increase the error settles down to a consistent asymptote, implying that it's the clock rates that are skewed and not just an offset or aliased conversion somewhere:
(Note also that as I iterate the test, the deltas are jumping around quite a bit to the tune of 5-10 32kHz cycles per half-second busy wait. 300us of jitter seems like way too much to me given the clock rates involved.)