kernel: timer_api: more tweaks for inexact conversions and poor clock tolerance #25539

pabigot · 2020-05-21T22:53:42Z

A couple failures observed in #24001 are a consequence of precision loss converting between milliseconds and 32 KiHz ticks, compounded by clock rate variation. Other failures are due to skew between the clock used for busywait and the clock used for system time. Also asynchronous updates to the tick counter can occur before or after the value it's compared with has been captured.

In short, tweak the test until it runs reliably on problematic hardware identified in #24001 (where "reliably" means >100 consecutive runs without failure using CONFIG_ZTEST_RETEST_IF_PASSED=y).

Fixes #24001
Fixes #24784 through a documentation fix clarifying that k_busy_wait() delays are not necessarily aligned to sysclock.

zephyrbot · 2020-05-21T22:56:23Z

All checks passed.

Tip: The bot edits this comment instead of posting a new one, so you can check the comment's history to see earlier messages.

pabigot · 2020-05-22T09:38:52Z

This passed 2160 consecutive runs before failing with:

    Assertion failed at ../src/main.c:571: test_timer_remaining: abs((s32_t)(rem_ticks - target_rem_tic
ks)) <= 3 is false

on a board that had a fairly accurate (< 1000 ppm error) HFINT (64 MHz internal oscillator).

On another board with -7200 ppm HFINT error and +61 ppm LFXO error the tick error on that test is -13 (expected tick duration is 1640) consistently. I don't see any point in increasing the threshold higher: some boards will simply be out of tolerance.

andyross

Some nitpicks. The only issue I feel strongly about is that we can't change documented behavior via a tweak to a test. If the clocks are now allowed to be skewed by policy now, then we should document that in the k_busy_wait() definition.

andyross · 2020-05-22T15:32:00Z

tests/kernel/timer/timer_api/src/main.c

@@ -18,6 +19,7 @@ struct timer_data {
 #define EXPIRE_TIMES 4
 #define WITHIN_ERROR(var, target, epsilon)       \
 		(((var) >= (target)) && ((var) <= (target) + (epsilon)))
+#define INEXACT_MS_CONVERT (k_ticks_to_ms_ceil64(k_ms_to_ticks_ceil64(1)) != 1)


This might be a little more robust if you checked that they weren't divisible directly. Something like:

#define INEXACT_MS_CONVERT \ (MAX(CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC, \ CONFIG_SYS_CLOCK_TICKS_PER_SEC) % \ MIN(CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC, \ CONFIG_SYS_CLOCK_TICKS_PER_SEC) == 0) \

Honestly I had to think really hard to figure out if that double conversion was guaranteed to detect the case you want, and I'm still not completely convinced it does for all non-divisible rates.

I don't think that does at all the same thing; cycles per second isn't relevant.

We need to know that for 1 ms we get an integral number of ticks. If we get an integral number of ticks, then when we convert that factor back to ms it'll be the same value we started with.

For the original approach think of it as: the ceil operation will increase the value only if there's a remainder for the corresponding scaling. The only way you put in 1, convert, reverse convert, and get 1 back is if neither conversion produced a remainder.

But the original condition was stronger than it needed to be, at least for CYCLES_PER_SEC >= 1000, so the required condition can be expressed more simply which I've done.

I've also tried to explain why this is needed.

andyross · 2020-05-22T15:33:43Z

tests/kernel/timer/timer_api/src/main.c

-		TIMER_ASSERT(interval >= DURATION, timer);
+		TIMER_ASSERT((interval >= DURATION)
+			     || (INEXACT_MS_CONVERT
+				 && (interval == DURATION - 1)), timer);


The inexact version is a tighter bound (looking for an exact interval value) than the old one (make sure it's higher than or equal to duration). That seems wrong. It's OK to add slop where needed, but we should be testing the same behavior on all platforms.

I don't understand this statement. If the original test passes nothing changes. If it fails, then if we know we might have an error in converting ms to ticks, and in that case we relax the lower bound to account for a tick that wasn't added because accumulated error exceeded a tick.

andyross · 2020-05-22T15:36:13Z

tests/kernel/timer/timer_api/src/main.c

-				&periodicity_timer);
+		TIMER_ASSERT(WITHIN_ERROR(delta, period_ms, 1)
+			     || (INEXACT_MS_CONVERT
+				 && (delta == period_ms - 1)),


Similar deal: it used to allow slop, now it doesn't. We should do the same thing with a selectable offset so we don't end up accidentally testing different behavior on different systems.

andyross · 2020-05-22T15:39:21Z

tests/kernel/timer/timer_api/src/main.c

+	 * or smaller than expected, so relax the tolerance.  3 ticks
+	 * variance has been observed on hardware where the busywait
+	 * clock is 0.7% slow and the 32 KiHz tick clock is 60 ppm
+	 * fast relative to a stable external clock.


Hm... what this does is basically change the Zephyr tested promise that the busy wait and system clock times are compatible. I think if we're going to do this as policy it needs to be spelled out in the docs explicitly somewhere, not just silently waved away in a test.

carlescufi · 2020-05-22T17:07:34Z

@andyross Perhaps we could discuss the relationship between the system timer and k_busy_wait() also in the context of #24784, and settle this as part of this release. I added a recent comment there:
#24784 (comment)

andyross · 2020-05-22T17:11:59Z

Honestly I'd be happy with a documentation fix. Right now the clear implication of the way k_timer and k_busy_wait are defined is that they work in the same units. If they're allowed to be different, we should say they can be different (and maybe for extra credit have a framework for characterizing that skew, etc...).

But we can't just patch a test. Real users are not unlikely to make the same assumption the test author did.

carlescufi · 2020-05-22T17:14:14Z

Honestly I'd be happy with a documentation fix. Right now the clear implication of the way k_timer and k_busy_wait are defined is that they work in the same units. If they're allowed to be different, we should say they can be different (and maybe for extra credit have a framework for characterizing that skew, etc...).

But we can't just patch a test. Real users are not unlikely to make the same assumption the test author did.

I agree. @pabigot how about proposing a documentation patch in this very PR, and deal with #24784 at the same time?

Some tests assumed that a delay with k_busy_wait() could be precisely timed by a tick or cycle count, and vice versa. This is not true for all clock configurations on all Zephyr targets, so highlight the potential variation. As an example: on some platforms busy-wait is performed by a loop with known duration in cycles of a known CPU frequency. The CPU clock may be unrelated to the clock that drives the system timer. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

When one millisecond is not an integral number of ticks measuring durations between tick events will sometimes be less than expected to correct for error that was accumulated between other events. Allow for that in the duration and period comparisons. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

A fast timer clock can advance before or after the remaining time until an event is captured. Verify the expected relationship between current and remaining time holds for at least one captured current time. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

When the timer frequency is not a multiple of 1000 converting between ticks and milliseconds introduces error. Avoid propagating the error by converting divided values rather than dividing converted values. Also compensate for observed rate differences between the busywait clock and the timeout clock. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

pabigot requested review from andrewboie, andyross and nashif as code owners May 21, 2020 22:53

zephyrbot added area: Tests Issues related to a particular existing or missing test area: Kernel labels May 21, 2020

pabigot force-pushed the nordic/20200521d branch from 78773a9 to 31775e8 Compare May 21, 2020 22:58

andyross requested changes May 22, 2020

View reviewed changes

pabigot added 4 commits May 22, 2020 15:12

pabigot force-pushed the nordic/20200521d branch from 31775e8 to ae90254 Compare May 22, 2020 20:12

andyross approved these changes May 22, 2020

View reviewed changes

carlescufi approved these changes May 24, 2020

View reviewed changes

carlescufi requested a review from nordic-krch May 24, 2020 18:28

carlescufi merged commit 8fbc83c into zephyrproject-rtos:master May 24, 2020

pabigot deleted the nordic/20200521d branch May 26, 2020 12:42

pabigot mentioned this pull request Jun 8, 2020

tests: kernel: timer: timer_api: regression on STM32 boards #26039

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kernel: timer_api: more tweaks for inexact conversions and poor clock tolerance #25539

kernel: timer_api: more tweaks for inexact conversions and poor clock tolerance #25539

pabigot commented May 21, 2020 •

edited

Loading

zephyrbot commented May 21, 2020 •

edited

Loading

pabigot commented May 22, 2020

andyross left a comment

andyross May 22, 2020

pabigot May 22, 2020

andyross May 22, 2020

pabigot May 22, 2020

andyross May 22, 2020

andyross May 22, 2020

pabigot May 22, 2020

carlescufi commented May 22, 2020

andyross commented May 22, 2020

carlescufi commented May 22, 2020

kernel: timer_api: more tweaks for inexact conversions and poor clock tolerance #25539

kernel: timer_api: more tweaks for inexact conversions and poor clock tolerance #25539

Conversation

pabigot commented May 21, 2020 • edited Loading

zephyrbot commented May 21, 2020 • edited Loading

pabigot commented May 22, 2020

andyross left a comment

Choose a reason for hiding this comment

andyross May 22, 2020

Choose a reason for hiding this comment

pabigot May 22, 2020

Choose a reason for hiding this comment

andyross May 22, 2020

Choose a reason for hiding this comment

pabigot May 22, 2020

Choose a reason for hiding this comment

andyross May 22, 2020

Choose a reason for hiding this comment

andyross May 22, 2020

Choose a reason for hiding this comment

pabigot May 22, 2020

Choose a reason for hiding this comment

carlescufi commented May 22, 2020

andyross commented May 22, 2020

carlescufi commented May 22, 2020

pabigot commented May 21, 2020 •

edited

Loading

zephyrbot commented May 21, 2020 •

edited

Loading