-
Notifications
You must be signed in to change notification settings - Fork 6.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kernel: timer_api: more tweaks for inexact conversions and poor clock tolerance #25539
kernel: timer_api: more tweaks for inexact conversions and poor clock tolerance #25539
Conversation
All checks passed. Tip: The bot edits this comment instead of posting a new one, so you can check the comment's history to see earlier messages. |
This passed 2160 consecutive runs before failing with:
on a board that had a fairly accurate (< 1000 ppm error) HFINT (64 MHz internal oscillator). On another board with -7200 ppm HFINT error and +61 ppm LFXO error the tick error on that test is -13 (expected tick duration is 1640) consistently. I don't see any point in increasing the threshold higher: some boards will simply be out of tolerance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nitpicks. The only issue I feel strongly about is that we can't change documented behavior via a tweak to a test. If the clocks are now allowed to be skewed by policy now, then we should document that in the k_busy_wait() definition.
@@ -18,6 +19,7 @@ struct timer_data { | |||
#define EXPIRE_TIMES 4 | |||
#define WITHIN_ERROR(var, target, epsilon) \ | |||
(((var) >= (target)) && ((var) <= (target) + (epsilon))) | |||
#define INEXACT_MS_CONVERT (k_ticks_to_ms_ceil64(k_ms_to_ticks_ceil64(1)) != 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a little more robust if you checked that they weren't divisible directly. Something like:
#define INEXACT_MS_CONVERT \
(MAX(CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC, \
CONFIG_SYS_CLOCK_TICKS_PER_SEC) % \
MIN(CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC, \
CONFIG_SYS_CLOCK_TICKS_PER_SEC) == 0) \
Honestly I had to think really hard to figure out if that double conversion was guaranteed to detect the case you want, and I'm still not completely convinced it does for all non-divisible rates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that does at all the same thing; cycles per second isn't relevant.
We need to know that for 1 ms we get an integral number of ticks. If we get an integral number of ticks, then when we convert that factor back to ms it'll be the same value we started with.
For the original approach think of it as: the ceil operation will increase the value only if there's a remainder for the corresponding scaling. The only way you put in 1, convert, reverse convert, and get 1 back is if neither conversion produced a remainder.
But the original condition was stronger than it needed to be, at least for CYCLES_PER_SEC >= 1000, so the required condition can be expressed more simply which I've done.
I've also tried to explain why this is needed.
TIMER_ASSERT(interval >= DURATION, timer); | ||
TIMER_ASSERT((interval >= DURATION) | ||
|| (INEXACT_MS_CONVERT | ||
&& (interval == DURATION - 1)), timer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inexact version is a tighter bound (looking for an exact interval value) than the old one (make sure it's higher than or equal to duration). That seems wrong. It's OK to add slop where needed, but we should be testing the same behavior on all platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this statement. If the original test passes nothing changes. If it fails, then if we know we might have an error in converting ms to ticks, and in that case we relax the lower bound to account for a tick that wasn't added because accumulated error exceeded a tick.
&periodicity_timer); | ||
TIMER_ASSERT(WITHIN_ERROR(delta, period_ms, 1) | ||
|| (INEXACT_MS_CONVERT | ||
&& (delta == period_ms - 1)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar deal: it used to allow slop, now it doesn't. We should do the same thing with a selectable offset so we don't end up accidentally testing different behavior on different systems.
* or smaller than expected, so relax the tolerance. 3 ticks | ||
* variance has been observed on hardware where the busywait | ||
* clock is 0.7% slow and the 32 KiHz tick clock is 60 ppm | ||
* fast relative to a stable external clock. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... what this does is basically change the Zephyr tested promise that the busy wait and system clock times are compatible. I think if we're going to do this as policy it needs to be spelled out in the docs explicitly somewhere, not just silently waved away in a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@andyross Perhaps we could discuss the relationship between the system timer and |
Honestly I'd be happy with a documentation fix. Right now the clear implication of the way k_timer and k_busy_wait are defined is that they work in the same units. If they're allowed to be different, we should say they can be different (and maybe for extra credit have a framework for characterizing that skew, etc...). But we can't just patch a test. Real users are not unlikely to make the same assumption the test author did. |
I agree. @pabigot how about proposing a documentation patch in this very PR, and deal with #24784 at the same time? |
Some tests assumed that a delay with k_busy_wait() could be precisely timed by a tick or cycle count, and vice versa. This is not true for all clock configurations on all Zephyr targets, so highlight the potential variation. As an example: on some platforms busy-wait is performed by a loop with known duration in cycles of a known CPU frequency. The CPU clock may be unrelated to the clock that drives the system timer. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
When one millisecond is not an integral number of ticks measuring durations between tick events will sometimes be less than expected to correct for error that was accumulated between other events. Allow for that in the duration and period comparisons. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
A fast timer clock can advance before or after the remaining time until an event is captured. Verify the expected relationship between current and remaining time holds for at least one captured current time. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
When the timer frequency is not a multiple of 1000 converting between ticks and milliseconds introduces error. Avoid propagating the error by converting divided values rather than dividing converted values. Also compensate for observed rate differences between the busywait clock and the timeout clock. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
A couple failures observed in #24001 are a consequence of precision loss converting between milliseconds and 32 KiHz ticks, compounded by clock rate variation. Other failures are due to skew between the clock used for busywait and the clock used for system time. Also asynchronous updates to the tick counter can occur before or after the value it's compared with has been captured.
In short, tweak the test until it runs reliably on problematic hardware identified in #24001 (where "reliably" means >100 consecutive runs without failure using
CONFIG_ZTEST_RETEST_IF_PASSED=y
).Fixes #24001
Fixes #24784 through a documentation fix clarifying that
k_busy_wait()
delays are not necessarily aligned to sysclock.