-
Notifications
You must be signed in to change notification settings - Fork 6.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix unstable tests involving nRF system timer #35575
Conversation
Correct the way the relative ticks value is calculated for an absolute timeout. Previously, elapsed() was called twice and the returned value was first subtracted from and then added to the ticks value. It could happen that the HW counter value read by elapsed() changed between the two calls to this function. This caused the test_timeout_abs test case from the timer_api test suite to occasionally fail, e.g. on certain nRF platforms. Signed-off-by: Andrzej Głąbek <andrzej.glabek@nordicsemi.no>
As per description of the sys_clock_elapsed() function, "the kernel will call this with appropriate locking, the driver needs only provide an instantaneous answer". Remove then the unnecessary locking from the function, as it only adds an undesirable delay. Signed-off-by: Andrzej Głąbek <andrzej.glabek@nordicsemi.no>
fe93ff7
to
5221bb7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug fix looks great. Optimization seems wrong, unless I missed something.
|
||
if (IS_ENABLED(CONFIG_TIMEOUT_64BIT) && Z_TICK_ABS(ticks) >= 0) { | ||
ticks = Z_TICK_ABS(timeout.ticks) - (curr_tick + elapsed()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Indeed, this computation should always have happened inside the lock, anyway.
k_spinlock_key_t key = k_spin_lock(&lock); | ||
uint32_t ret = counter_sub(counter(), last_count) / CYC_PER_TICK; | ||
|
||
k_spin_unlock(&lock, key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks wrong. That comment isn't saying you don't need synchronization, it's saying that this function isn't reentrant: you don't need to worry about other users of the timeout system asking for the current time simultaneously.
In fact that last_count variable is mutated from an ISR, so if you aren't locking interrupts this can race, e.g.:
- Call
counter()
, which returns a time 1 cycle before the expiration value "T
" - The counter advances and the interrupt preempts, then advances last_count to be equal to
T
- The ISR returns and this subtracts last_count (now "
T
") from counter()'s result from earlier ("T - 1
"), giving0xffffff
- The world blows up due to the overflowed value
Maybe I should remove that sentence in the docs.
Ah, but this is a single-CPU system and in fact all existing usage (after the fix in the first patch here) is done with a spinlock held and interrupts masked. I'm not completely sure that's safe to document, but this isn't actually a bug as it stands. Might be worth a comment explaining, and maybe an assertion to check that interrupts are masked, but I don't see why this can't go in. I'll remove my -1. |
Fixes #35509.
kernel: timeout: Fix adding of an absolute timeout
Correct the way the relative ticks value is calculated for an absolute
timeout. Previously, elapsed() was called twice and the returned value
was first subtracted from and then added to the ticks value. It could
happen that the HW counter value read by elapsed() changed between the
two calls to this function. This caused the test_timeout_abs test case
from the timer_api test suite to occasionally fail, e.g. on certain nRF
platforms.
drivers: nrf_rtc_timer: Remove unnecessary locking
As per description of the sys_clock_elapsed() function, "the kernel
will call this with appropriate locking, the driver needs only provide
an instantaneous answer". Remove then the unnecessary locking from the
function, as it only adds an undesirable delay.
The above delay combined with the disabled instruction cache (the issue
fixed initially by #35455, then by #35510) caused the test_posix_realtime
test case from the posix_apis test suite to fail on some nRF platforms
(see #35509).