sys/vtimer: Fix two vtimer issues (hwtimer tick conversion). #2515
vtimer does not handle well the different timers (vtimer <-> hwtimer) with regard to their overflows:
* in update_shortterm HWTIMER_TICKS cannot be just applied to next, this will be wrong when next overflows.
* in vtimer_now wrong parentheses mix up vtimer and hwtimer ticks.
Maybe related issues:
* RIOT-OS#2435
* RIOT-OS#1753
I believe all platforms are running with hwtimer frequency == 1 MHz, except for Kinetis and MSP430, which is why we have not been seeing this before.
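To illustrate why a 1 MHz hwtimer hides the vtimer_now problem, here is a minimal sketch. It assumes the offset (longterm_tick_start in the diff) is kept in microseconds. The names ticks_to_us, now_us_convert_then_subtract and now_us_subtract_then_convert are hypothetical stand-ins for this example, not RIOT code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for HWTIMER_TICKS_TO_US at a given timer frequency. */
uint32_t ticks_to_us(uint32_t ticks, uint32_t hz)
{
    assert(hz != 0);
    return (uint32_t)(((uint64_t)ticks * 1000000u) / hz);
}

/* Convert first, then subtract: both operands of the subtraction are
 * microseconds (assuming start_us really is stored in microseconds). */
uint32_t now_us_convert_then_subtract(uint32_t hw_ticks, uint32_t start_us,
                                      uint32_t hz)
{
    return ticks_to_us(hw_ticks, hz) - start_us;
}

/* Subtract first, then convert: start_us is treated as if it were hardware
 * ticks, which is only harmless when one tick equals one microsecond. */
uint32_t now_us_subtract_then_convert(uint32_t hw_ticks, uint32_t start_us,
                                      uint32_t hz)
{
    return ticks_to_us(hw_ticks - start_us, hz);
}
```

At 1 MHz the conversion is an identity, so both helpers agree. At 32768 Hz, with hw_ticks = 163840 (5 s) and start_us = 4000000, only the convert-then-subtract form yields 1000000.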
The general opinion is that the vtimer module has a lot of faults, and a new timer implementation is on the roadmap but has not been started yet. A timer task force has been mentioned on the mailing lists, but I don't think any work has been done there yet.
@lightblu Thank you for investigating this and creating this PR. Could you try leaving your test running with a debugger connected so that you can get a backtrace from the crash?
@@ -286,7 +288,7 @@ static int vtimer_set(vtimer_t *timer)

 void vtimer_now(timex_t *out)
 {
-    uint32_t us = HWTIMER_TICKS_TO_US(hwtimer_now() - longterm_tick_start);
+    uint32_t us = HWTIMER_TICKS_TO_US(hwtimer_now()) - longterm_tick_start;
     uint32_t us_per_s = 1000ul * 1000ul;
Could you change this to static const uint32_t us_per_s = 1000ul * 1000ul;
while you're poking around here?
The RIOT crash I mentioned may have happened only because I tried to reattach the debugger this morning after closing the laptop for the night. My latest test run stopped after 7992 seconds and seems to be a different kind of error.
Thanks for debugging this! Here's a wiki page collecting thoughts/ideas/facts on how to fix our timer issues once and for all: Please have a look and share your thoughts! IMHO we should drop vtimer (and maybe hwtimer) and rewrite it.
Before, the longterm_tick_timer had special handling in update_shortterm. This approach was bad because the longterm_tick_timer's firing time in microseconds had different semantics than that of the other timers, so it could end up blocking the priority queue at one point in time although it should have been executed at another. Made the handling and the meaning of the longterm_tick_timer's microseconds the same as for the other timers, and also removed seconds, because it is now the same as longterm_tick_timer.absolute.seconds. See also RIOT-OS#2515
The controller scheduling failure at 7992/7993 was reproducible (and btw, I have not observed the mentioned RIOT crash again). If time permits I will set it up somewhere to see what happens when the 32768 Hz timer overflows (>36h, but this would be a different story anyway).
The device ran successfully through the 32-bit overflow of the 32768 Hz hwtimer at the 36 hour mark, and the 1 second scheduling stayed accurate (as accurate as vtimers are).
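For reference, the overflow times mentioned in this thread follow from simple arithmetic on a free-running 32-bit counter. A small sketch (wrap_seconds_32bit is a hypothetical helper name, not part of RIOT):

```c
#include <assert.h>
#include <stdint.h>

/* Whole seconds until a free-running 32-bit counter clocked at hz wraps. */
uint32_t wrap_seconds_32bit(uint32_t hz)
{
    assert(hz != 0);
    return (uint32_t)((UINT64_C(1) << 32) / hz);
}
```

For a 32768 Hz timer this gives 131072 seconds, a little over 36 hours. For a counter in microseconds (1 MHz) it gives 4294 seconds, a bit more than an hour, which is in the range where the first scheduling failure was reported.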
9f184dd to 45554bf
@lightblu, @kaspar030 is it reasonable to merge this PR for now even if the "timer task force" will rework some things?
Sure, don't have time to review though...
@kaspar030, ping?
I haven't tested yet, but I would argue to merge this PR independent from the timer task force. Code looks good.
This could be the problem I am seeing in #3131 as well.
I ran this in a test yesterday and also got past the initial problem at one hour, but I hit the same error as @lightblu: there was a lockup at 7696 seconds wall clock time, 7567 hwtimer seconds. I wasn't looking when it happened, so it could have been just a problem with the debugger too. I will run another test today and also see if I have the time to refactor the hwtimer driver for Kinetis, because I believe the current implementation is losing too many ticks (a 129 second difference between the wall clock and the timer tick after only 2 hours is not acceptable IMO, almost 1 second per 60 seconds).
This PR fixes a major bug in the vtimer implementation and should be merged independently of the other timer work. The comments by @OlegHahm and me are not crucial; it is better to get this merged, have a working vtimer, and do a follow-up PR to remove the set_absolute call from vtimer.c. ACK
So, should we push the button?
@OlegHahm you have my ACK. I will write a follow-up PR for replacing set_absolute with a set(_relative) call.
I am using a K22 port based on the open K60/mulle port pull request (great work, thanks!)
A little blinking application that uses vtimer_usleep and toggles an LED every second fails to schedule the timer properly after ~1h09.
First I blamed the K60 port or my quick K22 port, but it turns out the different timer overflows in vtimer are not handled properly. Maybe I am totally wrong, as I have no complete understanding of everything going on there, but I found at least two occurrences that do not look right:
* in update_shortterm: HWTIMER_TICKS cannot be just applied to next, this will be wrong when next overflows.
* in vtimer_now: wrong parentheses mix up vtimer and hwtimer ticks, which goes wrong as soon as the first vtimer longterm tick happens and longterm_tick_start becomes nonzero.
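A rough sketch of the first point, under the assumption that next is an absolute 32-bit microsecond value that can wrap. The helpers us_to_ticks, target_naive and target_from_delta are hypothetical and only illustrate the idea; they are not the code in this PR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a microseconds-to-hwtimer-ticks conversion. */
uint32_t us_to_ticks(uint32_t us, uint32_t hz)
{
    assert(hz != 0);
    return (uint32_t)(((uint64_t)us * hz) / 1000000u);
}

/* Naive: convert the absolute (possibly wrapped) microsecond value. */
uint32_t target_naive(uint32_t next_us, uint32_t hz)
{
    return us_to_ticks(next_us, hz);
}

/* Delta-based: the unsigned subtraction is correct modulo 2^32 even when
 * next_us has wrapped, so only the remaining interval gets converted. */
uint32_t target_from_delta(uint32_t now_ticks, uint32_t now_us,
                           uint32_t next_us, uint32_t hz)
{
    uint32_t delta_us = next_us - now_us;
    return now_ticks + us_to_ticks(delta_us, hz);
}
```

With a 32768 Hz timer, now_us = 4294000000 (now_ticks = 140705792) and a timer due 2 seconds later, next_us wraps to 1032704. The naive conversion then yields tick 33839, far behind the current tick, while the delta-based form yields 140771328, which is 65536 ticks (2 s) ahead.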
Correcting these makes the system run nicely through the first vtimer longterm tick and the first 32-bit microsecond overflow. However, a long-running test (>6 hours overnight) resulted in a RIOT crash at some later point in time; I am currently rerunning this test.
Btw, I doubt that this will run through the hwtimer overflow nicely, but this would happen after 36 hours (for 32768 Hz).
I wonder how this could go unnoticed for so long. Does RIOT usually run with HWTIMER == 1 MHz only (it is 32768 Hz here)?
As a newbie: are vtimers kind of optional, and should I use something else?
This may also be related to #2435 and/or #1753, however timings (at least the first ticket) are different.