Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys/vtimer: Fix two vtimer issues (hwtimer tick conversion). #2515

Merged
merged 2 commits into from
Jun 1, 2015

Conversation

lightblu
Copy link

@lightblu lightblu commented Mar 2, 2015

I am using a K22 port based on the open K60/mulle port pull request (great work, thanks!)
A little blinking application that uses vtimer_usleep and toggles a LED every second fails to schedule the timer properly after ~1h09.

First I blamed the K60 port or my quick K22 port, but turns out the different timer overflows in vtimer are not handled properly.Maybe I am totally wrong as I have no complete understanding of everything going on there, but I found at least two occurences that do not look right:

in update_shortterm: HWTIMER_TICKS cannot be just applied to next, this will be wrong when next overflows.
in vtimer_now: wrong parentheses mix up vtimer and hwtimer ticks, goes wrong as soon as the first vtimer longterm tick happens and longterm_tick_start becomes nonzero.
Correcting these makes the system run nicely through the first vtimer longterm tick and the first 32bit us overflow, however a long running test (>6hours over night) resulted in a RIOT crash at some later point in time, rerunning this test currently.
Btw, I doubt that this will run through the hwtimer overflow nicely, but this would happen after 36 hours (for 32768 kHz).

I wonder how this could go unnoticed that long, does RIOT usually run with HWTIMER == 1MHz only (it is 32768 kHz here)?
As a newbie, are vtimers kind of optional and I should use something else?

This may also be related to #2435 and/or #1753, however timings (at least the first ticket) are different.

Commit message:
vtimer does not handle well the different timers (vtimer <-> hwtimer)
with regard to their overflows:

  • in update_shortterm HWTIMER_TICKS cannot be just applied to next, this will be wrong when next overflows.
  • in vtimer_now wrong parentheses mix up vtimer and hwtimer ticks.

Issue:

Maybe related issues:

vtimer does not handle well the different timers (vtimer <-> hwtimer)
with regard  to their overflows:

* in update_shortterm HWTIMER_TICKS cannot be just applied to next, this will be wrong when next overflows.
* in vtimer_now wrong parentheses mix up vtimer and hwtimer ticks.

Maybe related issues:
* RIOT-OS#2435
* RIOT-OS#1753
@jnohlgard
Copy link
Member

I wonder how this could go unnoticed that long, does RIOT usually run with HWTIMER == 1MHz only (it is 32768 kHz here)?

I believe all platforms are running with hwtimer frequency == 1 MHz, except for Kinetis and MSP430, which is why we have not been seeing this before.

As a newbie, are vtimers kind of optional and I should use something else?

The general opinion is that the vtimer module have a lot of faults, and a new timer implementation is on the road map, but currently not started. A timer task force has been mentioned on the mailing lists, but I don't think there has been any work done there yet.
In short: The vtimer sucks, but we have not yet implemented an alternative.

@jnohlgard
Copy link
Member

@lightblu Thank you for investigating this and creating this PR.

Could you try leaving your test running with a debugger connected so that you can get a backtrace from the crash?
GDB will break execution in the default hardfault handler if DEVELHELP is set. (see crash.c)

@@ -286,7 +288,7 @@ static int vtimer_set(vtimer_t *timer)

void vtimer_now(timex_t *out)
{
uint32_t us = HWTIMER_TICKS_TO_US(hwtimer_now() - longterm_tick_start);
uint32_t us = HWTIMER_TICKS_TO_US(hwtimer_now()) - longterm_tick_start;
uint32_t us_per_s = 1000ul * 1000ul;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change this to static const uint32_t us_per_s = 1000ul * 1000ul; while you're poking around here?

@lightblu
Copy link
Author

lightblu commented Mar 2, 2015

The mentioned RIOT crash maybe just has happened because I tried to reattach the debugger this morning after closing the laptop for the night.

My latest test run stopped after 7992 seconds and seems to be a different kind of error.

@kaspar030
Copy link
Contributor

Thanks for debugging this!

Here's a wiki page collecting thoughts/ideas/facts on how to fix our timer issues once and for all:
https://github.com/RIOT-OS/RIOT/wiki/Timer-Task-Force

Please have a look and share your thoughts!

IMHO we should drop vtimer (and maybe hwtimer) and rewrite it.

@OlegHahm OlegHahm added the Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) label Mar 3, 2015
Before, the longterm_tick_timer had special handling in update_shortterm.
This approach was bad because the longterm_tick_timer's shooting microseconds time
had different semantics like the rest and thus it could end up in a blocking
position in the priority queue at some point in time, although it should
get executed at another point in time.

Made the longterm_tick_timer handling / meaning of its microseconds the same
as the other timers and also removed seconds, because it is now the same
as longertm_tick_timer.absolute.seconds.

See also RIOT-OS#2515
@lightblu
Copy link
Author

lightblu commented Mar 4, 2015

The controller scheduling failing at 7992/7993 was reproducible (and btw, I have not observed the mentioned RIOT crash again).
The reason was that the longterm_timer_tick microseconds had a different meaning / offset in the priority queue as the other timers. Device now ran successfully through second 7993, and also the second longterm_timer_tick at 8192 .. looks pretty good now from my side, but someone really needs to verify and maybe rethink the whole design as you mentioned.

If time permits I will put it somewhere to see what happens when the 32768 kHz timer overflows (>36h, but this would be a different story anyway).

@lightblu
Copy link
Author

lightblu commented Mar 6, 2015

Device ran successful through the 32768 kHz 32bit hwtimer overflow 36 hour mark and the 1 second scheduling stayed accurate (as accurate as vtimers are).

@OlegHahm OlegHahm force-pushed the master branch 3 times, most recently from 9f184dd to 45554bf Compare March 31, 2015 13:01
@PeterKietzmann
Copy link
Member

@lightblu, @kaspar030 is it reasonable to merge this PR for now even if the "timer task force" will rework some thins?

@kaspar030
Copy link
Contributor

Sure, don't have time to review though...

@OlegHahm
Copy link
Member

OlegHahm commented May 8, 2015

@kaspar030, ping?

@OlegHahm
Copy link
Member

I haven't tested yet, but I would argue to merge this PR independent from the timer task force. Code looks good.

@jnohlgard
Copy link
Member

This could be the problem I am seeing in #3131 as well.

@jnohlgard
Copy link
Member

I ran this on a test yesterday and also got past the initial problem at one hour. But also got to the same error as @lightblu, there was a lock up at 7696 seconds wall clock time, 7567 hwtimer seconds. I wasn't looking when it happened so it could have been just a problem with the debugger too, I will run another test today and also see if I have the time to refactor the hwtimer driver for Kinetis because I believe the current implementation is losing too many ticks (129 second difference between the wall clock and the timer tick after only 2 hours is not acceptable IMO, almost 1 second per 60 seconds).

This PR fixes a major bug in the vtimer implementation and should be merged independently of the other timer work. The comment by @OlegHahm and me is not crucial, it is better to get this merged and have a working vtimer and do a follow up PR to remove the set_absolute call from vtimer.c.

ACK

@OlegHahm
Copy link
Member

OlegHahm commented Jun 1, 2015

So, should be push the button?

@jnohlgard
Copy link
Member

@OlegHahm you have my ACK

I will write a follow up PR for replacing set_absolute with a set(_relative) call.

jnohlgard pushed a commit that referenced this pull request Jun 1, 2015
sys/vtimer: Fix two vtimer issues (hwtimer tick conversion).
@jnohlgard jnohlgard merged commit d804969 into RIOT-OS:master Jun 1, 2015
daniel-k pushed a commit to daniel-k/RIOT that referenced this pull request Jun 1, 2015
Before, the longterm_tick_timer had special handling in update_shortterm.
This approach was bad because the longterm_tick_timer's shooting microseconds time
had different semantics like the rest and thus it could end up in a blocking
position in the priority queue at some point in time, although it should
get executed at another point in time.

Made the longterm_tick_timer handling / meaning of its microseconds the same
as the other timers and also removed seconds, because it is now the same
as longertm_tick_timer.absolute.seconds.

See also RIOT-OS#2515
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants