Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xtimer_periodic_wakeup is not interrupt safe #8388

Closed
Winkelkatze opened this issue Jan 20, 2018 · 9 comments
Closed

xtimer_periodic_wakeup is not interrupt safe #8388

Winkelkatze opened this issue Jan 20, 2018 · 9 comments
Assignees
Labels
Area: timers Area: timer subsystems Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Comments

@Winkelkatze
Copy link
Contributor

Winkelkatze commented Jan 20, 2018

Description

If an interrupt happens at the wrong time during a call to xtimer_periodic_wakeup with a short delay it may cause an internal overflow in the timer which results in the delay to be much longer than specified.

Steps to reproduce the issue

Reproducing this works best if the xtimer is built with ENABLE_DEBUG 1 and the task calling xtimer_periodic_wakeup has a low priority. (But it also happens without debug and at higher priorities (if the delay is short enough or too much is done in the interrupt))

To reproduce this, we need a low priority thread calling xtimer_periodic_wakeup with a short delay in an infinite loop. An other thread with a higher priority is waiting for a message that is sent in an interrupt. If this thread receives the message, it prints a not too small text on stdout. Important is that sending the text should take longer than the delay in xtimer_periodic_wakeup.

After a few interrupts the thread calling xtimer_periodic_wakeup is stuck.

Explanation

Looking at the xtimer implementation, it is easy to see the actual problem:
The interrupt needs to happen in _xtimer_periodic_wakeup after reading 'now' and before calling '_xtimer_set_absolute'. (This also explains why it works much better if debug is active, because the debug print takes some time which increases the chance of the interrupt happening in this range)
The interrupt (or sending the message from within the interrupt) will cause the scheduler to run again which results in execution of the higher priority thread that is waiting for that message. When this thread yields again and the thread calling _xtimer_periodic_wakeup is resumed, more time than specified in delay has elapsed. _xtimer_set_absolute is called anyway and gets the wakeup target as parameter. It will then recalculate the difference between the target and the current time (which is now larger than target). This causes the _xtimer_set_absolute to detect this overflow and increment timer->long_target. This way the delay will be much longer than specified.

References

This may be related to the PRs: #7628 and #5428

Versions

Tested with the current git version of RIOT on STM32F103x8.

Installed toolchain versions

native gcc: gcc (GCC) 7.2.1 20171224
msp430-gcc: missing/error
avr-gcc: missing/error
arm-none-eabi-gcc: arm-none-eabi-gcc (Arch Repository) 7.2.0
ips-mti-elf-gcc: missing/error
clang: clang version 5.0.1 (tags/RELEASE_501/final)
arm-none-eabi-newlib: "2.5.0"
mips-mti-elf-newlib: missing/error
avr-libc: missing/error (missing/error)
cppcheck: missing
coccinelle: missing
git: git version 2.15.1

@PeterKietzmann PeterKietzmann added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: timers Area: timer subsystems labels Feb 26, 2018
@kYc0o
Copy link
Contributor

kYc0o commented Jul 19, 2018

@kaspar030 @Josar could you take a look on this?

@Josar
Copy link
Contributor

Josar commented Jul 20, 2018

Sound as this could happen.
Can be fixed by disabling interrupts before reading _xtimer_now() for comparison in _xtimer_periodic_wakeup and also add interrupt_State as parameter to _xtimer_set_absolute(). Then restore when spinnign and at the end of _xtimer_set_absolute().
Same is valid for _xtimer_set.
imho.

@kYc0o
Copy link
Contributor

kYc0o commented Jul 20, 2018

Let's try to fix it for the next release.

@kYc0o kYc0o added this to the Release 2018.10 milestone Jul 20, 2018
@Winkelkatze
Copy link
Contributor Author

@Josar I'm not sure if disabling the interrupts before / while reading _xtimer_now() is a good idea. The current xtimer value (_xtimer_now()) comes from a hardware timer that increments an overflow counter (_xtimer_high_cnt) in it's interrupt. The _xtimer_now() function reads both values and always checks, if it has gotten a valid pair of timer value and overflow counter. However, if you disable the interrupts before calling it, this check no longer works and you may end up with an overflown timer value but the overflow counter still at the old value.
This causes that of two consecutive calls to _xtimer_now() the second call could return a smaller value than the first call without having an overflow in the uint32 return value. I'm not sure, if this is a problem, but it should be considered.

@Josar
Copy link
Contributor

Josar commented Jul 22, 2018

@Icaltary yes thats right. I think this can be solved after #8990 is merged. And disabling interrupts after _xtimer_now() in _xtimer_set_absolute() should then suffice.

@fjmolinas
Copy link
Contributor

@MichelRottleuthner @Hyungsin was this fixed in #9530?

@MichelRottleuthner
Copy link
Contributor

Yes.

@fjmolinas
Copy link
Contributor

Is that also the cas for #5338? Could you take a look at the issues that where fixed with #9530?

@fjmolinas
Copy link
Contributor

Closed since #9530 got in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: timers Area: timer subsystems Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

No branches or pull requests