Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys: xtimer: fix some race conditions #4903

Merged
merged 1 commit into from
Mar 11, 2016
Merged

Conversation

kaspar030
Copy link
Contributor

This PR (hopefully) fixes two issues:
#4902:

previously, _xtimer_set_absolute() would determine the current _long_cnt outside of a critical section, then determine using _this_high_period() whether the timer shoots in current or next period. If the overflow interrupt occurs in between, long_target is too low (and thus not in _this_high_period()), causing the timer to wrongly be added to the long list. This PR moves the critical section over initial _long_cnt determination.
#4841:

Previously, xtimer_set() would remove a timer outside of a critical section, then call _xtimer_set_absolute(). If the same timer would be set in between, it could end up twice in a timer list, causing xtimer to hang.
This PR refactors xtimer_remove into a safe and unsafe variant and adds a second check using the unsafe variant in _xtimer_set_absolute()'s critical section.

@kaspar030 kaspar030 added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Process: API change Integration Process: PR contains or issue proposes an API change. Should be handled with care. Area: timers Area: timer subsystems labels Feb 25, 2016
@kaspar030 kaspar030 mentioned this pull request Feb 25, 2016
@kaspar030
Copy link
Contributor Author

@immesys Could you give this a try?

@immesys
Copy link
Contributor

immesys commented Feb 25, 2016

@kaspar030 sure. I will run it overnight and let you know.

@kaspar030
Copy link
Contributor Author

There's one thing this PR doesn't fix: the same race as @immesys, for spinning timers.

I didn't know what would be the best semantic:

  1. after spinning, shoot the currently set callback of the timer, which might have been overridden on timer re-set (current state, probably not the expected behaviour)
  2. check at every spin cycle if the timer has been reset (inefficient)
  3. copy the timer object, use that (different semantics than without spinning)
  4. check at the end of spinning if the timer has been reset, possibly aborting (might spin and then do nothing)

I tend to go with 4.), as it is easy to implement and probably leads most to expected behaviour.

@kaspar030
Copy link
Contributor Author

Thanks @immesys!

@jnohlgard
Copy link
Member

@immesys @kaspar030 what's the status here?

Needs rebase btw

@immesys
Copy link
Contributor

immesys commented Mar 9, 2016

About to do some more testing of this, but @kaspar030 it needs some rebasing.

@kaspar030
Copy link
Contributor Author

  • rebased

@immesys
Copy link
Contributor

immesys commented Mar 9, 2016

Sweet!

@jnohlgard
Copy link
Member

ACK for the change in general, waiting for @immesys report from their testing

@jnohlgard jnohlgard added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Mar 9, 2016
@immesys
Copy link
Contributor

immesys commented Mar 9, 2016

I actually hit some timer problems today, but as I was hacking on something else, it might have been an unrelated newly introduced user error. Can you give me tomorrow to test it specifically? Ideally I might also be able to contribute a test case.

@kaspar030
Copy link
Contributor Author

@immesys All the time you need! ;)

@miri64
Copy link
Member

miri64 commented Mar 11, 2016

Any progress?

@immesys
Copy link
Contributor

immesys commented Mar 11, 2016

Huh, I commented on this, maybe it was on the issue not the PR. I think this looks good and should be merged. If I track down why I was still getting a lockup in _remove then I will open a new issue but it is very rare and might be the debugger interfering with things.

@miri64
Copy link
Member

miri64 commented Mar 11, 2016

👍 then go.

miri64 added a commit that referenced this pull request Mar 11, 2016
sys: xtimer: fix some race conditions
@miri64 miri64 merged commit b3b880b into RIOT-OS:master Mar 11, 2016
@miri64
Copy link
Member

miri64 commented Mar 11, 2016

(sorry just saw that this passed and was ACKed. Did not track the issues to that)

@kaspar030 kaspar030 deleted the xtimer_fixes branch March 11, 2016 23:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: timers Area: timer subsystems CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Process: API change Integration Process: PR contains or issue proposes an API change. Should be handled with care. Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants