Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP/POC: start a test where xtimer uses a fake 'periph_timer' mock implementation #10321

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

cladmi
Copy link
Contributor

@cladmi cladmi commented Nov 2, 2018

Contribution description

This is more an issue with code than a pull request

The newly added documentation by #9211 explains the requirement of _xtimer_set_absolute, which is great as they are now explicit.

But the requirement are impossible to guarantee in the current implementation.

I provide unittests using a mock highly wip implementation of periph_timer to show cases where the prerequisites and stated conditions are not met.

Idea for a solution

Ideas I see to handle this

  • Get the now from the caller for _xtimer_set_absolute and change it to take now + offset.
  • Got into irq_disable before calling _xtimer_now in the function
  • Make xtimer_spin_until resistant to being called in the past and correctly defines its requirements
    There is a lot to say on this one and it would require specific definition of how long a process is allowed to be de-scheduled. Right now, if a process is de-scheduled for more than (1 << XTIMER_WITDH) / XTIMER_HZ it would not be detected by this function. This is 65ms on the iotlab-m3 board, and it may even require stricter constraints.

De-scheduled in _xtimer_set

    /* Ensure that offset is bigger than 'XTIMER_BACKOFF',
     * 'target - now' will always be the offset no matter if target < or > now.
     *
     * This expects that target was not set too close to now and overrun now, so
     * from setting target up until the call of '_xtimer_now()' above now has not
     * become equal or bigger than target.
     * This is crucial when using low CPU frequencies so reaching the '_xtimer_now()'
     * call needs multiple xtimer ticks.
     *
     * '_xtimer_set()' and `_xtimer_periodic_wakeup()` ensure this by already
     * backing off for small values. */

xtimer_core.c

As _xtimer_set is executed without masked interrupt, the thread could be de-scheduled for XTIMER_BACKOFF +1 ticks between these two lines:

        uint32_t target = _xtimer_now() + offset;
        _xtimer_set_absolute(timer, target);

uint32_t target = _xtimer_now() + offset;
_xtimer_set_absolute(timer, target);

Which could make _xtimer_set_absolute called with a time in the past.

So no _xtimer_set does not "ensure this".

Test provided for this one.

De-scheduled in _xtimer_set_absolute just after _xtimer_now

uint32_t now = _xtimer_now();
int res = 0;
timer->next = NULL;
/* Ensure that offset is bigger than 'XTIMER_BACKOFF',
* 'target - now' will allways be the offset no matter if target < or > now.
*
* This expects that target was not set too close to now and overrun now, so
* from setting target up until the call of '_xtimer_now()' above now has not
* become equal or bigger than target.
* This is crucial when using low CPU frequencies so reaching the '_xtimer_now()'
* call needs multiple xtimer ticks.
*
* '_xtimer_set()' and `_xtimer_periodic_wakeup()` ensure this by already
* backing off for small values. */
uint32_t offset = (target - now);
DEBUG("timer_set_absolute(): now=%" PRIu32 " target=%" PRIu32 " offset=%" PRIu32 "\n",
now, target, offset);
if (offset <= XTIMER_BACKOFF) {
/* backoff */
xtimer_spin_until(target);

    uint32_t now = _xtimer_now();
    // ...
    uint32_t offset = (target - now);
    // ...
    if (offset <= XTIMER_BACKOFF) {
        /* backoff */
        xtimer_spin_until(target);

The thread could be de-scheduled for XTIMER_BACKOFF +1 before calling xtimer_spin_until which would make it called with a time already elapsed, and so do a full loop around.

Test provided for this one.

De-scheduled in _xtimer_set_absolute before irq_disable

    /* Ensure timer is fired in right timer period.
     * Backoff condition above ensures that 'target - XTIMER_OVERHEAD` is later
     * than 'now', also for values when now will overflow and the value of target
     * is smaller then now.
     * If `target < XTIMER_OVERHEAD` the new target will be at the end of this
     * 32bit period, as `target - XTIMER_OVERHEAD` is a big number instead of a
     * small at the beginning of the next period. */

/* Ensure timer is fired in right timer period.
* Backoff condition above ensures that 'target - XTIMER_OVERHEAD` is later
* than 'now', also for values when now will overflow and the value of target
* is smaller then now.
* If `target < XTIMER_OVERHEAD` the new target will be at the end of this
* 32bit period, as `target - XTIMER_OVERHEAD` is a big number instead of a
* small at the beginning of the next period. */

This could be show by de-scheduling before irq_disable, but the test does not shows this for the moment.

Testing procedure

Run the example in tests/xtimer_mocked_periph_timer with a board using a 1Mhz timer for xtimer (test implementation limitation) I tested it with samr21-xpro or iotlab-m3.

A test is provided for the first two cases.

The implementation uses a WIP mocked periph_timer implementation to allow simulating descheduling at certain times by increasing the low level timer.
It is based on the number of call to _xtimer_now, so it breaks with DEBUG in xtimer.

Issues/PRs references

Mention of the real issues in this comment https://github.com/RIOT-OS/RIOT/pull/9211/files#r229728883

@cladmi cladmi added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet Area: tests Area: tests and testing framework CI: needs squashing Commits in this PR need to be squashed; If set, CI systems will mark this PR as unmergable Area: timers Area: timer subsystems labels Nov 2, 2018
@cladmi
Copy link
Contributor Author

cladmi commented Nov 2, 2018

@Josar I managed to show some issues by using mocks. As I started after your PR was merged, I focused on the remarks I did in the comments section of your PR. That is using your new documentation that I found all these issues so +1 for writing it :)
I hoped to have time to show and understand what you solved but it was already merged.

@MichelRottleuthner
Copy link
Contributor

Nice! Tests for stuff like that should have been there before. Thanks for providing this mock test.
In reply to your comment in #9211:

I tried to prove my assumptions with #10321 please show me if my tests are wrong to say my critics are wrong.

I think most of the involved people actually agreed that there are more problems that aren't fixed by the provided PR. The intention for moving forward quickly with this wasn't to ignore/disagree with remaining critics. AIUI the requirements for the _xtimer_set_absolute parameters didn't change - but they are now stated explicitly. Nonetheless you are absolutely right about these requirements being hard (or in some cases impossible) to ensure - and we need to address that.

Just a few comments:

Get the now from the caller for _xtimer_set_absolute and change it to take now + offset.

Sounds reasonable to hand over this information.

Got into irq_disable before calling _xtimer_now in the function

I'm not sure on what other problems this may create and how this would change the performance.

Make xtimer_spin_until resistant to being called in the past and correctly defines its requirements (...)

What is "past" from the context of the spin call? From the function I'd expect that it spins until this value is reached - without handling stuff like that. In general, spin should only be used for very short periods of time where timer setup and callback overhead is too heavy i.e. cases where another thread being scheduled in between will likely break what the spin is used for anyway (e.g. timing sensitive bitbanging). For that reason I think that spinning (if impossible to avoid) should only be used with interrupts disabled - because otherwise it probably won't work for the intended use case anyway.


/*
* Set the internal timer without handling any callbacks
* Should fit the internal precisiong
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's still WIP, but there's a typo here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

… mock implementation

Change that the 'post' callback so it is done after 'timer_read' only.
@cladmi
Copy link
Contributor Author

cladmi commented Nov 5, 2018

Nice! Tests for stuff like that should have been there before. Thanks for providing this mock test.

I said I was writing it and provided a first skeleton, but it was merged before I could provide a test.

The principle is quite easy, you define macros to redirect function calls to the mocks and include the c file just after. So not adding USEMODULE. It allows mocking functions that are also used in the system and also access static functions and variables.

In reply to your comment in #9211:

I tried to prove my assumptions with #10321 please show me if my tests are wrong to say my critics are wrong.

I think most of the involved people actually agreed that there are more problems that aren't fixed by the provided PR. The intention for moving forward quickly with this wasn't to ignore/disagree with remaining critics. AIUI the requirements for the _xtimer_set_absolute parameters didn't change - but they are now stated explicitly. Nonetheless you are absolutely right about these requirements being hard (or in some cases impossible) to ensure - and we need to address that.

Yes but the documentation also says "this prerequisite is verified" when it is wrong.

In the same time, the requirements/behavior changed a little bit, as this line in the changeset, which did not solve the main bug, would make more easily target be in the past.

         /* backoff */
-        xtimer_spin_until(target + XTIMER_BACKOFF);
+        xtimer_spin_until(target);
         _shoot(timer);
         return 0;

https://github.com/RIOT-OS/RIOT/pull/9211/files#diff-5371c49e6244a42b30cd3a714c0e9aacR203

And all the changes, documentation, fix and unrelated were in a single commit.

Got into irq_disable before calling _xtimer_now in the function

I'm not sure on what other problems this may create and how this would change the performance.

I will try to show with tests too, but I will need to mock irq_disable for it I think and implement periph_timer callback.

The goal is to have a deterministic time between calling now and configuring the low level timer.
Because if you configure the low level timer too late then you are good for a full loop I think or even two as you do long_target++ and set with a target in the past for the low level timer, so in a full loop (I did not completely reviewed this yet so not sure).

The performance should only be that we spend the time taken to call xtimer_now with disabled interrupts.
And what is performance when you compare something that works with one that does not ?

They can be restored before doing xtimer_spin_until to keep the current behavior when spinning.

Make xtimer_spin_until resistant to being called in the past and correctly defines its requirements (...)

What is "past" from the context of the spin call? From the function I'd expect that it spins until this value is reached - without handling stuff like that. In general, spin should only be used for very short periods of time where timer setup and callback overhead is too heavy i.e. cases where another thread being scheduled in between will likely break what the spin is used for anyway (e.g. timing sensitive bitbanging). For that reason I think that spinning (if impossible to avoid) should only be used with interrupts disabled - because otherwise it probably won't work for the intended use case anyway.

Spin is an until and not an offset, so it can be elapsed.

The test_show_xtimer_set_does_not_handle_overhead test shows the issue.
The base problem is with the xtimer_spin_until implementation, not with the concept.

As said there are many things with this one that are together and did not listed all of them yet.
But it can be called from a user thread or even one xtimer callback so in the timer interrupt.

I do not know how timing sensitive they are, but with two applications you could overshoot by the de-schedule time anyway if another application de-schedules you before you enter the function with offset, or even just after spinning so before you call _shoot. It would be the caller that should disable interrupts in that case as you can also always be de-scheduled before you call the function.

A de-scheduler thread could de-schedule you by XTIMER_BACKOFF by just calling xtimer_set with a low offset. With masked interrupts you make higher priority thread be late by XTIMER_BACKOFF so not sure its better than the low priority thread being delayed.

The problem here, is not that you overshoot by the de-schedule time which would be normal but by a full loop of the low level timer.

@MichelRottleuthner
Copy link
Contributor

I said I was writing it and provided a first skeleton, but it was merged before I could provide a test.

I was more referring to the fact that stuff like that should have been tested way earlier - preferably even before merging xtimer...

Yes but the documentation also says "this prerequisite is verified" when it is wrong.

Right, this comment should be removed.

In the same time, the requirements/behavior changed a little bit, as this line in the changeset, which did >not solve the main bug, would make more easily target be in the past.

        /* backoff */
-        xtimer_spin_until(target + XTIMER_BACKOFF);
+        xtimer_spin_until(target);
        _shoot(timer);
        return 0;

This shows one of the general problems quite well: no static offset value will be correct as long as the time it tries to compensate for is non-deterministic.

And what is performance when you compare something that works with one that does not ?

👍

Spin is an until and not an offset, so it can be elapsed.

Sure, what I meant is: "past" is hard to define if your time, like in the current spin implementation, is running in a circle ;)

It would be the caller that should disable interrupts in that case as you can also always be de-scheduled before you call the function.

👍

With masked interrupts you make higher priority thread be late by XTIMER_BACKOFF so not sure its better than the low priority thread being delayed.

I guess we can agree that delaying a higher prio thread is far from optimal and should be avoided - but your point on performance vs. actually working should be also considered here.

A general thing that always comes to my mind regarding xtimer: currently it is really hard to quantify and compare performance of the implementation and show effects of changes as comparable numbers.
What do you think about micro-benchmarks for this and which dimensions (like jitter, delay, resolution, precision, cpu and ram overhead, etc.) would you consider to be interesting for that?

@cladmi cladmi added this to the Release 2019.01 milestone Nov 7, 2018
@Josar
Copy link
Contributor

Josar commented Nov 8, 2018

@cladmi the merged xtimer PR was just a part which could be broken out from #8990. This handles also hopefully some of this problems.

Maybe we could discuss this further on and hopefully break out some of the changes into other PRs.

@cladmi
Copy link
Contributor Author

cladmi commented Nov 13, 2018

Question for the timer_set and timer_set_absolute, the API is not clear about it, what is supposed to happen if I do timer_set_absolute(timer_read() -1) ?

Does it trigger immediately as the value is elapsed, or does it ignore 0xFFFFFFFF and triggers on the next round ? does it say error ?

I need to know to implement the mock properly.

@jnohlgard
Copy link
Member

jnohlgard commented Nov 13, 2018

@cladmi the target should end up (timer max value) into the future

@Josar
Copy link
Contributor

Josar commented Nov 13, 2018

I think as it is implemented timer_set_absolute(timer_read() -1) will execute in the next round.
Same for timer_set().

There is no unsigned int which can not be set as target. And the hardware timer has no context of how the value was calculated. Only thing which can be done is set the requested value as a target in the future for the hardware counter.

Context is only available within xtimer.

@jnohlgard
Copy link
Member

jnohlgard commented Nov 13, 2018

The ztimer mock module (WIP) might be interesting here.
https://github.com/gebart/RIOT/blob/ztimer/sys/ztimer/mock.c

@cladmi
Copy link
Contributor Author

cladmi commented Nov 13, 2018

So we should be really really extra careful to not put a timer in the past as it would basically also ignore one timer overflow:

if (!timer_list_head) {
DEBUG("_timer_callback(): tick\n");
/* there's no timer for this timer period,
* so this was a timer overflow callback.
*
* In this case, we advance to the next timer period.
*/
_next_period();
reference = 0;

I think it deserves a test to show this in one of periph_timer test, I will see what I can do when working on it.

I think in my mock I would more put it more as an "assert it should not be in the past" as it is something that should not happen within xtimer.

@gebart I will take a look at your mock, it took me some time to understand that ztimer_mock_op_set was a mock of periph_timer_set :D

@kaspar030
Copy link
Contributor

@gebart I will take a look at your mock, it took me some time to understand that ztimer_mock_op_set was a mock of periph_timer_set :D

There's also a backend (hack) for xtimer so it can be used on top of any ztimer, e.g., the ztimer_mock.

@cladmi cladmi removed this from the Release 2019.04 milestone Apr 23, 2019
@cladmi cladmi added this to the Release 2019.07 milestone Apr 23, 2019
@tcschmidt
Copy link
Member

@MichelRottleuthner any next steps here?

@kb2ma kb2ma removed this from the Release 2019.10 milestone Oct 8, 2019
@miri64
Copy link
Member

miri64 commented Jun 24, 2020

@MrKevinWeiss MrKevinWeiss added this to the Release 2021.07 milestone Jun 21, 2021
@MrKevinWeiss MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021
@Teufelchen1 Teufelchen1 marked this pull request as draft March 13, 2024 12:00
@Teufelchen1
Copy link
Contributor

Converted to draft. There has been no activity for a long time but it does provide a POC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: tests Area: tests and testing framework Area: timers Area: timer subsystems CI: needs squashing Commits in this PR need to be squashed; If set, CI systems will mark this PR as unmergable State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.