WIP/POC: start a test where xtimer uses a fake 'periph_timer' mock implementation #10321

cladmi · 2018-11-02T19:04:18Z

Contribution description

This is more an issue with code than a pull request

The newly added documentation by #9211 explains the requirement of _xtimer_set_absolute, which is great as they are now explicit.

But the requirement are impossible to guarantee in the current implementation.

I provide unittests using a mock highly wip implementation of periph_timer to show cases where the prerequisites and stated conditions are not met.

Idea for a solution

Ideas I see to handle this

Get the now from the caller for _xtimer_set_absolute and change it to take now + offset.
Got into irq_disable before calling _xtimer_now in the function
Make xtimer_spin_until resistant to being called in the past and correctly defines its requirements
There is a lot to say on this one and it would require specific definition of how long a process is allowed to be de-scheduled. Right now, if a process is de-scheduled for more than (1 << XTIMER_WITDH) / XTIMER_HZ it would not be detected by this function. This is 65ms on the iotlab-m3 board, and it may even require stricter constraints.

De-scheduled in _xtimer_set

    /* Ensure that offset is bigger than 'XTIMER_BACKOFF',
     * 'target - now' will always be the offset no matter if target < or > now.
     *
     * This expects that target was not set too close to now and overrun now, so
     * from setting target up until the call of '_xtimer_now()' above now has not
     * become equal or bigger than target.
     * This is crucial when using low CPU frequencies so reaching the '_xtimer_now()'
     * call needs multiple xtimer ticks.
     *
     * '_xtimer_set()' and `_xtimer_periodic_wakeup()` ensure this by already
     * backing off for small values. */

xtimer_core.c

As _xtimer_set is executed without masked interrupt, the thread could be de-scheduled for XTIMER_BACKOFF +1 ticks between these two lines:

        uint32_t target = _xtimer_now() + offset;
        _xtimer_set_absolute(timer, target);

RIOT/sys/xtimer/xtimer_core.c

Lines 152 to 153 in 1d95da0

    
           uint32_t target = _xtimer_now() + offset; 
        
           _xtimer_set_absolute(timer, target);

Which could make _xtimer_set_absolute called with a time in the past.

So no _xtimer_set does not "ensure this".

Test provided for this one.

De-scheduled in _xtimer_set_absolute just after `_xtimer_now`

RIOT/sys/xtimer/xtimer_core.c

Lines 180 to 203 in 1d95da0

    
               uint32_t now = _xtimer_now(); 
        
               int res = 0; 
        
               timer->next = NULL; 
        
               /* Ensure that offset is bigger than 'XTIMER_BACKOFF', 
        
                * 'target - now' will allways be the offset no matter if target < or > now. 
        
                * 
        
                * This expects that target was not set too close to now and overrun now, so 
        
                * from setting target up until the call of '_xtimer_now()' above now has not 
        
                * become equal or bigger than target. 
        
                * This is crucial when using low CPU frequencies so reaching the '_xtimer_now()' 
        
                * call needs multiple xtimer ticks. 
        
                * 
        
                * '_xtimer_set()' and `_xtimer_periodic_wakeup()` ensure this by already 
        
                * backing off for small values. */ 
        
               uint32_t offset = (target - now); 
        
               DEBUG("timer_set_absolute(): now=%" PRIu32 " target=%" PRIu32 " offset=%" PRIu32 "\n", 
        
                     now, target, offset); 
        
               if (offset <= XTIMER_BACKOFF) { 
        
                   /* backoff */ 
        
                   xtimer_spin_until(target);

    uint32_t now = _xtimer_now();
    // ...
    uint32_t offset = (target - now);
    // ...
    if (offset <= XTIMER_BACKOFF) {
        /* backoff */
        xtimer_spin_until(target);

The thread could be de-scheduled for XTIMER_BACKOFF +1 before calling xtimer_spin_until which would make it called with a time already elapsed, and so do a full loop around.

Test provided for this one.

De-scheduled in _xtimer_set_absolute before `irq_disable`

    /* Ensure timer is fired in right timer period.
     * Backoff condition above ensures that 'target - XTIMER_OVERHEAD` is later
     * than 'now', also for values when now will overflow and the value of target
     * is smaller then now.
     * If `target < XTIMER_OVERHEAD` the new target will be at the end of this
     * 32bit period, as `target - XTIMER_OVERHEAD` is a big number instead of a
     * small at the beginning of the next period. */

RIOT/sys/xtimer/xtimer_core.c

Lines 216 to 222 in 1d95da0

    
               /* Ensure timer is fired in right timer period. 
        
                * Backoff condition above ensures that 'target - XTIMER_OVERHEAD` is later 
        
                * than 'now', also for values when now will overflow and the value of target 
        
                * is smaller then now. 
        
                * If `target < XTIMER_OVERHEAD` the new target will be at the end of this 
        
                * 32bit period, as `target - XTIMER_OVERHEAD` is a big number instead of a 
        
                * small at the beginning of the next period. */

This could be show by de-scheduling before irq_disable, but the test does not shows this for the moment.

Testing procedure

Run the example in tests/xtimer_mocked_periph_timer with a board using a 1Mhz timer for xtimer (test implementation limitation) I tested it with samr21-xpro or iotlab-m3.

A test is provided for the first two cases.

The implementation uses a WIP mocked periph_timer implementation to allow simulating descheduling at certain times by increasing the low level timer.
It is based on the number of call to _xtimer_now, so it breaks with DEBUG in xtimer.

Issues/PRs references

Mention of the real issues in this comment https://github.com/RIOT-OS/RIOT/pull/9211/files#r229728883

…plementation

cladmi · 2018-11-02T19:09:49Z

@Josar I managed to show some issues by using mocks. As I started after your PR was merged, I focused on the remarks I did in the comments section of your PR. That is using your new documentation that I found all these issues so +1 for writing it :)
I hoped to have time to show and understand what you solved but it was already merged.

MichelRottleuthner · 2018-11-05T13:03:59Z

Nice! Tests for stuff like that should have been there before. Thanks for providing this mock test.
In reply to your comment in #9211:

I tried to prove my assumptions with #10321 please show me if my tests are wrong to say my critics are wrong.

I think most of the involved people actually agreed that there are more problems that aren't fixed by the provided PR. The intention for moving forward quickly with this wasn't to ignore/disagree with remaining critics. AIUI the requirements for the _xtimer_set_absolute parameters didn't change - but they are now stated explicitly. Nonetheless you are absolutely right about these requirements being hard (or in some cases impossible) to ensure - and we need to address that.

Just a few comments:

Get the now from the caller for _xtimer_set_absolute and change it to take now + offset.

Sounds reasonable to hand over this information.

Got into irq_disable before calling _xtimer_now in the function

I'm not sure on what other problems this may create and how this would change the performance.

Make xtimer_spin_until resistant to being called in the past and correctly defines its requirements (...)

What is "past" from the context of the spin call? From the function I'd expect that it spins until this value is reached - without handling stuff like that. In general, spin should only be used for very short periods of time where timer setup and callback overhead is too heavy i.e. cases where another thread being scheduled in between will likely break what the spin is used for anyway (e.g. timing sensitive bitbanging). For that reason I think that spinning (if impossible to avoid) should only be used with interrupts disabled - because otherwise it probably won't work for the intended use case anyway.

jia200x · 2018-11-05T13:06:22Z

tests/xtimer_mocked_periph_timer/periph_timer_mock.h

+
+/*
+ * Set the internal timer without handling any callbacks
+ * Should fit the internal precisiong


I know it's still WIP, but there's a typo here

… mock implementation Change that the 'post' callback so it is done after 'timer_read' only.

…mock implementation

cladmi · 2018-11-05T15:47:44Z

Nice! Tests for stuff like that should have been there before. Thanks for providing this mock test.

I said I was writing it and provided a first skeleton, but it was merged before I could provide a test.

The principle is quite easy, you define macros to redirect function calls to the mocks and include the c file just after. So not adding USEMODULE. It allows mocking functions that are also used in the system and also access static functions and variables.

In reply to your comment in #9211:

I tried to prove my assumptions with #10321 please show me if my tests are wrong to say my critics are wrong.

I think most of the involved people actually agreed that there are more problems that aren't fixed by the provided PR. The intention for moving forward quickly with this wasn't to ignore/disagree with remaining critics. AIUI the requirements for the _xtimer_set_absolute parameters didn't change - but they are now stated explicitly. Nonetheless you are absolutely right about these requirements being hard (or in some cases impossible) to ensure - and we need to address that.

Yes but the documentation also says "this prerequisite is verified" when it is wrong.

In the same time, the requirements/behavior changed a little bit, as this line in the changeset, which did not solve the main bug, would make more easily target be in the past.

         /* backoff */
-        xtimer_spin_until(target + XTIMER_BACKOFF);
+        xtimer_spin_until(target);
         _shoot(timer);
         return 0;

https://github.com/RIOT-OS/RIOT/pull/9211/files#diff-5371c49e6244a42b30cd3a714c0e9aacR203

And all the changes, documentation, fix and unrelated were in a single commit.

Got into irq_disable before calling _xtimer_now in the function

I'm not sure on what other problems this may create and how this would change the performance.

I will try to show with tests too, but I will need to mock irq_disable for it I think and implement periph_timer callback.

The goal is to have a deterministic time between calling now and configuring the low level timer.
Because if you configure the low level timer too late then you are good for a full loop I think or even two as you do long_target++ and set with a target in the past for the low level timer, so in a full loop (I did not completely reviewed this yet so not sure).

The performance should only be that we spend the time taken to call xtimer_now with disabled interrupts.
And what is performance when you compare something that works with one that does not ?

They can be restored before doing xtimer_spin_until to keep the current behavior when spinning.

Make xtimer_spin_until resistant to being called in the past and correctly defines its requirements (...)

What is "past" from the context of the spin call? From the function I'd expect that it spins until this value is reached - without handling stuff like that. In general, spin should only be used for very short periods of time where timer setup and callback overhead is too heavy i.e. cases where another thread being scheduled in between will likely break what the spin is used for anyway (e.g. timing sensitive bitbanging). For that reason I think that spinning (if impossible to avoid) should only be used with interrupts disabled - because otherwise it probably won't work for the intended use case anyway.

Spin is an until and not an offset, so it can be elapsed.

The test_show_xtimer_set_does_not_handle_overhead test shows the issue.
The base problem is with the xtimer_spin_until implementation, not with the concept.

As said there are many things with this one that are together and did not listed all of them yet.
But it can be called from a user thread or even one xtimer callback so in the timer interrupt.

I do not know how timing sensitive they are, but with two applications you could overshoot by the de-schedule time anyway if another application de-schedules you before you enter the function with offset, or even just after spinning so before you call _shoot. It would be the caller that should disable interrupts in that case as you can also always be de-scheduled before you call the function.

A de-scheduler thread could de-schedule you by XTIMER_BACKOFF by just calling xtimer_set with a low offset. With masked interrupts you make higher priority thread be late by XTIMER_BACKOFF so not sure its better than the low priority thread being delayed.

The problem here, is not that you overshoot by the de-schedule time which would be normal but by a full loop of the low level timer.

MichelRottleuthner · 2018-11-06T09:49:43Z

I said I was writing it and provided a first skeleton, but it was merged before I could provide a test.

I was more referring to the fact that stuff like that should have been tested way earlier - preferably even before merging xtimer...

Yes but the documentation also says "this prerequisite is verified" when it is wrong.

Right, this comment should be removed.

In the same time, the requirements/behavior changed a little bit, as this line in the changeset, which did >not solve the main bug, would make more easily target be in the past.
        /* backoff */
-        xtimer_spin_until(target + XTIMER_BACKOFF);
+        xtimer_spin_until(target);
        _shoot(timer);
        return 0;

This shows one of the general problems quite well: no static offset value will be correct as long as the time it tries to compensate for is non-deterministic.

And what is performance when you compare something that works with one that does not ?

👍

Spin is an until and not an offset, so it can be elapsed.

Sure, what I meant is: "past" is hard to define if your time, like in the current spin implementation, is running in a circle ;)

It would be the caller that should disable interrupts in that case as you can also always be de-scheduled before you call the function.

👍

With masked interrupts you make higher priority thread be late by XTIMER_BACKOFF so not sure its better than the low priority thread being delayed.

I guess we can agree that delaying a higher prio thread is far from optimal and should be avoided - but your point on performance vs. actually working should be also considered here.

A general thing that always comes to my mind regarding xtimer: currently it is really hard to quantify and compare performance of the implementation and show effects of changes as comparable numbers.
What do you think about micro-benchmarks for this and which dimensions (like jitter, delay, resolution, precision, cpu and ram overhead, etc.) would you consider to be interesting for that?

Josar · 2018-11-08T12:20:07Z

@cladmi the merged xtimer PR was just a part which could be broken out from #8990. This handles also hopefully some of this problems.

Maybe we could discuss this further on and hopefully break out some of the changes into other PRs.

cladmi · 2018-11-13T14:37:39Z

Question for the timer_set and timer_set_absolute, the API is not clear about it, what is supposed to happen if I do timer_set_absolute(timer_read() -1) ?

Does it trigger immediately as the value is elapsed, or does it ignore 0xFFFFFFFF and triggers on the next round ? does it say error ?

I need to know to implement the mock properly.

jnohlgard · 2018-11-13T15:57:31Z

@cladmi the target should end up (timer max value) into the future

Josar · 2018-11-13T15:58:06Z

I think as it is implemented timer_set_absolute(timer_read() -1) will execute in the next round.
Same for timer_set().

There is no unsigned int which can not be set as target. And the hardware timer has no context of how the value was calculated. Only thing which can be done is set the requested value as a target in the future for the hardware counter.

Context is only available within xtimer.

jnohlgard · 2018-11-13T16:19:33Z

The ztimer mock module (WIP) might be interesting here.
https://github.com/gebart/RIOT/blob/ztimer/sys/ztimer/mock.c

cladmi · 2018-11-13T17:32:37Z

So we should be really really extra careful to not put a timer in the past as it would basically also ignore one timer overflow:

RIOT/sys/xtimer/xtimer_core.c

Lines 477 to 486 in 8de5ba2

    
               if (!timer_list_head) { 
        
                   DEBUG("_timer_callback(): tick\n"); 
        
                   /* there's no timer for this timer period, 
        
                    * so this was a timer overflow callback. 
        
                    * 
        
                    * In this case, we advance to the next timer period. 
        
                    */ 
        
                   _next_period(); 
        
                   reference = 0;

I think it deserves a test to show this in one of periph_timer test, I will see what I can do when working on it.

I think in my mock I would more put it more as an "assert it should not be in the past" as it is something that should not happen within xtimer.

@gebart I will take a look at your mock, it took me some time to understand that ztimer_mock_op_set was a mock of periph_timer_set :D

kaspar030 · 2018-11-13T21:29:24Z

@gebart I will take a look at your mock, it took me some time to understand that ztimer_mock_op_set was a mock of periph_timer_set :D

There's also a backend (hack) for xtimer so it can be used on top of any ztimer, e.g., the ztimer_mock.

tcschmidt · 2019-08-16T23:44:25Z

@MichelRottleuthner any next steps here?

miri64 · 2020-06-24T13:55:52Z

Ping @MichelRottleuthner @JulianHolzwarth?

Teufelchen1 · 2024-03-13T12:00:36Z

Converted to draft. There has been no activity for a long time but it does provide a POC.

WIP/POC: start a test where xtimer uses a fake 'periph_timer' mock im…

083c5f7

…plementation

cladmi requested review from jnohlgard, MichelRottleuthner, MrKevinWeiss, leandrolanzieri and jcarrano November 2, 2018 19:05

cladmi mentioned this pull request Nov 3, 2018

xtimer: timer & target overflow, hang resolved. #9211

Merged

jia200x reviewed Nov 5, 2018

View reviewed changes

cladmi added 2 commits November 5, 2018 15:22

squash! WIP/POC: start a test where xtimer uses a fake 'periph_timer'…

146417a

… mock implementation Change that the 'post' callback so it is done after 'timer_read' only.

fixup! WIP/POC: start a test where xtimer uses a fake 'periph_timer' …

9508fd2

…mock implementation

cladmi added this to the Release 2019.01 milestone Nov 7, 2018

aabadie modified the milestones: Release 2019.01, Release 2019.04 Jan 25, 2019

cladmi removed this from the Release 2019.04 milestone Apr 23, 2019

cladmi added this to the Release 2019.07 milestone Apr 23, 2019

cladmi modified the milestones: Release 2019.07, Release 2019.10 Jun 14, 2019

kb2ma removed this from the Release 2019.10 milestone Oct 8, 2019

MrKevinWeiss added this to the Release 2021.07 milestone Jun 21, 2021

MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021

Teufelchen1 marked this pull request as draft March 13, 2024 12:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP/POC: start a test where xtimer uses a fake 'periph_timer' mock implementation #10321

WIP/POC: start a test where xtimer uses a fake 'periph_timer' mock implementation #10321

cladmi commented Nov 2, 2018 •

edited

Loading

cladmi commented Nov 2, 2018 •

edited

Loading

MichelRottleuthner commented Nov 5, 2018

jia200x Nov 5, 2018

cladmi Nov 5, 2018

cladmi commented Nov 5, 2018 •

edited

Loading

MichelRottleuthner commented Nov 6, 2018

Josar commented Nov 8, 2018

cladmi commented Nov 13, 2018

jnohlgard commented Nov 13, 2018 •

edited

Loading

Josar commented Nov 13, 2018 •

edited

Loading

jnohlgard commented Nov 13, 2018 •

edited

Loading

cladmi commented Nov 13, 2018

kaspar030 commented Nov 13, 2018

tcschmidt commented Aug 16, 2019

miri64 commented Jun 24, 2020

Teufelchen1 commented Mar 13, 2024

	uint32_t target = _xtimer_now() + offset;
	_xtimer_set_absolute(timer, target);

	uint32_t now = _xtimer_now();
	int res = 0;

	timer->next = NULL;

	/* Ensure that offset is bigger than 'XTIMER_BACKOFF',
	* 'target - now' will allways be the offset no matter if target < or > now.
	*
	* This expects that target was not set too close to now and overrun now, so
	* from setting target up until the call of '_xtimer_now()' above now has not
	* become equal or bigger than target.
	* This is crucial when using low CPU frequencies so reaching the '_xtimer_now()'
	* call needs multiple xtimer ticks.
	*
	* '_xtimer_set()' and `_xtimer_periodic_wakeup()` ensure this by already
	* backing off for small values. */
	uint32_t offset = (target - now);

	DEBUG("timer_set_absolute(): now=%" PRIu32 " target=%" PRIu32 " offset=%" PRIu32 "\n",
	now, target, offset);

	if (offset <= XTIMER_BACKOFF) {
	/* backoff */
	xtimer_spin_until(target);

	/* Ensure timer is fired in right timer period.
	* Backoff condition above ensures that 'target - XTIMER_OVERHEAD` is later
	* than 'now', also for values when now will overflow and the value of target
	* is smaller then now.
	* If `target < XTIMER_OVERHEAD` the new target will be at the end of this
	* 32bit period, as `target - XTIMER_OVERHEAD` is a big number instead of a
	* small at the beginning of the next period. */

WIP/POC: start a test where xtimer uses a fake 'periph_timer' mock implementation #10321

Are you sure you want to change the base?

WIP/POC: start a test where xtimer uses a fake 'periph_timer' mock implementation #10321

Conversation

cladmi commented Nov 2, 2018 • edited Loading

Contribution description

Idea for a solution

De-scheduled in _xtimer_set

De-scheduled in _xtimer_set_absolute just after _xtimer_now

De-scheduled in _xtimer_set_absolute before irq_disable

Testing procedure

Issues/PRs references

cladmi commented Nov 2, 2018 • edited Loading

MichelRottleuthner commented Nov 5, 2018

jia200x Nov 5, 2018

Choose a reason for hiding this comment

cladmi Nov 5, 2018

Choose a reason for hiding this comment

cladmi commented Nov 5, 2018 • edited Loading

MichelRottleuthner commented Nov 6, 2018

Josar commented Nov 8, 2018

cladmi commented Nov 13, 2018

jnohlgard commented Nov 13, 2018 • edited Loading

Josar commented Nov 13, 2018 • edited Loading

jnohlgard commented Nov 13, 2018 • edited Loading

cladmi commented Nov 13, 2018

kaspar030 commented Nov 13, 2018

tcschmidt commented Aug 16, 2019

miri64 commented Jun 24, 2020

Teufelchen1 commented Mar 13, 2024

cladmi commented Nov 2, 2018 •

edited

Loading

De-scheduled in _xtimer_set_absolute just after `_xtimer_now`

De-scheduled in _xtimer_set_absolute before `irq_disable`

cladmi commented Nov 2, 2018 •

edited

Loading

cladmi commented Nov 5, 2018 •

edited

Loading

jnohlgard commented Nov 13, 2018 •

edited

Loading

Josar commented Nov 13, 2018 •

edited

Loading

jnohlgard commented Nov 13, 2018 •

edited

Loading