
tests/ztimer_usleep: add sleep test for ztimer #14931

Closed

Conversation

haukepetersen
Contributor

Contribution description

When trying to get #14897 running on a slower board (airfy-beacon -> nrf51 @ 16MHz) I stumbled upon the behavior already described in #10073: xtimer experiences a constant offset when setting it, in my case 81µs ±2µs. I obtained this value with a logic analyzer using the driver code from #14897, but I get the same value running tests/xtimer_usleep. My hope was to mitigate this problem by simply using ztimer, but there I run into the same problem...

To raise awareness of this, I think it is a good idea to port the xtimer_usleep test to ztimer. So this PR more or less copies the existing xtimer test and simply uses ztimer instead. To also get a feeling for the overhead in conjunction with spinning, I added some shorter sleep values as well.

From my two quick tests I got the following offsets:
airfy-beacon (Cortex-M0, 16MHz):
  • xtimer: 81-82µs
  • ztimer: 100-101µs

nrf52840dk (Cortex-M4F, 64MHz):
  • xtimer: 31-32µs
  • ztimer: 27-28µs

IMO these would be OK for very coarse timings, but for any timeout > 100µs there should not be this kind of imprecision, right?!

Or did I miss something in terms of configuration options etc. that would allow compensating for this offset?

Testing procedure

Run it on any board at your disposal; you should see the same output as when running tests/xtimer_usleep, except that the offset values should be slightly different for ztimer.

Issues/PRs references

Showcases that #10073 also holds for ztimer.

@kaspar030
Contributor

> IMO these would be OK for very coarse timings, but for any timeout > 100µs there should not be this kind of imprecision, right?!
>
> Or did I miss something in terms of configuration options etc. that would allow compensating for this offset?

I can reproduce these numbers for nrf52840dk. Don't have the airfy beacon to cross-check.

There's indeed a knob that's not configured: each ztimer clock has a value that's subtracted from every timer that's set. tests/ztimer_overhead can be run to figure out that value. It is board specific.
On nrf52840dk I need to set it to 7 to make tests/ztimer_overhead show a diff of 0.

CFLAGS += -DCONFIG_ZTIMER_USEC_ADJUST=7

Setting this reduces the ztimer_usleep error by 7, as expected.
This makes a "ztimer_now(); ztimer_set(); (in callback) ztimer_now()" sequence have zero overhead.
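
For reference, a minimal sketch of that measurement pattern (in the spirit of tests/ztimer_overhead); the function and variable names here are made up for illustration:

```c
#include <inttypes.h>
#include <stdio.h>

#include "ztimer.h"

static volatile uint32_t _cb_now;

static void _cb(void *arg)
{
    (void)arg;
    _cb_now = ztimer_now(ZTIMER_USEC);   /* timestamp taken in the callback */
}

/* measure how much later than `interval` µs the callback actually fires */
uint32_t measure_set_overhead(uint32_t interval)
{
    ztimer_t timer = { .callback = _cb };

    _cb_now = 0;
    uint32_t before = ztimer_now(ZTIMER_USEC);
    ztimer_set(ZTIMER_USEC, &timer, interval);
    while (_cb_now == 0) {}              /* busy-wait until the callback ran */

    /* with a fitting CONFIG_ZTIMER_USEC_ADJUST this difference becomes 0 */
    return _cb_now - before - interval;
}
```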

Still there are 20-23us left in the sleep test. (We cannot set that adjust value higher, as it would then lead to timers triggering earlier than set).

I then tried to measure the time it takes from the ztimer ISR to right after waking up.
I hacked this by adding "__unlock_time = ztimer_now(ZTIMER_USEC)" as the first line of the ISR and printing the difference to ztimer_now() right after the sleep. There are ten full microseconds there (640 clock cycles at 64MHz), which seems like a lot. Even weirder is that the other ten microseconds must be spent in the first few lines of ztimer_sleep().
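
Roughly, that measurement boils down to the following sketch (ztimer_sleep() internally blocks on a mutex that the timer callback unlocks, so the callback timestamp stands in for the "first line of the ISR" here; all names are made up):

```c
#include <inttypes.h>
#include <stdio.h>

#include "mutex.h"
#include "ztimer.h"

static volatile uint32_t _unlock_time;

static void _wakeup_cb(void *arg)
{
    _unlock_time = ztimer_now(ZTIMER_USEC);  /* ~time of the timer interrupt */
    mutex_unlock(arg);                       /* wake the sleeping thread */
}

void measure_wakeup_latency(uint32_t duration)
{
    mutex_t lock = MUTEX_INIT_LOCKED;
    ztimer_t timer = { .callback = _wakeup_cb, .arg = &lock };

    ztimer_set(ZTIMER_USEC, &timer, duration);
    mutex_lock(&lock);                       /* blocks until _wakeup_cb fires */

    uint32_t awake = ztimer_now(ZTIMER_USEC);
    printf("ISR-to-wakeup: %" PRIu32 " us\n", awake - _unlock_time);
}
```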

I tried figuring out whether the nRF's flash cache has any impact. Enabling its profiling shows 100 cache misses from before to after the sleep(). Disabling the i-cache completely shows 212 cache misses and increases the overhead by ~1µs. So I'd assume we're not a victim of flash cache misses.

So for now:

  • the numbers (30us / 100us) are surprisingly high
  • the situation can be improved slightly by setting CONFIG_ZTIMER_USEC_ADJUST
  • there might be a need for a similar value for x/ztimer_sleep()
  • I feel like I'm missing something

@haukepetersen
Contributor Author

Awesome, thanks for the analysis!

I had missed the CONFIG_ZTIMER_USEC_ADJUST value, and that is exactly the kind of config option I was looking for. Below are some results with and without that option for the airfy-beacon:

// output of tests/ztimer_overhead, CONFIG_ZTIMER_USEC_ADJUST=0:
2020-09-03 08:29:21,056 # main(): This is RIOT! (Version: 2020.10-devel-1217-g901f3-add_test_ztimerusleep)
2020-09-03 08:29:21,058 # ZTIMER_USEC->adjust = 0
2020-09-03 08:29:22,173 # min=39 max=39 avg_diff=39

// output of tests/ztimer_overhead, CONFIG_ZTIMER_USEC_ADJUST=39:
2020-09-03 08:36:02,737 # main(): This is RIOT! (Version: 2020.10-devel-1217-g901f3-add_test_ztimerusleep)
2020-09-03 08:36:02,739 # ZTIMER_USEC->adjust = 39
2020-09-03 08:36:03,814 # min=0 max=0 avg_diff=0

// last run of tests/ztimer_usleep, CONFIG_ZTIMER_USEC_ADJUST=0:
2020-09-03 08:37:12,047 # Slept for 140 us (expected: 40 us) Offset: 100 us
2020-09-03 08:37:12,052 # Slept for 124 us (expected: 20 us) Offset: 104 us
2020-09-03 08:37:12,057 # Slept for 114 us (expected: 1 us) Offset: 113 us
2020-09-03 08:37:12,072 # Slept for 10100 us (expected: 10000 us) Offset: 100 us
2020-09-03 08:37:12,128 # Slept for 50100 us (expected: 50000 us) Offset: 100 us
2020-09-03 08:37:12,143 # Slept for 10335 us (expected: 10234 us) Offset: 101 us
2020-09-03 08:37:12,205 # Slept for 56881 us (expected: 56780 us) Offset: 101 us
2020-09-03 08:37:12,222 # Slept for 12223 us (expected: 12122 us) Offset: 101 us
2020-09-03 08:37:12,327 # Slept for 98866 us (expected: 98765 us) Offset: 101 us
2020-09-03 08:37:12,407 # Slept for 75101 us (expected: 75000 us) Offset: 101 us
2020-09-03 08:37:13,413 # Slept for 1000101 us (expected: 1000000 us) Offset: 101 us

// last run of tests/ztimer_usleep, CONFIG_ZTIMER_USEC_ADJUST=39
2020-09-03 08:38:30,964 # Slept for 114 us (expected: 40 us) Offset: 74 us
2020-09-03 08:38:30,969 # Slept for 114 us (expected: 20 us) Offset: 94 us
2020-09-03 08:38:30,974 # Slept for 113 us (expected: 1 us) Offset: 112 us
2020-09-03 08:38:30,989 # Slept for 10061 us (expected: 10000 us) Offset: 61 us
2020-09-03 08:38:31,043 # Slept for 50061 us (expected: 50000 us) Offset: 61 us
2020-09-03 08:38:31,059 # Slept for 10296 us (expected: 10234 us) Offset: 62 us
2020-09-03 08:38:31,121 # Slept for 56841 us (expected: 56780 us) Offset: 61 us
2020-09-03 08:38:31,138 # Slept for 12184 us (expected: 12122 us) Offset: 62 us
2020-09-03 08:38:31,242 # Slept for 98826 us (expected: 98765 us) Offset: 61 us
2020-09-03 08:38:31,323 # Slept for 75061 us (expected: 75000 us) Offset: 61 us

Lastly for comparison the last run of tests/xtimer_usleep:

2020-09-03 08:39:39,692 # Slept for 121 us (expected: 40 us) Offset: 81 us
2020-09-03 08:39:39,701 # Slept for 48 us (expected: 20 us) Offset: 28 us
2020-09-03 08:39:39,702 # Slept for 30 us (expected: 1 us) Offset: 29 us
2020-09-03 08:39:39,716 # Slept for 10081 us (expected: 10000 us) Offset: 81 us
2020-09-03 08:39:39,771 # Slept for 50081 us (expected: 50000 us) Offset: 81 us
2020-09-03 08:39:39,787 # Slept for 10316 us (expected: 10234 us) Offset: 82 us
2020-09-03 08:39:39,849 # Slept for 56862 us (expected: 56780 us) Offset: 82 us
2020-09-03 08:39:39,866 # Slept for 12203 us (expected: 12122 us) Offset: 81 us
2020-09-03 08:39:39,970 # Slept for 98847 us (expected: 98765 us) Offset: 82 us
2020-09-03 08:39:40,050 # Slept for 75081 us (expected: 75000 us) Offset: 81 us
2020-09-03 08:39:41,055 # Slept for 1000082 us (expected: 1000000 us) Offset: 82 us

So same conclusions here:

  • CONFIG_ZTIMER_USEC_ADJUST does help
  • but the overhead for ztimer_sleep(), even with offset adjustment, is IMHO too large -> e.g. 1.2% off for a 5ms sleep period...
  • the task switching overhead seems to take too long - if I remember correctly, we had numbers around ~350 cycles for ISR-to-task in measurements some years ago, right?!
  • having an additional adjust value for the timer might indeed help the cause

One more question: does ztimer have a backoff value? Having this minimum delay of 110-something µs seems rather large to me.

@haukepetersen
Contributor Author

To get a better overview, here are the same logs for the samr21-xpro:

// output of tests/ztimer_overhead, CONFIG_ZTIMER_USEC_ADJUST=0:
2020-09-03 08:54:23,760 # main(): This is RIOT! (Version: 2020.10-devel-1217-g901f3-add_test_ztimerusleep)
2020-09-03 08:54:23,767 # ZTIMER_USEC->adjust = 0
2020-09-03 08:54:24,871 # min=39 max=39 avg_diff=39

// output of tests/ztimer_overhead, CONFIG_ZTIMER_USEC_ADJUST=39:
2020-09-03 08:54:51,685 # main(): This is RIOT! (Version: 2020.10-devel-1217-g901f3-add_test_ztimerusleep)
2020-09-03 08:54:51,687 # ZTIMER_USEC->adjust = 39
2020-09-03 08:54:52,756 # min=0 max=0 avg_diff=0

// last run of tests/ztimer_usleep, CONFIG_ZTIMER_USEC_ADJUST=0:
2020-09-03 08:55:19,494 # Slept for 123 us (expected: 40 us) Offset: 83 us
2020-09-03 08:55:19,499 # Slept for 103 us (expected: 20 us) Offset: 83 us
2020-09-03 08:55:19,503 # Slept for 88 us (expected: 1 us) Offset: 87 us
2020-09-03 08:55:19,518 # Slept for 10083 us (expected: 10000 us) Offset: 83 us
2020-09-03 08:55:19,573 # Slept for 50081 us (expected: 50000 us) Offset: 81 us
2020-09-03 08:55:19,589 # Slept for 10317 us (expected: 10234 us) Offset: 83 us
2020-09-03 08:55:19,651 # Slept for 56861 us (expected: 56780 us) Offset: 81 us
2020-09-03 08:55:19,668 # Slept for 12205 us (expected: 12122 us) Offset: 83 us
2020-09-03 08:55:19,772 # Slept for 98846 us (expected: 98765 us) Offset: 81 us
2020-09-03 08:55:19,853 # Slept for 75081 us (expected: 75000 us) Offset: 81 us
2020-09-03 08:55:20,864 # Slept for 1000081 us (expected: 1000000 us) Offset: 81 us
2020-09-03 08:55:20,866 # Test ran for 6830613 us

// last run of tests/ztimer_usleep, CONFIG_ZTIMER_USEC_ADJUST=39
2020-09-03 08:55:49,741 # Slept for 88 us (expected: 40 us) Offset: 48 us
2020-09-03 08:55:49,746 # Slept for 88 us (expected: 20 us) Offset: 68 us
2020-09-03 08:55:49,750 # Slept for 88 us (expected: 1 us) Offset: 87 us
2020-09-03 08:55:49,765 # Slept for 10044 us (expected: 10000 us) Offset: 44 us
2020-09-03 08:55:49,820 # Slept for 50042 us (expected: 50000 us) Offset: 42 us
2020-09-03 08:55:49,835 # Slept for 10278 us (expected: 10234 us) Offset: 44 us
2020-09-03 08:55:49,898 # Slept for 56822 us (expected: 56780 us) Offset: 42 us
2020-09-03 08:55:49,915 # Slept for 12166 us (expected: 12122 us) Offset: 44 us
2020-09-03 08:55:50,019 # Slept for 98807 us (expected: 98765 us) Offset: 42 us
2020-09-03 08:55:50,099 # Slept for 75042 us (expected: 75000 us) Offset: 42 us
2020-09-03 08:55:51,111 # Slept for 1000042 us (expected: 1000000 us) Offset: 42 us

// tests/xtimer_usleep:
2020-09-03 08:52:17,442 # Slept for 112 us (expected: 40 us) Offset: 72 us
2020-09-03 08:52:17,446 # Slept for 45 us (expected: 20 us) Offset: 25 us
2020-09-03 08:52:17,450 # Slept for 31 us (expected: 1 us) Offset: 30 us
2020-09-03 08:52:17,466 # Slept for 10072 us (expected: 10000 us) Offset: 72 us
2020-09-03 08:52:17,521 # Slept for 50072 us (expected: 50000 us) Offset: 72 us
2020-09-03 08:52:17,536 # Slept for 10306 us (expected: 10234 us) Offset: 72 us
2020-09-03 08:52:17,598 # Slept for 56852 us (expected: 56780 us) Offset: 72 us
2020-09-03 08:52:17,615 # Slept for 12194 us (expected: 12122 us) Offset: 72 us
2020-09-03 08:52:17,720 # Slept for 98837 us (expected: 98765 us) Offset: 72 us
2020-09-03 08:52:17,800 # Slept for 75072 us (expected: 75000 us) Offset: 72 us
2020-09-03 08:52:18,812 # Slept for 1000072 us (expected: 1000000 us) Offset: 72 us

Looking at these numbers, it does not seem that this is an nrf5x-specific issue...

@kaspar030
Contributor

Here's a way to adjust for the offset: #14936

@bergzand
Member

bergzand commented Sep 3, 2020

At the request of @kaspar030, I've hooked up my logic analyzer to the nrf52840dk and measured some functions using dbgpin. See bergzand@6fe5eac for the spots where I added the dbgpin calls. The logic analyzer is sampling at 16 MHz here:
[Screenshot: logic analyzer trace of the instrumented functions]

And here is the raw data in srzip session file format (supported by PulseView), zipped so that GitHub doesn't complain:
ztimer_usleep.zip
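
For anyone wanting to reproduce such a trace, here is a minimal sketch of dbgpin-style instrumentation around a sleep. It assumes the dbgpin module with dbgpin_set()/dbgpin_clear(); the pin index and the instrumented spot are illustrative only, the actual spots used for the trace above are in the linked commit:

```c
#include "dbgpin.h"
#include "ztimer.h"

/* toggle a debug pin around the sleep so a logic analyzer can measure the
 * actual sleep duration externally */
void traced_sleep(uint32_t duration)
{
    dbgpin_set(0);                        /* pin 0 high right before sleeping */
    ztimer_sleep(ZTIMER_USEC, duration);
    dbgpin_clear(0);                      /* pin 0 low right after waking up */
}
```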

@bergzand
Member

bergzand commented Sep 3, 2020

On the Cortex-M0+ this picture is going to look a bit different. First, it doesn't support the no_idle_thread feature, so instead of one long sched_run() hiding the pm_set_lowest(), it will show two short sched_run() bumps: one to switch to the idle thread and one to switch back from it. Support for the Cortex-M0+ MPU is also not available, shaving off a few nanoseconds since it doesn't have to be reconfigured to guard the top of the stack.

@stale

stale bot commented Mar 19, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

The stale bot added the "State: stale" (no activity for >185 days) label on Mar 19, 2021.
The stale bot closed this on Jun 3, 2021.
Labels: Area: tests (tests and testing framework), State: stale (no activity for >185 days)