Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New timeout API #17155

Closed
wants to merge 19 commits into from
Closed

Conversation

andyross
Copy link
Contributor

This is a proof of concept (right now it builds and runs tests/kernel/common on native_posix only) of a new timeout API that allows us to get beyond the millisecond world. As of #16782, it's possible to push the kernel timing resolution essentially arbitrarily close to the hardware capabilities. But we still have two problems:

  1. The only API that lets an app actually achieve that precision is k_usleep(). All the other timeout parameters in the kernel (e.g. the delay for k_thread_create(), timeouts for k_sem_get() and other IPC primitives, etc...) are still specified in milliseconds.

  2. With a 32 bit timeout, we can't go much farther than 32 kHz anyway. At that frequency, the clock will roll over in ~2.5 days, which is dangerously close to values that real world applications are likely to see.

So what this series does is change all those timeout parameters into an opaque "k_timeout_t" type. This is still configurable to be a 32 bit millisecond count for compatibility (obviously at the expense of new features) so we don't break any promises for a version bump. The new API allows setting that timeout in any of "ticks", "cycles", "milliseconds" or "microseconds" ("nanoseconds" would be easily doable in the same framework, but frankly we don't have hardware that can exercise that yet) as the user desires, with appropriate rounding happening internally so the user doesn't need to muck with it.

Along the way, it introduces a whole new time unit conversion API, with 48 (!!) optimized conversion functions generated automatically for all (useful) combinations of unit, rounding and precision.

Then finally it shows off the new framework by adding a whole new timeout class: "uptime" timeouts, where instead of telling the kernel how long you want to wait, you tell it the moment you want to wake up (in your choice of units). The kernel handles the delta computation for you atomically. And it works in like three lines of code.

Obviously this isn't done yet. In addition to the need for documentation of the new APIs, fleshing out of missing bits and the inevitable parade of undiscovered new bugs, as implemented this is currently incompatible with userspace. The new k_timeout_t argument may be a 64 bit value, and our existing syscall implementation only handles arguments which fit in machine registers. This isn't hard to fix the tedious way (by splitting off a separate set of system calls to back each one of the timeout-taking functions), but ideally we should be able to make this work in the syscall generator. Either way, it's a decent chunk of work and none of it has been done.

@andyross andyross added the DNM This PR should not be merged (Do Not Merge) label Jun 28, 2019
@andyross andyross requested review from pabigot and pizi-nordic June 28, 2019 21:10
@zephyrbot zephyrbot added area: Test Framework Issues related not to a particular test, but to the framework instead area: API Changes to public APIs area: Tests Issues related to a particular existing or missing test area: Kernel labels Jun 28, 2019
@zephyrbot
Copy link
Collaborator

zephyrbot commented Jun 28, 2019

Found the following issues, please fix and resubmit:

License issues

In most cases you do not need to do anything here, especially if the files
reported below are going into ext/ and if license was approved for inclusion
into ext/ already. Fix any missing license/copyright issues. The license
exception if a JFYI for the maintainers and can be overriden when merging the
pull request.

  • tests/kernel/timer/timer_api/src/timer_convert.c missing license.
  • tests/kernel/timer/timer_api/src/timer_convert.c missing copyright.

Documentation issues

/var/lib/shippable/build/IN/main_repo/zephyr/include/sys_clock.h:67: warning: documentation for unknown define K_TIMEOUT_UPTIME_TICKS found.

/var/lib/shippable/build/IN/main_repo/zephyr/include/sys_clock.h:76: warning: documentation for unknown define K_TIMEOUT_UPTIME_MS found.

/var/lib/shippable/build/IN/main_repo/zephyr/include/sys_clock.h:85: warning: documentation for unknown define K_TIMEOUT_UPTIME_US found.

/var/lib/shippable/build/IN/main_repo/zephyr/include/sys_clock.h:94: warning: documentation for unknown define K_TIMEOUT_UPTIME_CYC found.

/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2965: warning: argument 'delayed_work' of command @param is not found in the argument list of k_delayed_work_end_ticks(struct k_delayed_work *dw)
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2975: warning: The following parameters of k_delayed_work_end_ticks(struct k_delayed_work *dw) are not documented:
  parameter 'dw'
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2984: warning: argument 'delayed_work' of command @param is not found in the argument list of k_delayed_work_remaining_ticks(struct k_delayed_work *dw)
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2995: warning: The following parameters of k_delayed_work_remaining_ticks(struct k_delayed_work *dw) are not documented:
  parameter 'dw'
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2965: warning: argument 'delayed_work' of command @param is not found in the argument list of k_delayed_work_end_ticks(struct k_delayed_work *dw)
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2975: warning: The following parameters of k_delayed_work_end_ticks(struct k_delayed_work *dw) are not documented:
  parameter 'dw'
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2984: warning: argument 'delayed_work' of command @param is not found in the argument list of k_delayed_work_remaining_ticks(struct k_delayed_work *dw)
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2995: warning: The following parameters of k_delayed_work_remaining_ticks(struct k_delayed_work *dw) are not documented:
  parameter 'dw'
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2965: warning: argument 'delayed_work' of command @param is not found in the argument list of k_delayed_work_end_ticks(struct k_delayed_work *dw)
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2975: warning: The following parameters of k_delayed_work_end_ticks(struct k_delayed_work *dw) are not documented:
  parameter 'dw'
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2984: warning: argument 'delayed_work' of command @param is not found in the argument list of k_delayed_work_remaining_ticks(struct k_delayed_work *dw)
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2995: warning: The following parameters of k_delayed_work_remaining_ticks(struct k_delayed_work *dw) are not documented:
  parameter 'dw'
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2965: warning: argument 'delayed_work' of command @param is not found in the argument list of k_delayed_work_end_ticks(struct k_delayed_work *dw)
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2975: warning: The following parameters of k_delayed_work_end_ticks(struct k_delayed_work *dw) are not documented:
  parameter 'dw'
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2984: warning: argument 'delayed_work' of command @param is not found in the argument list of k_delayed_work_remaining_ticks(struct k_delayed_work *dw)
/var/lib/shippable/build/IN/main_repo/zephyr/include/kernel.h:2995: warning: The following parameters of k_delayed_work_remaining_ticks(struct k_delayed_work *dw) are not documented:
  parameter 'dw'



@andyross
Copy link
Contributor Author

Just to note: it's not as big as it looks. About 700 lines are the auto-generated conversion utilities in time_units.h. And all the API churn is pretty much just boilerplate. Look at sys_clock.h and the first 10% of time_units.h for the meat of the PR.

Also note that almost nothing was changed (just the change in type of k_timeout_t and the few lines added for absolute timeouts) in the guts of the timeout implementation. This is all happening at the API level.

@andyross
Copy link
Contributor Author

Also also note the perl script I snuck into the comments. :)

@pabigot
Copy link
Collaborator

pabigot commented Jun 29, 2019

Thanks for putting this together. In short I think the technical approach of k_timeout_t has great potential, but the details are premature.

I had hoped we would have an opportunity to discuss the goals and expectations before jumping straight to an implementation that codifies specific expectations, even though it is clearly marked as a prototype. I've opened #17162 for this, containing material that I drafted before looking at this PR, but refined based on what you've done here. Some concerns that should be mooted there are:

  • We need to determine basic parameters like the precision of delay specification; microseconds is not enough in some cases.
  • We might consider an API like POSIX clock_gettime() both to standardize how clocks are read and to allow for expressing timeouts and scheduling in different domains. That could affect how many of the upper bits of the 64-bit encoding must be reserved for current and future metadata.
  • It's not clear whether this allows reconstruction of the absolute deadline of a timeout specified by a relative offset.

Some of this might not be achievable, and if it is some might be an independent change, but if we're going to touch the delay/timeout API we should reach consensus on what we'd like to see it support so we don't have to do it again for a while.

Copy link
Collaborator

@pizi-nordic pizi-nordic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How you expect to pass expected absolute expiration time back to caller?
(feature requested by @pabigot).

include/time_units.h Show resolved Hide resolved
return 0;
/* New API going forward: k_timeout_t is an opaque struct */
typedef struct { k_ticks_t ticks; } k_timeout_t;
#define K_TIMEOUT_TICKS(t) ((k_timeout_t){ (k_ticks_t)t })
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why casting to k_ticks_t is needed here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pizi-nordic It's needed because t may be -1 which could produce an error/warning when used to initialize the unsigned field.

@andyross This needs to be (k_ticks_t)(t).

kernel/mempool.c Outdated
@@ -13,6 +13,14 @@
#include <sys/math_extras.h>
#include <stdbool.h>

#ifdef CONFIG_SYS_TIMEOUT_LEGACY_API
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have global K_TICKS_FOREVER, K_TICKS_NO_WAIT instead of local definitions?

@carlescufi carlescufi requested a review from nordic-krch July 16, 2019 12:07
#else
__ASSERT(ticks == 0, "ticks not zero");
return 0;
// Uptime API: specify a timeout to happen at (i.e. immediately after,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm a bit worried about adding new feature (absolute timings) working only with 64bit. It is performance and rom footprint penalty. I tried simple function (adding return u64+u32 vs. return u32 + u32) and u64 version took twice more (20 cycles vs 40 cycles).

I was then wondering if maybe it would be better to use term 'absolute' instead of uptime and define k_timeout_t as

typedef struct {
    k_ticks_t relative: 1;
    k_ticks_t ticks: K_TICKS_BITS - 1;
} k_timeout_t;

For devices with 32k system clock frequency 2^31 is 18 hours so that's more then enough (in most of the cases).

@carlescufi
Copy link
Member

Additional requirements as per kernel meeting:

  • Ability to retrieve the absolute timeout of a particular operation (k_timer, k_delayed_work, etc) via specialized functions (k_timer_get_timeout())
  • Ability to query the uptime in ticks: k_uptime_get_ticks()

@pabigot
Copy link
Collaborator

pabigot commented Jul 18, 2019

As one of the folks most interested in absolute timeouts: I don't consider that feature to be the highest priority. As long as I can read the full-resolution tick counter, and express delays in ticks, that'd meet my desires for 2.0. If absolute timeouts adds new API, I'd rather wait until we have a chance to think carefully about it in the context of other requirements in #17162.

@nordic-krch
Copy link
Contributor

nordic-krch commented Jul 19, 2019

I've tried to compile bluetooth/central_hr sample on nrf52840 with 64 bit and 32 bit. Had to do tweaks in multiple places to compile it but finally got following results:

32 bits 64bit diff
rom 124591 125231 650
ram 19448 19680 232

looks like it isn't that much.

@pabigot pabigot added this to the v2.0.0 milestone Jul 20, 2019
typedef struct { k_ticks_t ticks; } k_timeout_t;
#define K_TIMEOUT_TICKS(t) ((k_timeout_t){ (k_ticks_t)t })
#define K_TIMEOUT_GET(t) ((t).ticks)
#define K_TIMEOUT_EQ(a, b) ((a).ticks == (b).ticks)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should add macros for adding and substracting.
Consider following example:

timeout -= MIN(timeout, diff);

diff is given in milliseconds so following operation would look like:

timeout = K_TIMEOUT_TICKS(TIMEOUT_GET(timeout) - MIN(K_TIMEOUT_GET(timeout), K_TIMEOUT_GET(K_MSEC(diff))))

which looks a bit complicated.

There could be macros like K_TIMEOUT_SUB(timeout, raw_val) and K_TIMEOUT_ADD(timeout, raw_val) which could be used in that case. Additionally, I'm missing macros for converting time to k_ticks_t, there is only conversion to k_timeout_t. In that particular case K_TICKS_MSEC would come handy for something like:

MIN(K_TIMEOUT_GET(timeout), K_TICKS_MSEC(diff))

After adding those macros operation would look:

K_TIMEOUT_SUB(timeout, MIN(K_TIMEOUT_GET(timeout), K_TICKS_MSEC(diff)));

#else
__ASSERT(us == 0, "us not zero");
return 0;
#define K_TIMEOUT_MS(t) K_TIMEOUT_TICKS(k_ms_to_ticks_ceil32(t))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what will happen when conversion result exceeds 32 bits? Is it truncated? shouldn't there be an assert? won't truncation be mixed up with K_FOREVER?

#else
__ASSERT(us == 0, "us not zero");
return 0;
#define K_TIMEOUT_MS(t) K_TIMEOUT_TICKS(k_ms_to_ticks_ceil32(t))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't K_TIMEOUT_MS(x) return x in legacy mode? since k_sleep(5) should behave as k_sleep(K_TIMEOUT_MS(5)). or not?

*
* @return Timeout delay value.
*/
#define K_NO_WAIT 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it moved? i think that it would be better to keep it here (part of kernel API). Internals (complexity) could be hidden in sys_clock.h and here have something like #define K_NO_WAIT Z_NO_WAIT same with K_FOREVER. I would also prefer to see conversion macros here as well. They can also be simplified to #define K_TIMEOUT_MS(ms) Z_TIMEOUT_MS(ms).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nordic-krch kernel.h includes kernel_includes.h includes sys_clock.h and a bunch of other stuff that's part of the kernel API. It's still visible where expected without cluttering a header that's already got a lot of stuff in it.

@nordic-krch
Copy link
Contributor

for absolute timeouts API i was thinking about adding:

  • k_ticks_t k_now(void) which returns absolute now
  • extending k_timer callback to include k_ticks_t when it was scheduled to expire (we have that in counter.h API) that may be useful for scheduling chain of timeouts without accumulative error and learning the delay between scheduled time and actual time of callback execution.

I would stick to calling it absolute time rather than uptime to not close ourselves for 32 bit k_ticks_t.

What about deprecation of k_uptime_get_32 and k_uptime_get?

Andy Ross added 7 commits October 1, 2019 10:39
There's still some discussion about this API, so rename the macros to
live within the Z_* namespace for the first version.

(Ideally this should be squashed with the original commit, but
subsequent commits are using this API and it would require
hand-editting of patch files that are likely to delay review.)

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
More recent CI tests were flagging the argument name mismatch between
the declaration and documentation in two functions, and didn't like
the fact that the absolute timeout macros were documented even in
kconfig situations where the macros were not defined.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The case needs to be of the parenthesized expression if the argument
is itself a compount expression.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The documentation didn't say explicitly how one-shot mode works, by
passing K_NO_WAIT or K_FOREVER as the period parameter.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Upgrade the argument to k_sleep() from a 32 bit millisecond count to a
proper k_timeout_t to symmetrize it with the other timeout parameters
and make it possible to sleep in any time unit and until absolute
timeout events without using a k_timer.

Also add a k_msleep() API with the earlier semantics for the benefit
of simple code that doesn't want to deal with timeouts.  This is
implemented as a trivial wrapper around the new k_sleep(), as is
k_usleep().

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
There is desire that the legacy mode of the timeout API be the default
configuration, so change that value and swap the variable in all the
existing tests.  Note that no test configuration is changing in this
commit, just the default sense of the variable.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
These crossed in a rebase, where new code (or in the header's case old
code included in new context) needs to honor the new timeout API
conventions.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
@andyross
Copy link
Contributor Author

andyross commented Oct 1, 2019

Today's rebase is up. Will have the legacy rework done in a day or two

@pabigot pabigot requested review from pabigot and removed request for pdunaj October 2, 2019 14:40
Copy link
Collaborator

@pabigot pabigot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is not mergeable because:

  • The PR is too big. Proposal to split it into multiple independently reviewable and mergeable PRs:

    • Change all timeout uses with integers to use macros (i.e. clean up use of existing API)

    • Introduce k_timeout_t as a parameter to the kernel functions without changing functionality (i.e. just replace s32_t timeout with struct k_timeout_t timeout where struct k_timeout_t {s32_t timeout;}; and update the macros).

      At this point the build will break because usages of the APIs like k_sem_take(..., timeout) where timeout is an s32_t variable. Here codeowners need to get involved to determine whether they want to change internal API and state to use k_timeout_t or to continue to represent delays in integers.

    • Follow-on PRs based on new functionality needs (64-bit timeouts, absolute/relative timeouts, multiple clocks, ...).

  • The PR takes decisions that have not been analyzed properly:

    • Representation of anything other than the legacy API (s32_t). eg. usage of 64-bit timeouts and relative vs absolute timeouts.
    • Divergence of opinion on k_sleep() and k_thread() and how related delays should be expressed
    • May include changes to conform to new API without coordination with code owners.

@andyross
Copy link
Contributor Author

andyross commented Oct 2, 2019

OK, so I need to split it up then. Will queue that up after the removal of the legacy mode.

I'm confused about the other requirements. Can you give me a specific list of changes that need to be made?

Copy link
Collaborator

@pabigot pabigot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "request changes" is a "reject" in the sense of #15912. I apologize for not taking this step earlier.

This PR is unreviewable, unmaintainable, and unsatisfactory. Conceived as a prototype for encoding timeout information it has progressed into multiple API changes and functional enhancements. Requests here and elsewhere for planning and coordination to avoid wasted effort have not been effective.

We've learned a lot from this work, some of which is reflected in its content. On request I posted initial steps for salvaging it but my professional opinion is it should instead be discarded. Refactoring/rebasing it cannot produce a quality result. I'm very appreciative of @andyross's efforts, but see the sunk cost fallacy.

The "salvage" plan should be seen as taking the lessons learned and applying them in a series of well-defined and understandable steps. Those steps progress toward addressing stakeholder concerns about timeout functionality while serving as independently valuable improvements to Zephyr.

#19562 is an example of the first step. The second step is defined; I can proceed with it, or if all parties agree @andyross could take it on. Further steps are not clear, and should be identified based on what we've learned here and from experience with the second step.

@nashif
Copy link
Member

nashif commented Dec 21, 2019

I assume we are going to have a fresh start with a new PR, so closing this one to have less noise and less open PRs :)

@nashif nashif closed this Dec 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: API Changes to public APIs area: Bluetooth area: Kernel area: native port Host native arch port (native_sim) area: Networking area: Samples Samples area: Test Framework Issues related not to a particular test, but to the framework instead area: Tests Issues related to a particular existing or missing test area: Timer Timer Release Notes To be mentioned in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.