kernel: add new work queue implementation #29618

pabigot · 2020-10-28T16:35:08Z

This PR replaces #28891 as a candidate solution for #27356 by addressing the concerns identified in #28891 (comment).

It provides a complete reimplementation of the work queue infrastructure intended to eliminate the race conditions and feature gaps in the existing implementation. The general functionality is based on that provided by Linux.

Both bare and delayable work structures are supported. Items can be submitted; delayable items can be scheduled for submission at a future time. Items can be delayed, queued, and running all at the same time. A running item can also be canceling.

The new implementation:

replaces "pending" with "busy" which identifies the active states;
supports canceling delayed and submitted items;
prevents resubmission of a item being canceled until cancellation completes;
supports waiting for cancellation to complete;
supports flushing a work item (waiting for the last submission to complete without preventing resubmission);
supports waiting for a queue to drain (only allows resubmission from the work thread);
prevents handler-reentrancy during resubmission.

Details on the API changes are in this comment.

Fixes #27356

Plan for Merge

There is a strong desire to get this in to 2.5 so that the legacy API could be removed for 2.6 (LTS) if TSC approves an exception. IIRC discussion off-line suggested that this API could be restricted to supervisory mode use, however there is a large open question regarding how/whether user mode needs a similar capability.

Current steps required to merge this (rm DRAFT are conditions that block removing the DNM status from this)

Merge dependent PR kernel: const-qualify objects used to calculate delay values #30073: kernel: const-qualify objects used to calculate delay values
Remove the old implementation (currently unselected by Kconfig): NOTE This causes massive irrelevant changes to be observed in the start to finish diff of include/kernel.h. See the individual commits to confirm that only the old API was removed.
Document which parts of the legacy API will remain, and which parts are to be deprecated. See this comment.
Full test and code coverage
Determine the degree to which k_work_submit_to_user_queue must be supported. (Needs input from out-of-tree users: Zephyr itself does not use it). RESOLUTION: provide new API with the same function names and behavior, but different signatures since k_work and k_workq cannot be shared between supervisor and userspace interfaces.)
If necessary, provide a deprecated work around for legacy user queue API. RESOLUTION Not applicable; the replacement API will not be backward compatible in signature, but will be compatible in implementation.
Determine whether WIP kernel: define and use a formal scheduling API #29668 is going anywhere and finalize the method for handling scheduling in the implementation. RESOLUTION: Extracted the used functions as kernel private functions until we confirm they're suitable for public exposure.
Pass CI on all platforms (confirmed possible)
Confirm that deprecated usages in-tree (largely networking and bluetooth) can be successfully replaced by the new API, probably via draft PRs based on this PR.

pabigot · 2020-10-28T20:35:28Z

For posterity here are the design notes prior to implementation, which may provide some high-level insight of intent. The material is obsolete as implementation often identified a better path or desired behavior.

Expand for content

Design questions:

Is there a need to be able to dynamically create and destroy workqueue objects and threads? Absent compelling justification the answer is "no".

Queue States and Actions

Design requirements:

It must be possible to drain a work queue in the sense of Linux drain_workqueue.
It must be possible to flush a work queue in the sense of Linux flush_workqueue.

Terminology

chained work is work that is submitted from a work queue thread to the same work queue.

States

A queue has a lock that protects the other members of the state.

A queue has a work list which is a sequence of items to be processed.

A queue has a drain waitq that holds threads that are waiting for it to drain.

A queue has a set of flags including:

draining which indicates that it has been instructed accept only chained work.

Actions

A submit action [any] provides an item. It is invoked when the global
work item lock is held. Atomically:

If the queue is not draining or the work item is chained:
- the item is added to the work list;
- the work item queued bit is set;
- the work thread is notified
  The action result indicates whether the inner actions were taken.

A cancel action [external] provides an item. It is invoked when the global work item lock is held. Atomically:

If the item is removed from the queue:
- the work item queued bit is cleared;
- the work thread is notified

A check action [internal] is invoked whenever the work thread is notified or completes a process action. Atomically:

Any pending notification is cleared.
If no work item is available:
- if the queue is draining the drain waitq is woken
If a work item is fetched then:
- the work item queued bit is cleared;
- the work item running bit is set;
  The lock is released. If a work item was fetched a process action is initiated, otherwise the work thread waits for notification. On completion of either the check action is re-initiated.

A process action [internal] executes the handler of a work item that has been marked running. When the handler returns, atomically:

The item running state is cleared;
The item cancelling state is cleared;
All processes waiting for completion of the work item are made ready.

A drain action [external] indicates that the queue needs to be drained to empty. Atomically:

If the work list is empty and the work thread is idle, return false; otherwise
Mark the queue draining and pend on the drain waitq.

A flush action [external] indicates that the caller must wait until completion of a given work item should be signalled.

Minutiae

Flush and synchronous cancel operations involve pushing a marker onto the work list immediately after whatever work item is being flushed or cancelled.

Item States and Actions

Design requirements:

It must be possible to synchronously cancel a non-delayable work item in the sense of Linux cancel_work_sync.
(If queued, remove from queue. Otherwise block attempts to resubmit and flush.)

Note: This differs from flush_work in that any attempts to resubmit the work before cancellation completes will be rejected.
{TBD} It must be possible to asynchronously (ISR) cancel a delayable work item in the sense of Linux cancel_delayed_work.
(If scheduled, abort timeout. If queued, remove from queue.)
It must be possible to synchronously cancel a delayable work item in the sense of Linux cancel_delayed_work_sync.
It must be possible to synchronously flush a non-delayable work item in the sense of Linux flush_work. (Basically inserts a notifier in the work queue immediately after the
work item.)
It must be possible to synchronously flush a delayable work item in the sense of Linux flush_delayed_work. (If delayed, the timeout is cancelled and the item immediately queued. Otherwise equivalent to flush_work().)

States

There is a global work item lock that protects the flag

A work item has a node that allows it to be linked into work lists.

A work item has a handler function that is invoked when it is de-queued by a work thread.

A work item has a set of flags including:

delayed if there is a timeout that will submit the item to a specific workqueue when the delay expires;
queued if it is on the queue for a work thread to process;
running (??) if a work thread is running its handler (this is required to ensure non-reentrancy when submitting. It may be derived from a non-null workq pointer in the work object.)
cancelling if it has been cancelled and is also running.
flusher if the item is a flush marker;

An item is owned if it is in at least one state of delayed, queued, or cancelling.

Note that idle is the only distinct state: a work item may be simultaneously delayed, queued, and running.

Events

A submit action (ISR-OK) provides an item and a workqueue. Atomically:

If the item is running, the provided workqueue is replaced with the workqueue running the item; then
If the item is not queued and not cancelling, it is submitted to the workqueue. If the submission is successful the item is marked queued.

The action result indicates whether the item was queued.

A process action from the work thread mutates the queued and running flags.

A complete action from the work thread mutates the running and cancelling flags.

A schedule action (ISR-OK) provides a work item, a workqueue, and a delay. Atomically:

If the item is not owned:
- if the delay is zero, a submit action is initiated for the item and the workqueue; otherwise
- a timeout is scheduled to perform a submit operation for the item and the workqueue after the delay expires
  The action result indicates whether the item was owned.

A reschedule action (ISR-OK) provides a work item, a workqueue, and a delay. Atomically:

if the item is delayed, the timeout is cancelled and the delayed flag is cleared;
evaluate the body of the schedule action when the item is not owned.
The action result indicates whether the item was originally owned.

A flush action (thread-only) provides a work item.

If the item is running, the calling thread blocks on the waitq for notification when the item completes.
Note: Nothing prevents scheduling or submitting the item again; the notification occurs only when then the currently running instance completes.

A cancel action (ISR-OK) provides a work item. Atomically:

If the item is delayed, its timeout is cancelled and the delayed state cleared;
If the item is queued, it is removed from its queue and the queued state cleared;
If the item is running, it is marked as cancelling.

A cancel-sync action (thread-only) provides a work item. Atomically:

a cancel action is executed; then
if the item is marked cancelling a flush action is initiated.

eanderlind · 2020-11-02T20:35:12Z

@pabigot Rather than redefine all the k_*work structs (and thus all the apis), could you retain
most of the existing struct definitions and use CONTAINER_OF() macro (if permitted in kernel) to find
an encapsulating new structure that contains any new fields?
Eg rather than adding struct k_work2_queue *queue; to k_work (forcing lots of api redfinitions), can one use CONTAINER_OF to find the queue pointer in k_work2_delayable (or even better k_work_delayable?
I suspect you could use this technique to limit number of new structs to one or two. This would simplify migration to the new api. (Eg k_work2_queue could be defined using pre-existing structs k_queue, etc

pabigot · 2020-11-02T21:11:13Z

@pabigot Rather than redefine all the k_*work structs (and thus all the apis), could you retain
most of the existing struct definitions and use CONTAINER_OF() macro (if permitted in kernel) to find
an encapsulating new structure that contains any new fields?

Probably not. At the moment my goal is confirming that this meets the desired capabilities (i.e. avoids the breakage of the race conditions) and is acceptable to the kernel folks. Next step is seeing whether the existing API can be implemented in terms of the new one. I'm not particularly interested in preserving the actual content of existing structures.

eanderlind · 2020-11-03T17:34:35Z

@pabigot Another attempt to explain. Due to the definition of a new k_work2 struct, it looks like a system that supports the new and old work queue apis simultaneously would require two system work queues in parallel with separate threads for each (since k_work2_queue_start() creates the new thread.) This situation is very likely during a multi-release transition period. A better architecture would be to have a single shared system queue, since it is much easier to reason about and debug. It also avoids frequent swapping between the two threads. To discuss whether there is an internal flag to check that caller doesn't mix and match new and old api calls. Keeping k_work as is would permit reducing the number of new apis and reduce the amount of effort for application/driver maintainers to migrate to the new api. Apis such as k_work_handler and k_work_init could be shared between both api versions.

pabigot · 2020-11-03T18:18:26Z

Another attempt to respond:

The existing API does not meet community needs. There is no way to precisely determine the state of any work item, nor to be informed when it's safe to shut down a work queue, nor to submit another work item without risking duplicate handling because of race conditions.

This PR provides the work queue capabilities that I believe Zephyr needs, with many desirable functional improvements, grounded on an architectural approach from Linux that's proven to be generally satisfactory.

The first milestone is whether this implementation is acceptable to the kernel folks on technical grounds. The second milestone is whether the existing API can be implemented in terms of it without significant behavioral differences. The third milestone is whether the new capabilities are exposed in a way that fits how applications will want to use them.

We're at the first milestone. That's why this is in draft.

Yes, in an ideal world we would have only one workqueue implementation: there would not be a k_work2*, just k_work* with different internals and new API adjacent to the existing API. That's a goal. It is not yet clear that can be reached (there are very weird behaviors that application code may be depending on), but if it can, it isn't going to be accomplished by extending the existing implementation. The data structures and state management it uses are the wrong ones; for example k_queue is a locked data-structure, but precise tracking of work state requires a lock with wider scope, so k_queue gives no benefit over sys_slist_t.

If we're stuck at the first milestone for more than a week it'll probably work back to the top of my queue and I'll start on checking compatibility, but for now I'm working other issues while waiting for kernel review.

nashif · 2020-11-19T20:25:08Z

@pabigot not sure what you did to kernel.h, but it is almost changing every line and makes it unreviewable here, it looks like it was completely rewritten:

include/kernel.h       | 2687 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------

pabigot · 2020-11-19T21:57:16Z

@pabigot not sure what you did to kernel.h, but it is almost changing every line and makes it unreviewable here, it looks like it was completely rewritten:

It's actually only moving a large block that had k_work down below k_mutex and others, then adding the block that has `k_work2, but because there's so much similarity across the functions the diff ends up being horrifying. Before this comes out of draft it'll get cleaned up.

pabigot · 2021-03-02T12:24:03Z

Can you fix the missing blank line in the compliance check?

I didn't introduce or even touch change that line, but yes, I've added a commit that fixes it in the apparent aggregate changes so checkpatch shuts up.

I've also added DNM commit to revert the commit that #32724 suggests is the cause of the CI failures, to see if they go away (they do locally). PR is tagged DNM until that can be removed, but IMO should not be merged until the problem is fixed, as any other PRs that happen to trigger the work queue tests would risk getting CI failures that aren't related to the PR content.

pabigot · 2021-03-02T14:32:43Z

Failed CI at c5aa3de which has this branch minus a checkpatch cleanup
Passed CI at ea1bcfe with the checkpatch cleanup and reverting 4ae44dd with the check

Revert submitted as #32790.

This is needed because Zephyr libraries created after `find_package(Zephyr)` are not pure Zephyr libraries, as they are not in the `whole-archive` linking grouping. Signed-off-by: Torsten Rasmussen <Torsten.Rasmussen@nordicsemi.no>

The work poll API is defined in terms of the k_work API. Shift a structure definition around so it's not within the details of a specific k_work API implementation. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

These functions are a subset of proposed public APIs to clean up several issues related to safely handling waking of threads. They have been made private as they interface may change, but their use will simplify the reimplementation of the k_work functionality. See: zephyrproject-rtos#29668 Signed-off-by: Andrew Boie <andrew.p.boie@intel.com> Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

Attempts to reimplement the existing work API using a new work implementation failed, primarily due to heavy use of whitebox testing in validating the original API. Add a temporary Kconfig that will select between the two implementations so we can use the same identifiers but select which implementation they reference. This commit just adds the selection infrastructure and uses it to conditionalize the existing implementation in anticipation of the new one in the next commit. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

This commit provides a complete reimplementation of the work queue infrastructure intended to eliminate the race conditions and feature gaps in the existing implementation. Both bare and delayable work structures are supported. Items can be submitted; delayable items can be scheduled for submission at a future time. Items can be delayed, queued, and running all at the same time. A running item can also be canceling. The new implementation: * replaces "pending" with "busy" which identifies the active states; * supports canceling delayed and submitted items; * prevents resubmission of a item being canceled until cancellation completes; * supports waiting for cancellation to complete; * supports flushing a work item (waiting for the last submission to complete without preventing resubmission); * supports waiting for a queue to drain (only allows resubmission from the work thread); * supports stopping a work queue in conjunction with draining it; * prevents handler-reentrancy during resubmission. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

Revise the description of queues, work items, and delayable work items to reflect the terminology and API provided by the new implementation. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

Covers all lines that can be reached in controlled conditions. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

The tcp2 infrastructure is using the legacy delayed work API, and relies heavily on the transient state indicated by an estimate of delayed time remaining to determine whether a delayed work item is still active. While the wrappers for this work in most cases, one use is unsanctioned: directly accessing the fields of k_delayed_work structure to satisfy the calling parameters of the handler when invoked directly. The chosen solution for this specific need in the new API is to use a schedule (rather than reschedule) operation, which leaves any previous timer unchanged but allows immediate submission if the work is idle. This changes behavior in that the resend is delegated to the work queue, rather than done immediately. The former behavior can be supported by further refactoring that turns the work handler into a wrapper around a function that takes a connection reference, and invoking that here, while the handler invokes it after reconstructing the connection from the contained work item. For now put in a hack that also uses the non-public fields of the delayed work structure to implement the required behavior. The complete fix if this solution is used requires replacing all use of k_delayed_work in this module with k_work_delayable, leveraging the new functionality of the API to avoid having to guess about the true state of a work item based on its transient timer or flag states. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

Switch the default and clean up some test workarounds. This will enable final conversions necessary to transition to the new API. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

Replace legacy API with new API. Note that this driver uses the schedule, not reschedule, API, since triggers for delay never overlap. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

The new API cannot be used from userspace because it is not merely a wrapper around existing userspace-capable objects (threads and queues), but instead requires much more complex and lower-level access to memory that can't be touched from userspace. The vast majority of work queue users are operating from privileged mode, so there's little motivation to go through the pain and complexity of converting all functions to system calls. Copy the necessary pieces of the existing userspace work queue API out and expose them with new names and types: * k_work_handler_t becomes k_work_user_handler_t * k_work becomes k_work_user * k_work_q becomes k_work_user_q etc. Because the replacement API cannot use the same types new API names are also introduced to make it more clear that the userspace work queue API is a separate functionality. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

Now that the old API has been reimplemented with the new API remove the old implementation and its tests. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

checkpatch misinterprets the macro that generates event data structures as being code rather than more data. This code has not been changed, but rearrangement of code around it causes a false positive when the aggregate changes are checked for style. Add an extra line to eliminate the warning. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

nashif · 2021-03-03T23:30:22Z

is the DNM still valid here?

pabigot · 2021-03-04T00:29:41Z

is the DNM still valid here?

No, removed now.

github-actions bot added area: API Changes to public APIs area: Kernel area: Tests Issues related to a particular existing or missing test labels Oct 28, 2020

pabigot requested a review from andrewboie October 28, 2020 18:37

pabigot force-pushed the nordic/work2 branch from 90d92ca to b2fce44 Compare October 29, 2020 15:37

This comment has been minimized.

Sign in to view

pabigot mentioned this pull request Nov 2, 2020

WIP kernel: define and use a formal scheduling API #29668

Closed

10 tasks

pabigot mentioned this pull request Nov 17, 2020

sflist: move proxy node implementation to data structure API #30097

Closed

pabigot force-pushed the nordic/work2 branch from b2fce44 to 97367d1 Compare November 18, 2020 17:06

github-actions bot added the area: Networking label Nov 18, 2020

This comment has been minimized.

Sign in to view

pabigot force-pushed the nordic/work2 branch 4 times, most recently from 211b5cd to 551aba9 Compare November 19, 2020 17:18

nashif requested a review from andyross November 20, 2020 13:07

pabigot force-pushed the nordic/work2 branch 7 times, most recently from 5dba9cf to 711c89e Compare November 21, 2020 19:14

pabigot added the DNM This PR should not be merged (Do Not Merge) label Mar 2, 2021

github-actions bot added the area: Timer Timer label Mar 2, 2021

pabigot force-pushed the nordic/work2 branch from ea1bcfe to 52c4d7f Compare March 2, 2021 14:32

pabigot mentioned this pull request Mar 2, 2021

Revert "drivers/timer/hpet: Extend qemu workaround" #32790

Merged

tejlmand and others added 13 commits March 2, 2021 14:53

tests: link settings_test_fs with kernel

b034619

This is needed because Zephyr libraries created after `find_package(Zephyr)` are not pure Zephyr libraries, as they are not in the `whole-archive` linking grouping. Signed-off-by: Torsten Rasmussen <Torsten.Rasmussen@nordicsemi.no>

kernel: refactor to separate k_work_poll from k_work

3a13911

The work poll API is defined in terms of the k_work API. Shift a structure definition around so it's not within the details of a specific k_work API implementation. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

doc: kernel: workqueue: update for new workqueue API

461e809

Revise the description of queues, work items, and delayable work items to reflect the terminology and API provided by the new implementation. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

tests: kernel: add test of new work queue API

4fb7894

Covers all lines that can be reached in controlled conditions. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

kernel: default to new work API implementation

d0a84eb

Switch the default and clean up some test workarounds. This will enable final conversions necessary to transition to the new API. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

drivers: regulator: update to new delayable work API

4ca202c

Replace legacy API with new API. Note that this driver uses the schedule, not reschedule, API, since triggers for delay never overlap. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

kernel: remove old work queue implementation

b8b71f6

Now that the old API has been reimplemented with the new API remove the old implementation and its tests. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>

pabigot force-pushed the nordic/work2 branch from 52c4d7f to 0070ce0 Compare March 2, 2021 20:57

pabigot removed the DNM This PR should not be merged (Do Not Merge) label Mar 4, 2021

nashif merged commit 7aefa3d into zephyrproject-rtos:master Mar 4, 2021

This was referenced Mar 4, 2021

kernel: deprecate old k_work API #32888

Merged

Updating Zephyr to fix Work Queue Problems #33104

Closed

pabigot deleted the nordic/work2 branch March 15, 2021 13:10

pabigot mentioned this pull request May 13, 2021

Don't fail on resubmit while delayed works are canceled #35255

Open

VeijoPesonen mentioned this pull request Aug 5, 2021

drivers: gps: nrf9160_gps: update GPS driver to use GNSS API nrfconnect/sdk-nrf#5233

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kernel: add new work queue implementation #29618

kernel: add new work queue implementation #29618

pabigot commented Oct 28, 2020 •

edited

Loading

pabigot commented Oct 28, 2020 •

edited

Loading

Queue States and Actions

Terminology

States

Actions

Minutiae

Item States and Actions

States

Events

This comment has been minimized.

eanderlind commented Nov 2, 2020

pabigot commented Nov 2, 2020

eanderlind commented Nov 3, 2020

pabigot commented Nov 3, 2020

This comment has been minimized.

nashif commented Nov 19, 2020

pabigot commented Nov 19, 2020

pabigot commented Mar 2, 2021 •

edited

Loading

pabigot commented Mar 2, 2021

nashif commented Mar 3, 2021

pabigot commented Mar 4, 2021

kernel: add new work queue implementation #29618

kernel: add new work queue implementation #29618

Conversation

pabigot commented Oct 28, 2020 • edited Loading

Plan for Merge

pabigot commented Oct 28, 2020 • edited Loading

Queue States and Actions

Terminology

States

Actions

Minutiae

Item States and Actions

States

Events

This comment has been minimized.

eanderlind commented Nov 2, 2020

pabigot commented Nov 2, 2020

eanderlind commented Nov 3, 2020

pabigot commented Nov 3, 2020

This comment has been minimized.

nashif commented Nov 19, 2020

pabigot commented Nov 19, 2020

pabigot commented Mar 2, 2021 • edited Loading

pabigot commented Mar 2, 2021

nashif commented Mar 3, 2021

pabigot commented Mar 4, 2021

pabigot commented Oct 28, 2020 •

edited

Loading

pabigot commented Oct 28, 2020 •

edited

Loading

pabigot commented Mar 2, 2021 •

edited

Loading