kernel: add new work queue implementation #29618
Conversation
For posterity, here are the design notes written prior to implementation; they may provide some high-level insight into intent. The material is obsolete where implementation identified a better path or desired behavior.

Design questions:
Queue States and Actions

Design requirements:
Terminology

Chained work is work that is submitted from a work queue thread to the same work queue.

States

A queue has a lock that protects the other members of its state. A queue has a work list, which is a sequence of items to be processed. A queue has a drain waitq that holds threads that are waiting for it to drain. A queue has a set of flags including:
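As a rough sketch of the queue state just described (the struct and field names below are invented for exposition and are not the definitions the PR actually adds):

```c
#include <kernel.h>

/* Illustrative only: this mirrors the prose above, not the PR's struct. */
struct sketch_workqueue {
	struct k_spinlock lock;   /* protects the other members of the state */
	sys_slist_t work;         /* work list: sequence of items to be processed */
	_wait_q_t drainq;         /* drain waitq: threads waiting for the queue to drain */
	uint32_t flags;           /* queue state flags (e.g. draining, started) */
};
```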
Actions

A submit action [any] provides an item. It is invoked when the global work item lock is held.
A cancel action [external] provides an item. It is invoked when the global work item lock is held. Atomically:
A check action [internal] is invoked whenever the work thread is notified or completes a process action. Atomically:
A process action [internal] executes the handler of a work item that has been marked running. When the handler returns, atomically:
A drain action [external] indicates that the queue needs to be drained to empty. Atomically:
A flush action [external] indicates that completion of a given work item should be signalled to a waiting caller.

Minutiae

Flush and synchronous cancel operations involve pushing a marker onto the work list immediately after whatever work item is being flushed or cancelled.

Item States and Actions

Design requirements:
States

There is a global work item lock that protects the flags of all work items. A work item has a node that allows it to be linked into work lists. A work item has a handler function that is invoked when it is de-queued by a work thread. A work item has a set of flags including:
An item is owned if it is in at least one of the delayed, queued, or cancelling states. Note that idle is the only exclusive state: a work item may be simultaneously delayed, queued, and running (a sketch of this flag-based model follows these notes).

Events

A submit action (ISR-OK) provides an item and a workqueue. Atomically:
The action result indicates whether the item was queued. A process action from the work thread mutates the queued and running flags. A complete action from the work thread mutates the running and cancelling flags. A schedule action (ISR-OK) provides a work item, a workqueue, and a delay. Atomically:
A reschedule action (ISR-OK) provides a work item, a workqueue, and a delay. Atomically:
A flush action (thread-only) provides a work item.
A cancel action (ISR-OK) provides a work item. Atomically:
A cancel-sync action (thread-only) provides a work item. Atomically:
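To make the item state model in these notes concrete, here is a hedged sketch; the flag names and helpers are invented for illustration and are not the PR's identifiers:

```c
#include <stdbool.h>
#include <stdint.h>

/* A work item's state is a set of bits, so an item can be delayed,
 * queued, and running at the same time. */
#define SKETCH_WORK_DELAYED    (1u << 0)  /* waiting on a timeout before submission */
#define SKETCH_WORK_QUEUED     (1u << 1)  /* linked into a queue's work list */
#define SKETCH_WORK_RUNNING    (1u << 2)  /* its handler is executing on a work thread */
#define SKETCH_WORK_CANCELING  (1u << 3)  /* cancellation requested and in progress */

/* Idle is the only exclusive state: no bit is set. */
static inline bool sketch_work_is_idle(uint32_t flags)
{
	return flags == 0u;
}

/* "Owned" means delayed, queued, or cancelling (merely running does not count). */
static inline bool sketch_work_is_owned(uint32_t flags)
{
	return (flags & (SKETCH_WORK_DELAYED | SKETCH_WORK_QUEUED |
			 SKETCH_WORK_CANCELING)) != 0u;
}
```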
@pabigot Rather than redefine all the k_*work structs (and thus all the APIs), could you retain k_work?
Probably not. At the moment my goal is confirming that this meets the desired capabilities (i.e. avoids the breakage caused by the race conditions) and is acceptable to the kernel folks. The next step is seeing whether the existing API can be implemented in terms of the new one. I'm not particularly interested in preserving the actual content of the existing structures.
@pabigot Another attempt to explain. Because a new k_work2 struct is defined, it looks like a system that supports the new and old work queue APIs simultaneously would require two system work queues in parallel, with separate threads for each (since k_work2_queue_start() creates the new thread). This situation is very likely during a multi-release transition period. A better architecture would be a single shared system queue, since it is much easier to reason about and debug, and it avoids frequent swapping between the two threads. We should also discuss whether an internal flag could verify that a caller doesn't mix and match new and old API calls.

Keeping k_work as-is would permit reducing the number of new APIs and reduce the effort for application/driver maintainers to migrate to the new API. APIs such as k_work_handler and k_work_init could be shared between both API versions.
Another attempt to respond: The existing API does not meet community needs. There is no way to precisely determine the state of any work item, nor to be informed when it's safe to shut down a work queue, nor to submit another work item without risking duplicate handling because of race conditions. This PR provides the work queue capabilities that I believe Zephyr needs, with many desirable functional improvements, grounded in an architectural approach from Linux that's proven to be generally satisfactory.

The first milestone is whether this implementation is acceptable to the kernel folks on technical grounds. The second milestone is whether the existing API can be implemented in terms of it without significant behavioral differences. The third milestone is whether the new capabilities are exposed in a way that fits how applications will want to use them. We're at the first milestone. That's why this is in draft. Yes, in an ideal world we would have only one workqueue implementation: there would not be a separate k_work2 API.

If we're stuck at the first milestone for more than a week it'll probably work its way back to the top of my queue and I'll start on checking compatibility, but for now I'm working other issues while waiting for kernel review.
@pabigot not sure what you did to kernel.h, but it changes almost every line and makes it unreviewable here; it looks like it was completely rewritten:
It's actually only moving a large block that had
I didn't introduce or even touch that line, but yes, I've added a commit that fixes it in the apparent aggregate changes so checkpatch shuts up. I've also added a DNM commit to revert the commit that #32724 suggests is the cause of the CI failures, to see if they go away (they do locally). The PR is tagged DNM until that can be removed, but IMO it should not be merged until the problem is fixed, as any other PRs that happen to trigger the work queue tests would risk getting CI failures that aren't related to the PR content.
This is needed because Zephyr libraries created after `find_package(Zephyr)` are not pure Zephyr libraries, as they are not in the `whole-archive` linking grouping. Signed-off-by: Torsten Rasmussen <Torsten.Rasmussen@nordicsemi.no>
The work poll API is defined in terms of the k_work API. Shift a structure definition around so it's not within the details of a specific k_work API implementation. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
These functions are a subset of proposed public APIs to clean up several issues related to safely handling waking of threads. They have been made private as their interface may change, but their use will simplify the reimplementation of the k_work functionality. See: zephyrproject-rtos#29668 Signed-off-by: Andrew Boie <andrew.p.boie@intel.com> Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
Attempts to reimplement the existing work API using a new work implementation failed, primarily due to heavy use of whitebox testing in validating the original API. Add a temporary Kconfig that will select between the two implementations so we can use the same identifiers but select which implementation they reference. This commit just adds the selection infrastructure and uses it to conditionalize the existing implementation in anticipation of the new one in the next commit. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
This commit provides a complete reimplementation of the work queue infrastructure intended to eliminate the race conditions and feature gaps in the existing implementation. Both bare and delayable work structures are supported. Items can be submitted; delayable items can be scheduled for submission at a future time. Items can be delayed, queued, and running all at the same time. A running item can also be canceling.

The new implementation:
* replaces "pending" with "busy", which identifies the active states;
* supports canceling delayed and submitted items;
* prevents resubmission of an item being canceled until cancellation completes;
* supports waiting for cancellation to complete;
* supports flushing a work item (waiting for the last submission to complete without preventing resubmission);
* supports waiting for a queue to drain (only allows resubmission from the work thread);
* supports stopping a work queue in conjunction with draining it;
* prevents handler re-entrancy during resubmission.

Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
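A minimal usage sketch of the bare work item API this commit describes (the function names below reflect my reading of the new API; exact signatures and return values should be checked against the merged headers):

```c
#include <kernel.h>

static void sensor_work_handler(struct k_work *work)
{
	/* Runs on the system work queue thread. */
}

K_WORK_DEFINE(sensor_work, sensor_work_handler);

void sensor_trigger(void)
{
	/* Submit to the system work queue; safe even if the item is
	 * already queued or running, and callable from ISRs. */
	k_work_submit(&sensor_work);
}

void sensor_shutdown(void)
{
	struct k_work_sync sync;

	/* Wait for the most recent submission to complete without
	 * blocking later resubmission. */
	k_work_flush(&sensor_work, &sync);

	/* Cancel any pending submission and wait until the item is idle;
	 * resubmission is rejected until cancellation completes. */
	k_work_cancel_sync(&sensor_work, &sync);
}
```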
Revise the description of queues, work items, and delayable work items to reflect the terminology and API provided by the new implementation. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
Covers all lines that can be reached in controlled conditions. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
The tcp2 infrastructure is using the legacy delayed work API, and relies heavily on the transient state indicated by an estimate of delayed time remaining to determine whether a delayed work item is still active. While the wrappers for this work in most cases, one use is unsanctioned: directly accessing the fields of k_delayed_work structure to satisfy the calling parameters of the handler when invoked directly. The chosen solution for this specific need in the new API is to use a schedule (rather than reschedule) operation, which leaves any previous timer unchanged but allows immediate submission if the work is idle. This changes behavior in that the resend is delegated to the work queue, rather than done immediately. The former behavior can be supported by further refactoring that turns the work handler into a wrapper around a function that takes a connection reference, and invoking that here, while the handler invokes it after reconstructing the connection from the contained work item. For now put in a hack that also uses the non-public fields of the delayed work structure to implement the required behavior. The complete fix if this solution is used requires replacing all use of k_delayed_work in this module with k_work_delayable, leveraging the new functionality of the API to avoid having to guess about the true state of a work item based on its transient timer or flag states. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
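To illustrate the schedule-versus-reschedule distinction this commit relies on, here is a hedged sketch; the connection structure and function are hypothetical, and only the k_work_delayable calls are meant to reflect the new API:

```c
#include <kernel.h>

/* Hypothetical connection object embedding a delayable resend work item. */
struct conn {
	struct k_work_delayable resend_work;
};

static void conn_request_resend(struct conn *c, k_timeout_t delay)
{
	/* k_work_schedule() starts the countdown only if the item is idle;
	 * a timeout that is already pending is left unchanged. */
	k_work_schedule(&c->resend_work, delay);

	/* k_work_reschedule() would instead replace any pending timeout,
	 * which is why the commit above deliberately uses schedule. */
}
```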
Switch the default and clean up some test workarounds. This will enable final conversions necessary to transition to the new API. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
Replace legacy API with new API. Note that this driver uses the schedule, not reschedule, API, since triggers for delay never overlap. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
The new API cannot be used from userspace because it is not merely a wrapper around existing userspace-capable objects (threads and queues), but instead requires much more complex and lower-level access to memory that can't be touched from userspace. The vast majority of work queue users operate in privileged mode, so there's little motivation to go through the pain and complexity of converting all functions to system calls.

Copy the necessary pieces of the existing userspace work queue API out and expose them with new names and types:
* k_work_handler_t becomes k_work_user_handler_t
* k_work becomes k_work_user
* k_work_q becomes k_work_user_q
etc.

Because the replacement API cannot use the same types, new API names are also introduced to make it clearer that the userspace work queue API is a separate functionality.

Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
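A hedged sketch of the renamed userspace API this commit describes; the stack size, priority, and queue-start parameters are illustrative and should be verified against the merged headers:

```c
#include <kernel.h>

#define USER_WQ_STACK_SIZE 1024
K_THREAD_STACK_DEFINE(user_wq_stack, USER_WQ_STACK_SIZE);

static struct k_work_user_q user_wq;
static struct k_work_user user_item;

/* Handlers for userspace work items take a k_work_user pointer. */
static void user_item_handler(struct k_work_user *work)
{
	/* Runs on the userspace work queue thread. */
}

void start_user_workqueue(void)
{
	k_work_user_queue_start(&user_wq, user_wq_stack,
				K_THREAD_STACK_SIZEOF(user_wq_stack),
				K_PRIO_PREEMPT(10), "user_wq");

	k_work_user_init(&user_item, user_item_handler);
	k_work_user_submit_to_queue(&user_wq, &user_item);
}
```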
Now that the old API has been reimplemented with the new API remove the old implementation and its tests. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
checkpatch misinterprets the macro that generates event data structures as being code rather than more data. This code has not been changed, but rearrangement of code around it causes a false positive when the aggregate changes are checked for style. Add an extra line to eliminate the warning. Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
Is the DNM still valid here?
No, removed now.
This PR replaces #28891 as a candidate solution for #27356 by addressing the concerns identified in #28891 (comment).
It provides a complete reimplementation of the work queue infrastructure intended to eliminate the race conditions and feature gaps in the existing implementation. The general functionality is based on that provided by Linux.
Both bare and delayable work structures are supported. Items can be submitted; delayable items can be scheduled for submission at a future time. Items can be delayed, queued, and running all at the same time. A running item can also be canceling.
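As an example, a delayable item under the new API might be used roughly as follows (a sketch; the handler, period, and symbol names are illustrative):

```c
#include <kernel.h>

static void poll_handler(struct k_work *work)
{
	/* Recover the containing delayable item so the handler can
	 * resubmit itself; an item may be running and queued at once. */
	struct k_work_delayable *dwork = k_work_delayable_from_work(work);

	/* ... do the periodic work, then schedule the next pass ... */
	k_work_schedule(dwork, K_MSEC(100));
}

K_WORK_DELAYABLE_DEFINE(poll_work, poll_handler);

void poll_start(void)
{
	/* schedule() starts the delay only if the item is idle;
	 * reschedule() would replace any pending timeout. */
	k_work_schedule(&poll_work, K_MSEC(100));
}

void poll_stop(void)
{
	struct k_work_sync sync;

	/* Cancel and wait until the item is completely idle. */
	k_work_cancel_delayable_sync(&poll_work, &sync);
}
```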
The new implementation:
* replaces "pending" with "busy", which identifies the active states;
* supports canceling delayed and submitted items;
* prevents resubmission of an item being canceled until cancellation completes;
* supports waiting for cancellation to complete;
* supports flushing a work item (waiting for the last submission to complete without preventing resubmission);
* supports waiting for a queue to drain (only allows resubmission from the work thread);
* supports stopping a work queue in conjunction with draining it;
* prevents handler re-entrancy during resubmission.
Details on the API changes are in this comment.
Fixes #27356
Plan for Merge
There is a strong desire to get this into 2.5 so that the legacy API could be removed for 2.6 (LTS) if the TSC approves an exception. IIRC, off-line discussion suggested that this API could be restricted to supervisory-mode use; however, there is a large open question regarding how/whether user mode needs a similar capability.
Current steps required to merge this (items marked rm DRAFT are conditions that block removing the DNM status from this PR):
* include/kernel.h: see the individual commits to confirm that only the old API was removed.
* k_work_submit_to_user_queue must be supported. (Needs input from out-of-tree users: Zephyr itself does not use it.) RESOLUTION: provide a new API with the same function names and behavior, but different signatures, since k_work and k_work_q cannot be shared between supervisor and userspace interfaces.