rt(threaded): cap LIFO slot polls #5712
Conversation
As an optimization to improve locality, the multi-threaded scheduler maintains a single slot (the LIFO slot). When a task is scheduled, it goes into the LIFO slot, and the scheduler runs tasks in the LIFO slot first before checking the local queue.

Ping-pong style workloads, where task A notifies task B, which notifies task A again, can cause starvation as these two tasks repeatedly schedule each other into the LIFO slot. #5686, a first attempt at solving this problem, consumes a unit of budget each time a task is scheduled from the LIFO slot. However, at the time of this commit, the scheduler allocates 128 units of budget for each chunk of work. This is relatively high in situations where tasks do not perform many async operations yet have meaningful poll times (even 5-10 microsecond poll times can have an outsized impact on the scheduler).

In an ideal world, the scheduler would adapt to the workload it is executing. However, as a stopgap, this commit limits the number of times the LIFO slot is prioritized per scheduler tick.
All other scheduler benchmark results are unchanged (within the margin of error).
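To make the stopgap concrete, here is a minimal standalone Rust sketch of the idea. It is illustrative only: the names (`MAX_LIFO_POLLS_PER_TICK`, `Core`, `lifo_polls_this_tick`) and the cap value are stand-ins and do not mirror tokio's actual internals.

```rust
/// Hypothetical cap on how many times the LIFO slot is prioritized per tick.
const MAX_LIFO_POLLS_PER_TICK: u32 = 3;

struct Task; // stand-in for a scheduled task

struct Core {
    lifo_slot: Option<Task>,
    local_queue: Vec<Task>,    // stand-in for the worker's local run queue
    lifo_polls_this_tick: u32, // reset at the start of each scheduler tick
}

impl Core {
    /// Called at the start of each scheduler tick.
    fn start_tick(&mut self) {
        self.lifo_polls_this_tick = 0;
    }

    /// Pick the next task, preferring the LIFO slot for locality until the
    /// per-tick cap is reached.
    fn next_task(&mut self) -> Option<Task> {
        if self.lifo_polls_this_tick < MAX_LIFO_POLLS_PER_TICK {
            if let Some(task) = self.lifo_slot.take() {
                self.lifo_polls_this_tick += 1;
                return Some(task);
            }
        } else if let Some(task) = self.lifo_slot.take() {
            // Cap reached: demote the LIFO task to the back of the local queue
            // so a ping-pong pair cannot starve the rest of the queue.
            self.local_queue.push(task);
        }

        if self.local_queue.is_empty() {
            None
        } else {
            Some(self.local_queue.remove(0))
        }
    }
}
```

The key point is that once the cap is hit within a tick, the LIFO task is treated like any other local-queue task until the next tick.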
Might not be a bad idea to add an internal, highly unstable runtime metric for this.
This makes sense to me! I agree with @Noah-Kennedy's suggestion of adding a new unstable runtime metric to track this behavior, though.
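As a rough illustration of the suggested metric (the names here are hypothetical and this is not tokio's metrics API), such a counter could be as simple as a per-worker atomic that is bumped whenever the cap kicks in:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical metric: counts how often a worker hits the per-tick LIFO cap.
#[derive(Default)]
struct WorkerMetrics {
    lifo_cap_hits: AtomicU64,
}

impl WorkerMetrics {
    fn incr_lifo_cap_hits(&self) {
        self.lifo_cap_hits.fetch_add(1, Ordering::Relaxed);
    }

    fn lifo_cap_hits(&self) -> u64 {
        self.lifo_cap_hits.load(Ordering::Relaxed)
    }
}
```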
// Run the LIFO task, then loop
core.metrics.start_poll();
*self.core.borrow_mut() = Some(core);
let task = self.worker.handle.shared.owned.assert_owner(task);
task.run();
Would it not be less error-prone to reset `lifo_enabled` after calling `task.run()`? For example, right now you don't reset it in the `Err(())` branch. That's probably fine because the worker thread shuts down there, but it is non-obvious.
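For illustration, the pattern being suggested looks roughly like the following; `Worker`, `lifo_enabled`, and `run_task` are hypothetical stand-ins, not the actual worker code:

```rust
// Hypothetical sketch of "reset after task.run()" so every exit path restores
// the flag in one place, including an early-return error branch.
struct Worker {
    lifo_enabled: bool,
}

impl Worker {
    fn run_task(&mut self) -> Result<(), ()> {
        // ... poll the task ...
        Ok(())
    }

    fn run_lifo_task(&mut self) -> Result<(), ()> {
        self.lifo_enabled = false; // don't prioritize the LIFO slot while it runs
        let result = self.run_task();
        self.lifo_enabled = true;  // reset unconditionally, even if run_task errored
        result
    }
}
```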
Yeah, in the error case, the `core` is stolen. We cannot reset it either way. That said, it brings up a point that we need to reset it if it is stolen.
Maybe just reset before polling a task ...
Resetting before polling a task isn't robust either, because we can schedule tasks outside of the context of running a task, e.g. from polling the I/O driver.
Instead, what I suggest is that we add `debug_assert`s to ensure the value is correct.
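A sketch of what such an assertion could look like (the helper and parameter names are hypothetical):

```rust
/// Hypothetical helper: debug-assert that a worker's `lifo_enabled` flag
/// matches the value implied by the runtime configuration at points where
/// the two must agree. Compiles away in release builds.
fn assert_lifo_enabled_is_correct(lifo_enabled: bool, lifo_slot_configured: bool) {
    debug_assert_eq!(
        lifo_enabled, lifo_slot_configured,
        "lifo_enabled is out of sync with the runtime configuration"
    );
}
```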
Benchmarks
In a benchmark crafted to simulate injecting tasks while the runtime is under load, this change sped things up 30x (68s -> 2s).

This is the benchmark I used to measure this change's improvements. I am holding off on including it because, even with this change, it causes the scheduler benchmarks to run for a very long time.
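For context, here is a minimal sketch of the kind of ping-pong workload described above, with extra tasks injected alongside it. It is illustrative only and is not the benchmark referenced here.

```rust
// Illustrative only: two tasks repeatedly wake each other while other tasks
// are injected into the runtime. Without a cap on LIFO-slot polls, the
// ping-pong pair tends to keep landing in the LIFO slot and can starve the
// injected work.
use tokio::sync::mpsc;

#[tokio::main(flavor = "multi_thread", worker_threads = 2)]
async fn main() {
    let (tx_a, mut rx_a) = mpsc::channel::<u32>(1);
    let (tx_b, mut rx_b) = mpsc::channel::<u32>(1);

    // Task A: notify B, then wait for B's reply, repeatedly.
    let a = tokio::spawn(async move {
        for i in 0..100_000u32 {
            tx_b.send(i).await.unwrap();
            rx_a.recv().await.unwrap();
        }
    });

    // Task B: echo every message straight back to A.
    let b = tokio::spawn(async move {
        while let Some(i) = rx_b.recv().await {
            if tx_a.send(i).await.is_err() {
                break;
            }
        }
    });

    // Injected tasks that should still get a chance to run.
    let injected: Vec<_> = (0..1_000)
        .map(|_| tokio::spawn(async { tokio::task::yield_now().await }))
        .collect();

    a.await.unwrap();
    b.await.unwrap();
    for task in injected {
        task.await.unwrap();
    }
}
```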