-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rt(threaded): basic self-tuning of injection queue #5720
Conversation
Each multi-threaded runtime worker prioritizes pulling tasks off of its local queue. Every so often, it checks the injection (global) queue for work submitted there. Previously, "every so often," was a constant "number of tasks polled" value. Tokio sets a default of 61, but allows users to configure this value. If workers are under load with tasks that are slow to poll, the injection queue can be starved. To prevent starvation in this case, this commit implements some basic self-tuning. The multi-threaded scheduler tracks the mean task poll time using an exponentially-weighted moving average. It then uses this value to pick an interval at which to check the injection queue. This commit is a first pass at adding self-tuning to the scheduler. There are other values in the scheduler that could benefit from self-tuning (e.g. the maintenance interval). Additionally, the current-thread scheduler could also benfit from self-tuning. However, we have reached the point where we should start investigating ways to unify logic in both schedulers. Adding self-tuning to the current-thread scheduler will be punted until after this unification.
9b49b59
to
df96c16
Compare
// was called, turning the poll into a "blocking op". In this | ||
// case, we don't want to measure the poll time as it doesn't | ||
// really count as an async poll anymore. | ||
core.metrics.end_poll(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes might be a bit controversial, but when we poll the LIFO slot, we aren't "really" polling a new task, but we are batching polling the lifo task under the initially polled task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this feels reasonable. can we add a comment explaining that rationale?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
I will work on fixing CI. I will also investigate tests, but I think that might be hard because they are timing based. |
We should probably test this change on a few different workloads to see how it performs. |
@Noah-Kennedy Go for it. Tokio's benchmarks (except the one this targets) remain unchanged (margin of error). |
This reverts commit 9ff9218.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the implementation looks really solid! i had a few minor nitpicks but nothing blocking.
// was called, turning the poll into a "blocking op". In this | ||
// case, we don't want to measure the poll time as it doesn't | ||
// really count as an async poll anymore. | ||
core.metrics.end_poll(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this feels reasonable. can we add a comment explaining that rationale?
PR #5720 introduced runtime self-tuning. It included a test that attempts to verify self-tuning logic. The test is heavily reliant on timing details. This patch attempts to make the test a bit more reliable by not assuming tuning will converge within a set amount of time.
PR #5720 introduced runtime self-tuning. It included a test that attempts to verify self-tuning logic. The test is heavily reliant on timing details. This patch attempts to make the test a bit more reliable by not assuming tuning will converge within a set amount of time.
This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [tokio](https://tokio.rs) ([source](https://github.com/tokio-rs/tokio)) | dependencies | minor | `1.28.2` -> `1.29.1` | | [tokio](https://tokio.rs) ([source](https://github.com/tokio-rs/tokio)) | dev-dependencies | minor | `1.28.2` -> `1.29.1` | --- ### Release Notes <details> <summary>tokio-rs/tokio (tokio)</summary> ### [`v1.29.1`](https://github.com/tokio-rs/tokio/releases/tag/tokio-1.29.1): Tokio v1.29.1 [Compare Source](tokio-rs/tokio@tokio-1.29.0...tokio-1.29.1) ##### Fixed - rt: fix nesting two `block_in_place` with a `block_on` between (#​5837]) #​5837]: tokio-rs/tokio#5837 ### [`v1.29.0`](https://github.com/tokio-rs/tokio/releases/tag/tokio-1.29.0): Tokio v1.29.0 [Compare Source](tokio-rs/tokio@tokio-1.28.2...tokio-1.29.0) Technically a breaking change, the `Send` implementation is removed from `runtime::EnterGuard`. This change fixes a bug and should not impact most users. ##### Breaking - rt: `EnterGuard` should not be `Send` (#​5766]) ##### Fixed - fs: reduce blocking ops in `fs::read_dir` (#​5653]) - rt: fix possible starvation (#​5686], #​5712]) - rt: fix stacked borrows issue in `JoinSet` (#​5693]) - rt: panic if `EnterGuard` dropped incorrect order (#​5772]) - time: do not overflow to signal value (#​5710]) - fs: wait for in-flight ops before cloning `File` (#​5803]) ##### Changed - rt: reduce time to poll tasks scheduled from outside the runtime (#​5705], #​5720]) ##### Added - net: add uds doc alias for unix sockets (#​5659]) - rt: add metric for number of tasks (#​5628]) - sync: implement more traits for channel errors (#​5666]) - net: add nodelay methods on TcpSocket (#​5672]) - sync: add `broadcast::Receiver::blocking_recv` (#​5690]) - process: add `raw_arg` method to `Command` (#​5704]) - io: support PRIORITY epoll events (#​5566]) - task: add `JoinSet::poll_join_next` (#​5721]) - net: add support for Redox OS (#​5790]) ##### Unstable - rt: add the ability to dump task backtraces (#​5608], #​5676], #​5708], #​5717]) - rt: instrument task poll times with a histogram (#​5685]) #​5766]: tokio-rs/tokio#5766 #​5653]: tokio-rs/tokio#5653 #​5686]: tokio-rs/tokio#5686 #​5712]: tokio-rs/tokio#5712 #​5693]: tokio-rs/tokio#5693 #​5772]: tokio-rs/tokio#5772 #​5710]: tokio-rs/tokio#5710 #​5803]: tokio-rs/tokio#5803 #​5705]: tokio-rs/tokio#5705 #​5720]: tokio-rs/tokio#5720 #​5659]: tokio-rs/tokio#5659 #​5628]: tokio-rs/tokio#5628 #​5666]: tokio-rs/tokio#5666 #​5672]: tokio-rs/tokio#5672 #​5690]: tokio-rs/tokio#5690 #​5704]: tokio-rs/tokio#5704 #​5566]: tokio-rs/tokio#5566 #​5721]: tokio-rs/tokio#5721 #​5790]: tokio-rs/tokio#5790 #​5608]: tokio-rs/tokio#5608 #​5676]: tokio-rs/tokio#5676 #​5708]: tokio-rs/tokio#5708 #​5717]: tokio-rs/tokio#5717 #​5685]: tokio-rs/tokio#5685 </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about these updates again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNi4wLjAiLCJ1cGRhdGVkSW5WZXIiOiIzNi4wLjAiLCJ0YXJnZXRCcmFuY2giOiJkZXZlbG9wIn0=--> Co-authored-by: cabr2-bot <cabr2.help@gmail.com> Reviewed-on: https://codeberg.org/Calciumdibromid/CaBr2/pulls/1958 Reviewed-by: crapStone <crapstone01@gmail.com> Co-authored-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org> Co-committed-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
Each multi-threaded runtime worker prioritizes pulling tasks off of its local queue. Every so often, it checks the injection (global) queue for work submitted there. Previously, "every so often," was a constant "number of tasks polled" value. Tokio sets a default of 61, but allows users to configure this value.
If workers are under load with tasks that are slow to poll, the injection queue can be starved. To prevent starvation in this case, this commit implements some basic self-tuning. The multi-threaded scheduler tracks the mean task poll time using an exponentially-weighted moving average. It then uses this value to pick an interval at which to check the injection queue.
This commit is a first pass at adding self-tuning to the scheduler. There are other values in the scheduler that could benefit from self-tuning (e.g. the maintenance interval). Additionally, the current-thread scheduler could also benfit from self-tuning. However, we have reached the point where we should start investigating ways to unify logic in both schedulers. Adding self-tuning to the current-thread scheduler will be punted until after this unification.
With this change, I can now add the benchmark mentioned in #5712 and this brings it down from 2s -> 100ms about.