
rt(threaded): cap LIFO slot polls #5712

Merged
merged 11 commits into master from cap-lifo-slot on May 23, 2023
Conversation

carllerche
Member

As an optimization to improve locality, the multi-threaded scheduler maintains a single slot (LIFO slot). When a task is scheduled, it goes into the LIFO slot. The scheduler will run tasks in the LIFO slot first before checking the local queue.

Ping-pong style workloads, where task A notifies task B, which notifies task A again, can cause starvation as these two tasks repeatedly schedule each other in the LIFO slot. #5686, a first attempt at solving this problem, consumes a unit of budget each time a task is scheduled from the LIFO slot. However, at the time of this commit, the scheduler allocates 128 units of budget for each chunk of work. This is relatively high in situations where tasks do not perform many async operations yet have meaningful poll times (even 5-10 microsecond poll times can have an outsized impact on the scheduler).

In an ideal world, the scheduler would adapt to the workload it is executing. However, as a stopgap, this commit limits the number of times the LIFO slot is prioritized per scheduler tick.
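
A minimal sketch of the idea (the type, field, and constant names below are illustrative, not the actual tokio worker code):

    use std::collections::VecDeque;

    // Stand-in for a scheduled task.
    struct Task;

    // Illustrative cap; the real value lives in the worker implementation.
    const MAX_LIFO_POLLS_PER_TICK: usize = 3;

    struct Core {
        lifo_slot: Option<Task>,
        local_queue: VecDeque<Task>,
        lifo_polls: usize, // reset to 0 at the start of each scheduler tick
    }

    impl Core {
        fn next_task(&mut self) -> Option<Task> {
            // Prefer the LIFO slot for locality, but only a bounded number of
            // times per tick so a ping-pong pair cannot starve the local queue.
            if self.lifo_polls < MAX_LIFO_POLLS_PER_TICK {
                if let Some(task) = self.lifo_slot.take() {
                    self.lifo_polls += 1;
                    return Some(task);
                }
            }
            self.local_queue.pop_front()
        }
    }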

Benchmarks

In a benchmark crafted to simulate injecting tasks while the runtime is under load, this change sped things up 30x (68s -> 2s).

This is the benchmark I used to measure this change's improvements. I am holding off on including it in the repository because, even with this change, it makes the scheduler benchmarks take a very long time to run.

    // Imports assumed by this snippet (not shown in the original).
    use std::sync::atomic::Ordering::Relaxed;
    use std::sync::atomic::{AtomicBool, AtomicUsize};
    use std::sync::Arc;
    use std::time::{Duration, Instant};

    const STALL_DUR: Duration = Duration::from_micros(10);
    const NUM_SPAWN: usize = 1_000;
    // NUM_WORKERS was not defined in the original snippet; one stalling task
    // per worker thread is assumed here.
    const NUM_WORKERS: usize = 4;
    static TICKS: AtomicUsize = AtomicUsize::new(0);

    // Assumed setup (the original snippet takes `rt` from the benchmark harness).
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(NUM_WORKERS)
        .build()
        .unwrap();

    let rt_handle = rt.handle();
    let mut handles = Vec::with_capacity(NUM_SPAWN);
    let stall = Arc::new(AtomicBool::new(true));

    // Spawn some tasks to keep the runtimes busy
    for _ in 0..NUM_WORKERS {
        let stall = stall.clone();
        fn iter(stall: Arc<AtomicBool>) {
            tokio::spawn(async move {
                if stall.load(Relaxed) {
                    TICKS.fetch_add(1, Relaxed);
                    let now = Instant::now();

                    // Busy-loop to simulate a meaningful (~10us) poll time.
                    while now.elapsed() < STALL_DUR {}

                    iter(stall);
                }
            });
        }
        rt.spawn(async move { iter(stall) });
    }

    for _ in 0..NUM_SPAWN {
        handles.push(rt_handle.spawn(async {}));
    }

    rt.block_on(async {
        for handle in handles.drain(..) {
            handle.await.unwrap();
        }
    });
    stall.store(false, Relaxed);
    println!(" TICKS = {}", TICKS.load(Relaxed));

@carllerche carllerche added A-tokio Area: The main tokio crate M-runtime Module: tokio/runtime T-performance Topic: performance and benchmarks labels May 23, 2023
@github-actions github-actions bot added the R-loom Run loom tests on this PR label May 23, 2023
@carllerche
Member Author

All other scheduler benchmark results are unchanged (within the margin of error).

@Noah-Kennedy
Contributor

Might not be a bad idea to add an internal highly unstable runtime metric for this.
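
For illustration, a hedged sketch of what such an internal counter could look like (the names here are hypothetical, not tokio's unstable metrics API):

    use std::sync::atomic::{AtomicU64, Ordering::Relaxed};

    // Hypothetical per-worker counter, not an actual tokio metrics type.
    struct WorkerMetrics {
        lifo_slot_capped_count: AtomicU64,
    }

    impl WorkerMetrics {
        // Incremented whenever the worker skips the LIFO slot because the
        // per-tick cap has been reached.
        fn incr_lifo_capped(&self) {
            self.lifo_slot_capped_count.fetch_add(1, Relaxed);
        }
    }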

@hawkw hawkw self-requested a review May 23, 2023 18:02
@hawkw hawkw left a comment
Member

This makes sense to me! I agree with @Noah-Kennedy's suggestion of adding a new unstable runtime metric to track this behavior, though.

tokio/src/runtime/scheduler/multi_thread/worker.rs (outdated review thread, resolved)
Comment on lines +542 to +546
// Run the LIFO task, then loop
core.metrics.start_poll();
*self.core.borrow_mut() = Some(core);
let task = self.worker.handle.shared.owned.assert_owner(task);
task.run();
Contributor

Would it not be less error-prone to reset `lifo_enabled` after calling `task.run()`? For example, right now you don't reset it in the `Err(())` branch. That's probably fine because the worker thread shuts down there, but it is non-obvious.

Member Author

Yeah, in the error case the core is stolen, so we cannot reset it either way. That said, it brings up a good point: we do need to reset it if the core is stolen.

Contributor

Maybe just reset before polling a task ...

Member Author

Resetting before polling a task isn't robust either, because we can schedule tasks outside the context of running a task, e.g. from polling the I/O driver.

Member Author

Instead, what I suggest is that we add debug_asserts in the relevant places to ensure the value is correct.
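
For illustration only, the kind of assertion meant here (a sketch; the field and method names are hypothetical, not the actual worker.rs code):

    // Assert the invariant instead of silently resetting the flag.
    struct Worker {
        lifo_enabled: bool,
    }

    impl Worker {
        fn assert_lifo_enabled_is_correct(&self, lifo_capped: bool) {
            // The flag must reflect whether the LIFO cap is currently in
            // effect; a mismatch points at a missed reset somewhere.
            debug_assert_eq!(self.lifo_enabled, !lifo_capped);
        }
    }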

@carllerche carllerche merged commit 9eb3f5b into master May 23, 2023
@carllerche carllerche deleted the cap-lifo-slot branch May 23, 2023 21:38
crapStone pushed a commit to Calciumdibromid/CaBr2 that referenced this pull request Jul 6, 2023
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [tokio](https://tokio.rs) ([source](https://github.com/tokio-rs/tokio)) | dependencies | minor | `1.28.2` -> `1.29.1` |
| [tokio](https://tokio.rs) ([source](https://github.com/tokio-rs/tokio)) | dev-dependencies | minor | `1.28.2` -> `1.29.1` |

---

### Release Notes

<details>
<summary>tokio-rs/tokio (tokio)</summary>

### [`v1.29.1`](https://github.com/tokio-rs/tokio/releases/tag/tokio-1.29.1): Tokio v1.29.1

[Compare Source](tokio-rs/tokio@tokio-1.29.0...tokio-1.29.1)

##### Fixed

-   rt: fix nesting two `block_in_place` with a `block_on` between ([#5837])

[#5837]: tokio-rs/tokio#5837

### [`v1.29.0`](https://github.com/tokio-rs/tokio/releases/tag/tokio-1.29.0): Tokio v1.29.0

[Compare Source](tokio-rs/tokio@tokio-1.28.2...tokio-1.29.0)

Technically a breaking change, the `Send` implementation is removed from
`runtime::EnterGuard`. This change fixes a bug and should not impact most users.

##### Breaking

-   rt: `EnterGuard` should not be `Send` ([#5766])

##### Fixed

-   fs: reduce blocking ops in `fs::read_dir` ([#5653])
-   rt: fix possible starvation ([#5686], [#5712])
-   rt: fix stacked borrows issue in `JoinSet` ([#5693])
-   rt: panic if `EnterGuard` dropped incorrect order ([#5772])
-   time: do not overflow to signal value ([#5710])
-   fs: wait for in-flight ops before cloning `File` ([#5803])

##### Changed

-   rt: reduce time to poll tasks scheduled from outside the runtime ([#5705], [#5720])

##### Added

-   net: add uds doc alias for unix sockets ([#5659])
-   rt: add metric for number of tasks ([#5628])
-   sync: implement more traits for channel errors ([#5666])
-   net: add nodelay methods on TcpSocket ([#5672])
-   sync: add `broadcast::Receiver::blocking_recv` ([#5690])
-   process: add `raw_arg` method to `Command` ([#5704])
-   io: support PRIORITY epoll events ([#5566])
-   task: add `JoinSet::poll_join_next` ([#5721])
-   net: add support for Redox OS ([#5790])

##### Unstable

-   rt: add the ability to dump task backtraces ([#5608], [#5676], [#5708], [#5717])
-   rt: instrument task poll times with a histogram ([#5685])

[#5766]: tokio-rs/tokio#5766
[#5653]: tokio-rs/tokio#5653
[#5686]: tokio-rs/tokio#5686
[#5712]: tokio-rs/tokio#5712
[#5693]: tokio-rs/tokio#5693
[#5772]: tokio-rs/tokio#5772
[#5710]: tokio-rs/tokio#5710
[#5803]: tokio-rs/tokio#5803
[#5705]: tokio-rs/tokio#5705
[#5720]: tokio-rs/tokio#5720
[#5659]: tokio-rs/tokio#5659
[#5628]: tokio-rs/tokio#5628
[#5666]: tokio-rs/tokio#5666
[#5672]: tokio-rs/tokio#5672
[#5690]: tokio-rs/tokio#5690
[#5704]: tokio-rs/tokio#5704
[#5566]: tokio-rs/tokio#5566
[#5721]: tokio-rs/tokio#5721
[#5790]: tokio-rs/tokio#5790
[#5608]: tokio-rs/tokio#5608
[#5676]: tokio-rs/tokio#5676
[#5708]: tokio-rs/tokio#5708
[#5717]: tokio-rs/tokio#5717
[#5685]: tokio-rs/tokio#5685

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

 - [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Co-authored-by: cabr2-bot <cabr2.help@gmail.com>
Reviewed-on: https://codeberg.org/Calciumdibromid/CaBr2/pulls/1958
Reviewed-by: crapStone <crapstone01@gmail.com>
Co-authored-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
Co-committed-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>