
rt(threaded): cap LIFO slot polls #5712

Merged
merged 11 commits into master from cap-lifo-slot on May 23, 2023
Conversation

carllerche
Member

As an optimization to improve locality, the multi-threaded scheduler maintains a single slot (LIFO slot). When a task is scheduled, it goes into the LIFO slot. The scheduler will run tasks in the LIFO slot first before checking the local queue.

Ping-pong style workloads, where task A notifies task B, which notifies task A again, can cause starvation as these two tasks repeatedly schedule each other in the LIFO slot. #5686, a first attempt at solving this problem, consumes a unit of budget each time a task is scheduled from the LIFO slot. However, at the time of this commit, the scheduler allocates 128 units of budget for each chunk of work. This is relatively high in situations where tasks do not perform many async operations yet have meaningful poll times (even 5-10 microsecond poll times can have an outsized impact on the scheduler).

In an ideal world, the scheduler would adapt to the workload it is executing. However, as a stopgap, this commit limits the number of times the LIFO slot is prioritized per scheduler tick.
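
A minimal sketch of the idea (the type, field, and constant names below are illustrative, not the actual tokio worker code):

    use std::collections::VecDeque;

    // Stand-in for a scheduled task.
    struct Task;

    // Illustrative cap; the real value lives in the worker implementation.
    const MAX_LIFO_POLLS_PER_TICK: usize = 3;

    struct Core {
        lifo_slot: Option<Task>,
        local_queue: VecDeque<Task>,
        lifo_polls: usize, // reset to 0 at the start of each scheduler tick
    }

    impl Core {
        fn next_task(&mut self) -> Option<Task> {
            // Prefer the LIFO slot for locality, but only a bounded number of
            // times per tick so a ping-pong pair cannot starve the local queue.
            if self.lifo_polls < MAX_LIFO_POLLS_PER_TICK {
                if let Some(task) = self.lifo_slot.take() {
                    self.lifo_polls += 1;
                    return Some(task);
                }
            }
            self.local_queue.pop_front()
        }
    }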

Benchmarks

In a benchmark crafted to simulate injecting tasks while the runtime is under load, this change sped things up 30x (68s -> 2s).

This is the benchmark I used to measure this change's improvements. I am holding off on including it in the repository because, even with this change, it makes the scheduler benchmarks take a very long time to run.

    // Imports assumed by this snippet (not shown in the original).
    use std::sync::atomic::Ordering::Relaxed;
    use std::sync::atomic::{AtomicBool, AtomicUsize};
    use std::sync::Arc;
    use std::time::{Duration, Instant};

    const STALL_DUR: Duration = Duration::from_micros(10);
    const NUM_SPAWN: usize = 1_000;
    // NUM_WORKERS was not defined in the original snippet; one stalling task
    // per worker thread is assumed here.
    const NUM_WORKERS: usize = 4;
    static TICKS: AtomicUsize = AtomicUsize::new(0);

    // Assumed setup (the original snippet takes `rt` from the benchmark harness).
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(NUM_WORKERS)
        .build()
        .unwrap();

    let rt_handle = rt.handle();
    let mut handles = Vec::with_capacity(NUM_SPAWN);
    let stall = Arc::new(AtomicBool::new(true));

    // Spawn some tasks to keep the runtimes busy
    for _ in 0..NUM_WORKERS {
        let stall = stall.clone();
        fn iter(stall: Arc<AtomicBool>) {
            tokio::spawn(async move {
                if stall.load(Relaxed) {
                    TICKS.fetch_add(1, Relaxed);
                    let now = Instant::now();

                    // Busy-loop to simulate a meaningful (~10us) poll time.
                    while now.elapsed() < STALL_DUR {}

                    iter(stall);
                }
            });
        }
        rt.spawn(async move { iter(stall) });
    }

    for _ in 0..NUM_SPAWN {
        handles.push(rt_handle.spawn(async {}));
    }

    rt.block_on(async {
        for handle in handles.drain(..) {
            handle.await.unwrap();
        }
    });
    stall.store(false, Relaxed);
    println!(" TICKS = {}", TICKS.load(Relaxed));

@carllerche carllerche added A-tokio Area: The main tokio crate M-runtime Module: tokio/runtime T-performance Topic: performance and benchmarks labels May 23, 2023
@github-actions github-actions bot added the R-loom Run loom tests on this PR label May 23, 2023
@carllerche
Member Author

All other scheduler benchmark results are unchanged (within the margin of error).

@Noah-Kennedy
Contributor

Might not be a bad idea to add an internal highly unstable runtime metric for this.
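
For illustration, a hedged sketch of what such an internal counter could look like (the names here are hypothetical, not tokio's unstable metrics API):

    use std::sync::atomic::{AtomicU64, Ordering::Relaxed};

    // Hypothetical per-worker counter, not an actual tokio metrics type.
    struct WorkerMetrics {
        lifo_slot_capped_count: AtomicU64,
    }

    impl WorkerMetrics {
        // Incremented whenever the worker skips the LIFO slot because the
        // per-tick cap has been reached.
        fn incr_lifo_capped(&self) {
            self.lifo_slot_capped_count.fetch_add(1, Relaxed);
        }
    }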

@hawkw hawkw self-requested a review May 23, 2023 18:02
@hawkw hawkw left a comment
Member

This makes sense to me! I agree with @Noah-Kennedy's suggestion of adding a new unstable runtime metric to track this behavior, though.

tokio/src/runtime/scheduler/multi_thread/worker.rs (outdated review thread, resolved)
Comment on lines +542 to +546
// Run the LIFO task, then loop
core.metrics.start_poll();
*self.core.borrow_mut() = Some(core);
let task = self.worker.handle.shared.owned.assert_owner(task);
task.run();
Contributor

Would it not be less error-prone to reset `lifo_enabled` after calling `task.run()`? For example, right now you don't reset it in the `Err(())` branch. That's probably fine because the worker thread shuts down there, but it is non-obvious.

Member Author

Yeah, in the error case the core is stolen, so we cannot reset it either way. That said, it brings up a good point: we do need to reset it if the core is stolen.

Contributor

Maybe just reset before polling a task ...

Member Author

Resetting before polling a task isn't robust either, because we can schedule tasks outside the context of running a task, e.g. from polling the I/O driver.

Member Author

Instead, what I suggest is that we add debug_asserts in the relevant places to ensure the value is correct.
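
For illustration only, the kind of assertion meant here (a sketch; the field and method names are hypothetical, not the actual worker.rs code):

    // Assert the invariant instead of silently resetting the flag.
    struct Worker {
        lifo_enabled: bool,
    }

    impl Worker {
        fn assert_lifo_enabled_is_correct(&self, lifo_capped: bool) {
            // The flag must reflect whether the LIFO cap is currently in
            // effect; a mismatch points at a missed reset somewhere.
            debug_assert_eq!(self.lifo_enabled, !lifo_capped);
        }
    }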

@carllerche carllerche merged commit 9eb3f5b into master May 23, 2023
@carllerche carllerche deleted the cap-lifo-slot branch May 23, 2023 21:38
crapStone pushed a commit to Calciumdibromid/CaBr2 that referenced this pull request Jul 6, 2023
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [tokio](https://tokio.rs) ([source](https://github.com/tokio-rs/tokio)) | dependencies | minor | `1.28.2` -> `1.29.1` |
| [tokio](https://tokio.rs) ([source](https://github.com/tokio-rs/tokio)) | dev-dependencies | minor | `1.28.2` -> `1.29.1` |

---

### Release Notes

<details>
<summary>tokio-rs/tokio (tokio)</summary>

### [`v1.29.1`](https://github.com/tokio-rs/tokio/releases/tag/tokio-1.29.1): Tokio v1.29.1

[Compare Source](tokio-rs/tokio@tokio-1.29.0...tokio-1.29.1)

##### Fixed

-   rt: fix nesting two `block_in_place` with a `block_on` between ([#5837])

[#5837]: tokio-rs/tokio#5837

### [`v1.29.0`](https://github.com/tokio-rs/tokio/releases/tag/tokio-1.29.0): Tokio v1.29.0

[Compare Source](tokio-rs/tokio@tokio-1.28.2...tokio-1.29.0)

Technically a breaking change, the `Send` implementation is removed from
`runtime::EnterGuard`. This change fixes a bug and should not impact most users.

##### Breaking

-   rt: `EnterGuard` should not be `Send` ([#5766])

##### Fixed

-   fs: reduce blocking ops in `fs::read_dir` ([#5653])
-   rt: fix possible starvation ([#5686], [#5712])
-   rt: fix stacked borrows issue in `JoinSet` ([#5693])
-   rt: panic if `EnterGuard` dropped incorrect order ([#5772])
-   time: do not overflow to signal value ([#5710])
-   fs: wait for in-flight ops before cloning `File` ([#5803])

##### Changed

-   rt: reduce time to poll tasks scheduled from outside the runtime ([#5705], [#5720])

##### Added

-   net: add uds doc alias for unix sockets ([#5659])
-   rt: add metric for number of tasks ([#5628])
-   sync: implement more traits for channel errors ([#5666])
-   net: add nodelay methods on TcpSocket ([#5672])
-   sync: add `broadcast::Receiver::blocking_recv` ([#5690])
-   process: add `raw_arg` method to `Command` ([#5704])
-   io: support PRIORITY epoll events ([#5566])
-   task: add `JoinSet::poll_join_next` ([#5721])
-   net: add support for Redox OS ([#5790])

##### Unstable

-   rt: add the ability to dump task backtraces ([#5608], [#5676], [#5708], [#5717])
-   rt: instrument task poll times with a histogram ([#5685])

[#5766]: tokio-rs/tokio#5766
[#5653]: tokio-rs/tokio#5653
[#5686]: tokio-rs/tokio#5686
[#5712]: tokio-rs/tokio#5712
[#5693]: tokio-rs/tokio#5693
[#5772]: tokio-rs/tokio#5772
[#5710]: tokio-rs/tokio#5710
[#5803]: tokio-rs/tokio#5803
[#5705]: tokio-rs/tokio#5705
[#5720]: tokio-rs/tokio#5720
[#5659]: tokio-rs/tokio#5659
[#5628]: tokio-rs/tokio#5628
[#5666]: tokio-rs/tokio#5666
[#5672]: tokio-rs/tokio#5672
[#5690]: tokio-rs/tokio#5690
[#5704]: tokio-rs/tokio#5704
[#5566]: tokio-rs/tokio#5566
[#5721]: tokio-rs/tokio#5721
[#5790]: tokio-rs/tokio#5790
[#5608]: tokio-rs/tokio#5608
[#5676]: tokio-rs/tokio#5676
[#5708]: tokio-rs/tokio#5708
[#5717]: tokio-rs/tokio#5717
[#5685]: tokio-rs/tokio#5685

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

 - [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Co-authored-by: cabr2-bot <cabr2.help@gmail.com>
Reviewed-on: https://codeberg.org/Calciumdibromid/CaBr2/pulls/1958
Reviewed-by: crapStone <crapstone01@gmail.com>
Co-authored-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
Co-committed-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>