metrics: add worker_park_unpark_count #6696

Merged
merged 5 commits into from
Jul 23, 2024

Contributor

@surban surban commented Jul 19, 2024

This adds a metric that counts the number of times a worker was parked and unparked. The count is therefore odd while the worker is parked and even while it is unparked.

Motivation

See #6353 and discussion in #6370.

Solution

A lightweight watchdog can watch the worker statistics and determine that a worker is stuck if its park/unpark count is even (thus it is active) and its poll count does not increase.
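
The watchdog check described above could look roughly like the sketch below. It is only an illustration under stated assumptions: `WorkerSample`, `is_parked` and `looks_stuck` are hypothetical names that are not part of this PR, the counter is assumed to start at zero, and the samples would presumably be filled from `worker_park_unpark_count` and a per-worker poll count. The extra requirement that the park/unpark count is unchanged between samples is an added assumption, meant to skip workers that parked and woke up again in between.

```rust
/// One observation of a single worker's metrics.
/// Hypothetical type for illustration; the field names are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct WorkerSample {
    park_unpark_count: u64,
    poll_count: u64,
}

/// An odd count means the worker is parked, assuming the counter starts
/// at zero and is incremented once per park and once per unpark.
fn is_parked(sample: WorkerSample) -> bool {
    sample.park_unpark_count % 2 == 1
}

/// Compares two consecutive samples of the same worker and reports
/// whether it looks stuck: unparked, no park/unpark in between
/// (added assumption), and no new polls.
fn looks_stuck(prev: WorkerSample, cur: WorkerSample) -> bool {
    !is_parked(cur)
        && cur.park_unpark_count == prev.park_unpark_count
        && cur.poll_count == prev.poll_count
}
```

A watchdog thread would take one sample per worker at a fixed interval and call `looks_stuck` on each pair of consecutive samples. The interval needs to be longer than the slowest legitimate poll, otherwise a long-running but healthy task is reported as stuck.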

@github-actions github-actions bot added R-loom-current-thread Run loom current-thread tests on this PR R-loom-multi-thread Run loom multi-thread tests on this PR R-loom-multi-thread-alt Run loom multi-thread alt tests on this PR labels Jul 19, 2024
@mox692 mox692 added A-tokio Area: The main tokio crate M-runtime Module: tokio/runtime M-metrics Module: tokio/runtime/metrics labels Jul 20, 2024
let rt = current_thread();
let metrics = rt.metrics();
rt.block_on(async {
time::sleep(Duration::from_millis(1)).await;
Contributor

Does it work if you use yield_now instead of sleeping here? We generally try to avoid adding sleeps in tests, as they make the test suite take much longer.

Contributor Author

For the current thread runtime it only works with sleep. For the multi-threaded runtime I changed it to yield_now.

Contributor

@Darksonn Darksonn Jul 22, 2024

What!? That is very surprising to me. If yield_now works on the multi-thread runtime, then it probably also works if you get rid of the entire block_on call. In the multi-thread runtime case, the yield_now doesn't interact with the worker threads at all because it's in a block_on and not spawned.

I'm guessing that just spawning the threaded() runtime is enough for both threads to get to a park/unpark count of two. To actually test that our task changes the count, I think we can do something along these lines:

First wait for the count to reach 2 on both workers using a loop like this one. Then, update the code to spawn tasks instead so that the yield_now actually runs on the worker threads:

rt.block_on(rt.spawn(async {}));
drop(rt);
assert!(4 <= metrics.worker_park_unpark_count(0) || 4 <= metrics.worker_park_unpark_count(1));

Here, the count already reached two before spawning, so spawning should result in a worker waking up to process the task, and the worker then goes back to sleep. Hence, one worker should now be at 4.


You probably need to use rt.spawn with the current-thread runtime too, but I don't think you need the loop in that case as there is no threading involved.

Contributor Author

In the multithreaded case after launch the park/unpark count is 1 for both workers. This makes sense since they get parked after startup, because there is no work to do.

After spawning the task, the park/unpark count can be 1/3 or 3/3. I don't know why the scheduler sometimes unparks both threads.

After shutdown the count is 4/4 or 2/4. This makes sense because the threads are both unparked for shutdown.
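
The observed counts can be reproduced with a toy model, sketched below. This is not Tokio's scheduler code: `ToyWorker` and `lifecycle` are hypothetical names, and the model simply assumes that a worker parks once after startup, unparks and parks again if it picks up the spawned task, and is unparked once more for shutdown.

```rust
/// Toy stand-in for one worker's park/unpark counter.
/// Hypothetical type; it does not mirror Tokio's internals.
#[derive(Debug, Default)]
struct ToyWorker {
    park_unpark_count: u64,
}

impl ToyWorker {
    /// Parking is only valid while unparked (even count).
    fn park(&mut self) {
        assert!(self.park_unpark_count % 2 == 0, "worker is already parked");
        self.park_unpark_count += 1;
    }

    /// Unparking is only valid while parked (odd count).
    fn unpark(&mut self) {
        assert!(self.park_unpark_count % 2 == 1, "worker is not parked");
        self.park_unpark_count += 1;
    }
}

/// Replays the assumed lifecycle and returns the count after launch,
/// after the spawned task has been handled, and after shutdown.
fn lifecycle(runs_task: bool) -> [u64; 3] {
    let mut worker = ToyWorker::default();

    // No work to do right after startup, so the worker parks.
    worker.park();
    let after_launch = worker.park_unpark_count;

    // Only a worker that is woken for the task unparks and parks again.
    if runs_task {
        worker.unpark();
        worker.park();
    }
    let after_task = worker.park_unpark_count;

    // Every worker is unparked for shutdown.
    worker.unpark();
    let after_shutdown = worker.park_unpark_count;

    [after_launch, after_task, after_shutdown]
}
```

Under these assumptions a worker that is woken for the task goes through 1, 3, 4 and a worker that stays parked goes through 1, 1, 2, which matches the 1/3 or 3/3 and 2/4 or 4/4 combinations reported above.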

Contributor

Thanks. It makes more sense now.

@surban surban requested a review from Darksonn July 22, 2024 15:40
Contributor

@Darksonn Darksonn left a comment

Thanks.

@Darksonn Darksonn merged commit b69f16a into tokio-rs:master Jul 23, 2024
79 of 83 checks passed