Fix for flaky worker_steal_count test #6932
Conversation
tokio/tests/rt_unstable_metrics.rs (outdated)
let n: u64 = (0..metrics.num_workers())
    .map(|i| metrics.worker_steal_count(i))
    .sum();
We should recreate the runtime on failure. Otherwise a failed round could result in this number being wrong.
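For concreteness, here is a minimal sketch of the retry shape being suggested: rebuild the runtime on every attempt so steal counts from a failed round cannot leak into the next measurement. The helper name, worker count, and placeholder workload are illustrative assumptions rather than the actual test code, and `worker_steal_count` is only available under the `tokio_unstable` cfg.

```rust
use tokio::runtime::Builder;

// Hypothetical helper: measure total steals, rebuilding the runtime on
// every attempt so a failed round cannot inflate the next measurement.
fn total_steals_with_retries(tries: usize) -> u64 {
    for _ in 0..tries {
        // Fresh runtime per attempt: its steal counters start at zero.
        let rt = Builder::new_multi_thread()
            .worker_threads(2)
            .enable_all()
            .build()
            .unwrap();

        rt.block_on(async {
            // ... spawn work arranged so that at least one steal happens ...
        });

        let metrics = rt.metrics();
        let n: u64 = (0..metrics.num_workers())
            .map(|i| metrics.worker_steal_count(i))
            .sum();

        if n > 0 {
            return n;
        }
        // No steal observed this round: drop the runtime and try again.
    }
    panic!("no steal observed after {tries} attempts");
}
```

The important part is that the runtime (and therefore its counters) is rebuilt inside the loop rather than reused across attempts.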
Ah, I guess this could be the case for #6916 too.
> Ah, I guess this could be the case for #6916 too.
I don't think the global_queue_depth in #6916 could be tainted by a task left over from a previous attempt to block the runtime. That would require both tasks from the next attempt to be scheduled before the one unscheduled (but already queued) task of the previous attempt, which I don't believe can happen? If it could, the test would be racy. I'm not exactly sure what guarantees Tokio gives for task execution order, though. Even if my assumption holds, we would be relying on an implementation detail, so I think we should recreate the runtime in that test, too, to bulletproof it against future changes.
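To make the scenario under discussion concrete, here is a rough sketch of blocking both workers and reading `global_queue_depth`, with a fresh runtime per attempt so a leftover task from an earlier attempt cannot inflate the reading. This is not the actual #6916 test; the function name, two-worker setup, and barrier trick are assumptions for illustration.

```rust
use std::sync::{Arc, Barrier};
use tokio::runtime::Builder;

// Hypothetical sketch: park both workers on a std Barrier so that any
// further spawn can only sit in the global (injection) queue.
fn queue_depth_while_blocked() -> usize {
    // Fresh runtime per attempt, so nothing from an earlier attempt lingers.
    let rt = Builder::new_multi_thread()
        .worker_threads(2)
        .build()
        .unwrap();
    let metrics = rt.metrics();

    let started = Arc::new(Barrier::new(3));
    let release = Arc::new(Barrier::new(3));
    for _ in 0..2 {
        let (started, release) = (started.clone(), release.clone());
        rt.spawn(async move {
            started.wait();
            // Blocking wait on purpose: it pins the worker this task runs on.
            release.wait();
        });
    }
    // Only returns once both blocking tasks are actually running, one per
    // worker. In the problematic case discussed in this thread, an attempt
    // can instead stall with a task still queued; the real test guards
    // against that with retries, and rebuilding the runtime keeps any
    // leftover task from tainting the next attempt.
    started.wait();

    // With both workers blocked, this task must still be in the global
    // queue when the depth is read.
    rt.spawn(async {});
    let depth = metrics.global_queue_depth();

    // Unblock the workers again before the runtime is dropped.
    release.wait();
    depth
}
```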
If the deadlock scenario is what I mentioned here, then one injection queue task will remain in the queue. So I think starting another iteration without refreshing the runtime could taint the queue.
> So I think starting another iteration without refreshing the runtime could taint the queue.
That would require both tasks from the current attempt to block the runtime to be executed before the one remaining task from the previous attempt. Is that possible, given that a task blocks the worker it is executed on?
(Sorry for my ignorance of the implementation details; I've yet to fully grasp the execution model of the multi-threaded runtime.)
Yes, but it's probably pretty unlikely.
Okay, then I should definitely revisit #6916 and recreate the runtime on each try. Should I make a new PR or do the changes here?
Up to you. I would probably make a new PR.
#6936 for reference.
Force-pushed from a47cc64 to a06fa00, then from a06fa00 to f44896c.
Thanks.
Fixes #6470. Same approach as in #6916, which addressed a similar flaky test that also relies on blocking the runtime. I tried to corroborate the fix by running this test ~250k times on my local machine; every run passed, but I was unable to hit even a single retry, so I could not reproduce the original failure.