Work stealing cpu load with high numbers of tasks #6178
Comments
I can't see anything that's not O(1) at first glance. We changed one thing about move_task_request recently that slows it down but is constant time: we introduced a UUID generation in there, see https://github.com/dask/distributed/blame/main/distributed/stealing.py#L221-L223 Note: this is the only place where we're abusing the stimulus_id for anything functional.
I opened #6179 to remove the UUID.
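To illustrate the trade-off described above, here is a minimal sketch, not the code in distributed/stealing.py. The helpers `time_based_id` and `uuid_based_id` and the `steal-` prefix are hypothetical names made up for this example. It shows why an ID built only from a timestamp can collide when several requests land in the same clock tick, and why a UUID avoids that at a fixed per-call cost.

```python
import uuid


def time_based_id(now):
    # Hypothetical: an ID derived only from a timestamp.
    # Two requests issued at the same clock reading get the same ID.
    return f"steal-{now}"


def uuid_based_id():
    # Hypothetical: an ID derived from a random UUID.
    # The cost per call is constant, independent of the number of tasks.
    return f"steal-{uuid.uuid4().hex}"


# Simulate 1000 requests that all observe the same clock value.
frozen_now = 1650646320.0
time_ids = {time_based_id(frozen_now) for _ in range(1000)}
uuid_ids = {uuid_based_id() for _ in range(1000)}

print(len(time_ids), len(uuid_ids))
```

Either variant does a fixed amount of work per request, so neither would by itself make the cost grow with the number of tasks; the UUID only raises the constant.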
I think that we should just remember to run a larger workload and track
profiling here. It also might be O(1) in number of tasks, but O(n) in
number of workers, for example.
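As a toy illustration of that last point, the sketch below counts steps instead of measuring time. `pick_victim` and `steps_for` are hypothetical helpers invented for this example and do not mirror the real stealing code. The scan touches every worker once, so its cost follows the number of workers and ignores how many tasks each worker holds.

```python
def pick_victim(workers):
    """Hypothetical helper: scan every worker to find the most loaded one.

    `workers` maps a worker name to its number of queued tasks.
    Returns the chosen name and the number of loop steps taken.
    """
    steps = 0
    best = None
    for name, load in workers.items():
        steps += 1
        if best is None or load > workers[best]:
            best = name
    return best, steps


def steps_for(n_workers, n_tasks):
    # Spread the tasks over the workers, then count the work one decision costs.
    per_worker = n_tasks // n_workers
    workers = {f"w{i}": per_worker + i for i in range(n_workers)}
    _, steps = pick_victim(workers)
    return steps


# Ten times more tasks: same cost. Ten times more workers: ten times the cost.
print(steps_for(10, 100_000), steps_for(10, 1_000_000), steps_for(100, 100_000))
```

A profile taken on a larger real workload would show whether anything in the actual code path behaves like this.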
On Fri, Apr 22, 2022 at 12:52 PM Florian Jetter wrote:
I can't see anything that's not O(1) at first glance. We changed one thing
about move_task_request recently that slows it down but is constant time.
We introduced a UUID generation in there
https://github.com/dask/distributed/blame/main/distributed/stealing.py#L221-L223
Note: this is the *only* place where we're abusing the stimulus_id for
anything functional
- #5379 (this was the deadlock)
- #5620 (here we introduced the UUID since the simple time caused duplicates)
I opened #6179 to remove the UUID
If there's a reproducible workload,
I was speaking with @bnaul. He mentioned that he noticed the scheduler at 100% CPU usage when it should have been in a somewhat quiet state. He had on the order of 100,000 tasks, and the scheduler was spending all of its time in work stealing, in particular in the maybe_move_task function. Something there may not be O(1).