Handle edge cases between queued and no-worker #7259
Changes from all commits
distributed/scheduler.py

```diff
@@ -1511,10 +1511,15 @@ class SchedulerState:
     #: All tasks currently known to the scheduler
     tasks: dict[str, TaskState]

-    #: Tasks in the "queued" state, ordered by priority
+    #: Tasks in the "queued" state, ordered by priority.
+    #: They should generally be root-ish, but in certain cases may not be.
+    #: They must not have restrictions.
+    #: Always empty if `worker-saturation` is set to `inf`.
     queued: HeapSet[TaskState]

-    #: Tasks in the "no-worker" state
+    #: Tasks in the "no-worker" state.
+    #: They may or may not have restrictions.
+    #: Could contain root-ish tasks even when `worker-saturation` is a finite value.
     unrunnable: set[TaskState]

     #: Subset of tasks that exist in memory on more than one worker
```
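For context, here is a minimal sketch (not part of the PR) of what checking these documented invariants could look like. The helper name `validate_queue_invariants` is hypothetical, and the restriction attribute names are assumptions about `TaskState`'s fields:

```python
import math


def validate_queue_invariants(state) -> None:
    # Hypothetical validation helper; `state` stands in for a
    # SchedulerState-like object with the attributes documented above.

    # `queued` is always empty when queuing is disabled
    # (worker-saturation set to inf).
    if math.isinf(state.WORKER_SATURATION):
        assert not state.queued

    # Queued tasks must not have restrictions (attribute names assumed).
    for ts in state.queued:
        assert not ts.worker_restrictions
        assert not ts.host_restrictions
        assert not ts.resource_restrictions

    # Tasks in "no-worker" may or may not have restrictions, and may even be
    # root-ish with a finite worker-saturation, so nothing to assert for them.
```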
```diff
@@ -2014,11 +2019,29 @@ def transition_no_worker_processing(self, key, stimulus_id):
                 assert not ts.actor, f"Actors can't be in `no-worker`: {ts}"
                 assert ts in self.unrunnable

-            if ws := self.decide_worker_non_rootish(ts):
+            decide_worker = (
+                self.decide_worker_rootish_queuing_disabled
+                if self.is_rootish(ts)
+                else self.decide_worker_non_rootish
+            )
+            # NOTE: it's possible that queuing is enabled and `is_rootish(ts)`,
+            # meaning this task should have been queued and `decide_worker_rootish_queuing_enabled`
+            # would be the most appropriate function to use. But if, at submission time,
+            # it didn't look root-ish (`TaskGroup` too small, or cluster too big) and there were
+            # no running workers, it would have gone to `no-worker` instead.
```
Comment on lines +2029 to +2031:

> To my understanding this should be impossible. If you add workers you can flip a task from rootish to non-rootish, but a rootish task would not be in no-worker to begin with.

Reply:

> Please see the tests I've added covering these cases. If there are no running workers. Paused or retiring workers will contribute to …
```diff
+            # Rather than implementing some `no-worker->queued` transition, we
+            # just live with our original assessment and treat it as though queuing were disabled.
+            # If we used `decide_worker_rootish_queuing_enabled` here, it's possible that no workers
+            # are idle, which would leave it in `unrunnable` and cause a deadlock.
```
> Note that this deadlock case is covered by … If you apply this diff (using …):

```diff
diff --git a/distributed/distributed.yaml b/distributed/distributed.yaml
index 105a45e9..f1b966b5 100644
--- a/distributed/distributed.yaml
+++ b/distributed/distributed.yaml
@@ -22,7 +22,7 @@ distributed:
     events-log-length: 100000
     work-stealing: True # workers should steal tasks from each other
     work-stealing-interval: 100ms # Callback time for work stealing
-    worker-saturation: .inf # Send this fraction of nthreads root tasks to workers
+    worker-saturation: 1.0 # Send this fraction of nthreads root tasks to workers
     worker-ttl: "5 minutes" # like '60s'. Time to live for workers. They must heartbeat faster than this
     pickle: True # Is the scheduler allowed to deserialize arbitrary bytestrings
     preload: [] # Run custom modules with Scheduler
diff --git a/distributed/scheduler.py b/distributed/scheduler.py
index 87ffce4e..5400c50a 100644
--- a/distributed/scheduler.py
+++ b/distributed/scheduler.py
@@ -2020,9 +2020,13 @@ class SchedulerState:
             assert ts in self.unrunnable

             decide_worker = (
-                self.decide_worker_rootish_queuing_disabled
+                (
+                    partial(self.decide_worker_rootish_queuing_disabled, ts)
+                    if math.isinf(self.WORKER_SATURATION)
+                    else self.decide_worker_rootish_queuing_enabled
+                )
                 if self.is_rootish(ts)
-                else self.decide_worker_non_rootish
+                else partial(self.decide_worker_non_rootish, ts)
             )
             # NOTE: it's possible that queuing is enabled and `is_rootish(ts)`,
             # meaning this task should have been queued and `decide_worker_rootish_queuing_enabled`
@@ -2034,13 +2038,13 @@ class SchedulerState:
             # If we used `decide_worker_rootish_queuing_enabled` here, it's possible that no workers
             # are idle, which would leave it in `unrunnable` and cause a deadlock.

-            if ws := decide_worker(ts):
+            if ws := decide_worker():
                 self.unrunnable.discard(ts)
                 worker_msgs = _add_to_processing(self, ts, ws)
             # If no worker, task just stays in `no-worker`

-            if self.validate and self.is_rootish(ts):
-                assert ws is not None
+            # if self.validate and self.is_rootish(ts):
+            #     assert ws is not None

             return recommendations, client_msgs, worker_msgs
         except Exception as e:
```
```diff
+            if ws := decide_worker(ts):
                 self.unrunnable.discard(ts)
                 worker_msgs = _add_to_processing(self, ts, ws)
             # If no worker, task just stays in `no-worker`

+            if self.validate and self.is_rootish(ts):
+                assert ws is not None
+
             return recommendations, client_msgs, worker_msgs
         except Exception as e:
             logger.exception(e)
```
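The scenario in the NOTE depends on root-ish-ness being re-evaluated against the current cluster size. Here is a toy sketch of that effect, with an invented `looks_rootish` helper and an illustrative threshold (the real `is_rootish` also considers restrictions and the task group's dependencies):

```python
# Toy illustration of how a task's root-ish classification can flip between
# submission time and the no-worker -> processing transition. The threshold
# (group size > 2 * total threads) is an assumption for illustration only.
def looks_rootish(task_group_size: int, total_nthreads: int) -> bool:
    return task_group_size > 2 * total_nthreads


# At submission the cluster looks "too big" for the TaskGroup, so the task is
# not classified as root-ish; with no running workers it lands in `no-worker`.
assert not looks_rootish(task_group_size=16, total_nthreads=10)

# By the time a worker is available the cluster has shrunk; the same task now
# classifies as root-ish, and `transition_no_worker_processing` must handle it.
assert looks_rootish(task_group_size=16, total_nthreads=4)
```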
```diff
@@ -2051,9 +2074,10 @@ def decide_worker_rootish_queuing_disabled(
         returns None, in which case the task should be transitioned to
         ``no-worker``.
         """
-        if self.validate:
-            # See root-ish-ness note below in `decide_worker_rootish_queuing_enabled`
-            assert math.isinf(self.WORKER_SATURATION)
+        # NOTE: in rare cases, it's possible queuing is actually enabled here (see
+        # `transition_no_worker_processing`).
+        # It's also possible that `is_rootish(ts)` is False (see note below in
+        # `decide_worker_rootish_queuing_enabled`)

         pool = self.idle.values() if self.idle else self.running
         if not pool:
```
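The fallback in the last two context lines is what avoids the deadlock discussed above: prefer idle workers, otherwise consider any running worker, and only give up (return None, leaving the task in `no-worker`) when nothing is running. A hedged sketch of that shape; the tie-breaking metric below is an assumption, not the actual implementation:

```python
from typing import Optional


def pick_worker_queuing_disabled(idle: dict, running: set) -> Optional[object]:
    # Prefer idle workers; fall back to all running workers; give up only if
    # nothing is running (the caller then leaves the task in "no-worker").
    pool = idle.values() if idle else running
    if not pool:
        return None
    # Illustrative tie-breaker: fewest tasks in processing per thread.
    return min(pool, key=lambda ws: len(ws.processing) / ws.nthreads)
```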
> One of the biggest problems I have right now with the queuing/root-ish scheduling is that we have three different `decide_*` functions. From an (internal) API perspective, I don't want to have the burden of making the correct decision about which one of these APIs to call in which circumstance. I just want to call a single `decide_worker`, provide it with sufficient context, and it should return the proper worker. Wouldn't this already avoid the problem?
>
> Naively, I would expect that this new `decide_worker` would look approximately like the block in `transition_waiting_processing` (distributed/scheduler.py, lines 2230 to 2242 in 17156e9).
>
> Isn't this always the correct logic when deciding on a worker?
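For readers without the source open, the referenced block dispatches roughly as follows. This is a paraphrased sketch reconstructed from context, not a verbatim copy of lines 2230 to 2242; the wrapper name `_decide_worker_for_waiting_task` is hypothetical, while the `decide_worker_*` methods, `is_rootish`, and `WORKER_SATURATION` are the ones discussed in this PR:

```python
import math


def _decide_worker_for_waiting_task(self, ts, recommendations):
    # Hypothetical wrapper sketching the dispatch in `transition_waiting_processing`.
    if self.is_rootish(ts):
        if math.isinf(self.WORKER_SATURATION):  # queuing disabled
            if not (ws := self.decide_worker_rootish_queuing_disabled(ts)):
                recommendations[ts.key] = "no-worker"
                return None
        else:
            if not (ws := self.decide_worker_rootish_queuing_enabled()):
                recommendations[ts.key] = "queued"
                return None
    else:
        if not (ws := self.decide_worker_non_rootish(ts)):
            recommendations[ts.key] = "no-worker"
            return None
    return ws
```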
Reply:

> Yes, and this is how I'd originally implemented it in #6614. But based on your feedback (which I agree with), we split it into multiple `decide_worker_*` functions for different cases.
>
> The reason we didn't wrap the three `decide_worker_*` cases into one overall `decide_worker` function, which always "does the right thing", is that the recommendation you make (`no-worker` vs `queued`) changes depending on which function you use. So this `decide_worker` function would have to take and mutate a `recommendations` dict, or at least somehow return what recommendation to make. I thought we'd decided this was a pattern we wanted to avoid.
>
> Moreover, we'd then have to implement a `no-worker->queued` and a `queued->no-worker` transition. That's not hard, just more complexity. If we don't do #7262, it's maybe the right thing to do instead of this PR.