Pick worker with lowest memory use by percentage, not absolute #7266

Open
gjoseph92 opened this issue Nov 7, 2022 · 2 comments

@gjoseph92 (Collaborator)

Currently, the worker_objective function uses a worker's total managed memory (ws.nbytes) as a tiebreaker when it looks like a task will start at the same time on multiple workers:

return (start_time, ws.nbytes)

In a heterogeneous cluster, this means we might pick a small worker with little memory to spare over a large worker that holds more data in absolute terms but has far more memory available.

Maybe we should compare by percentage of memory used, rather than total bytes used:

diff --git a/distributed/scheduler.py b/distributed/scheduler.py
index eb5828bf..5325af4b 100644
--- a/distributed/scheduler.py
+++ b/distributed/scheduler.py
@@ -3233,7 +3233,7 @@ class SchedulerState:
         if ts.actor:
             return (len(ws.actors), start_time, ws.nbytes)
         else:
-            return (start_time, ws.nbytes)
+            return (start_time, ws.nbytes / ws.memory_limit)
 
     def add_replica(self, ts: TaskState, ws: WorkerState):
         """Note that a worker holds a replica of a task with state='memory'"""

#7248 does this for root tasks when queuing is enabled. I think it would make sense to do this in all cases, though.
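
To make the difference concrete, here's a minimal sketch with made-up numbers, using plain tuples as stand-ins for the real WorkerState fields:

# (start_time, managed bytes, memory_limit) -- made-up stand-ins for WorkerState.
small = (1.0, 2 * 2**30, 4 * 2**30)    # 2 GiB managed out of a 4 GiB limit
large = (1.0, 3 * 2**30, 16 * 2**30)   # 3 GiB managed out of a 16 GiB limit

def objective_absolute(w):
    start_time, nbytes, _limit = w
    return (start_time, nbytes)              # current tiebreaker

def objective_fraction(w):
    start_time, nbytes, limit = w
    return (start_time, nbytes / limit)      # proposed tiebreaker

# With equal start times, tuple comparison falls through to the memory term:
assert min([small, large], key=objective_absolute) is small   # fewer absolute bytes
assert min([small, large], key=objective_fraction) is large   # smaller fraction of its limit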

cc @fjetter @crusaderky

@crusaderky (Collaborator) commented Nov 8, 2022

This boils down to why people would use a heterogeneous cluster in the first place.
From my personal experience, the answers are chiefly two:

  1. some, but not all, hosts have specific resources (GPU, local services, etc.)
  2. some tasks have much higher heap, input, and/or output memory usage

In the second case, you'll likely have a wealth of tasks with average memory usage and no restrictions, plus a handful of tasks carrying a RAM or HI_MEM (or whatever you call it) resource restriction. In that use case, you do want to leave more memory free on the hosts with more RAM, because it will be needed by the resource-restricted tasks.
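
Concretely, that pattern looks something like this on the client side (the scheduler address, the HI_MEM resource name, and the task functions below are made-up placeholders; the large-memory hosts would be started with a matching flag, e.g. dask worker tcp://scheduler:8786 --resources "HI_MEM=1"):

from distributed import Client

client = Client("tcp://scheduler:8786")  # placeholder address

def process_chunk(i):   # hypothetical, ordinary task
    return i * 2

def combine(results):   # hypothetical, memory-hungry task
    return sum(results)

# The bulk of the tasks carry no restriction and can land on any worker...
futures = client.map(process_chunk, range(100))

# ...while the handful of memory-hungry tasks are pinned to workers that
# advertise the HI_MEM resource.
total = client.submit(combine, futures, resources={"HI_MEM": 1})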

FYI - AMM ReduceReplicas, graceful worker retirement, rebalance, and the future AMM Rebalance all use absolute optimistic memory as a metric (I'm fine with scheduling using managed memory instead, as it's a lot more responsive).

@gjoseph92 (Collaborator, Author)

That use case makes sense. However, I just find it hard to justify picking worker A in this situation (numbers worked through below):

  1. worker A: 400 MiB in memory, 300 MiB remaining
  2. worker B: 1 GiB in memory, 2 GiB remaining
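
Working through those numbers (so A has a ~700 MiB limit and B a ~3 GiB limit):

MiB, GiB = 2**20, 2**30

a_used, a_limit = 400 * MiB, 700 * MiB   # worker A
b_used, b_limit = 1 * GiB, 3 * GiB       # worker B

print(a_used < b_used)                       # True  -> absolute tiebreaker picks A
print(a_used / a_limit, b_used / b_limit)    # ~0.57 vs ~0.33 -> fractional picks B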

To me, using absolute memory is assuming too much that someone's use case and intent look like the one you've described. But you might just have heterogeneous workers because that's what you've got, whether it's the machines you had around in your lab, or the instances Coiled gave you because you allowed a range of instance types for faster cluster startup time.

I feel like the safest generic choice is the one that's least likely to put a worker under memory pressure. If someone has high-memory workers that they want to reserve for particular tasks, they can always use resource restrictions to accomplish that.

As a user, I'd be confused if my low-memory workers kept getting overloaded and dying but my high-memory workers stayed nearly empty, unless I'd explicitly used restrictions to make this happen.
