
Queuing: lowest-memory worker as tiebreaker #7248

Open · gjoseph92 wants to merge 5 commits into base: main

Conversation

gjoseph92
Collaborator

This could be another way of approaching #7197 in a practical sense. AFAIU the point of the round-robin behavior is that maybe, if you keep re-using the same worker, its memory will slowly creep up (if you're running a task that leaks memory or something)? So if we just use memory as a tie-breaker, you'd probably get round-robin in practice on an idle cluster.

Would need to add a simple test.

Closes #7197

  • Tests added / passed
  • Passes pre-commit run --all-files
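
To make the idea concrete, here is a minimal sketch of the selection rule being proposed. It is illustrative only, not the actual diff in this PR: pick_worker is a placeholder name, candidates is assumed to be the pool of equally eligible scheduler-side WorkerState objects, and the memory attribute shown is just one possible choice of measure (the thread below discusses which one to use).

def pick_worker(candidates):
    """Prefer the least-busy worker; break ties by lowest memory.

    On an idle cluster every candidate has nothing processing, so the
    memory tiebreaker alone decides. The chosen worker's memory then
    rises, so it loses the next tie, which approximates round-robin.
    """
    return min(
        candidates,
        key=lambda ws: (len(ws.processing), ws.memory.optimistic),
    )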

@github-actions
Contributor

github-actions bot commented Nov 3, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

  15 files ±0    15 suites ±0    6h 29m 15s ⏱️ +26s
   3 171 tests +3:    3 085 passed ✔️ +2,    83 skipped 💤 ±0,    3 failed +1
  23 464 runs +24:  22 544 passed ✔️ +8,   910 skipped 💤 +10,  10 failed +6

For more details on these failures, see this check.

Results for commit 6f0bbc5. ± Comparison against base commit 02b9430.

♻️ This comment has been updated with latest results.

@gjoseph92
Collaborator Author

I've tried to add a test here but I'm having a hard time getting it to work. (The test depends on #7221 BTW.)

@crusaderky I'm trying to use optimistic memory as the tiebreaker here, but it doesn't seem like the right metric. What do you think about this?

The problem is that when we've received task-finished from a worker, but not another heartbeat yet, optimistic memory may not include the nbytes we know are on the worker. So we will keep picking that worker (we picked it in the first place since its memory was the lowest) until the next heartbeat.

I think I should probably just change this to nbytes instead of trying to use the more complex memory measures—using high-latency metrics-based measures for scheduling doesn't feel quite right anyway.

I was just hoping to be able to make it so that in an idle cluster, if there's a worker with high unmanaged memory use, we wouldn't pick it, even if other workers had a small amount of managed memory. I think this would get at the original intent of #4637.

@crusaderky
Collaborator

crusaderky commented Nov 7, 2022

AFAIU the point of the round-robin behavior is that maybe, if you keep re-using the same worker, its memory will slowly creep up (if you're running a task that leaks memory or something)?

From what I can read, the round-robin algorithm prevents managed memory from piling up on a single worker.
e.g.

import numpy
from distributed import wait  # c is an existing distributed Client

futures = []
for i in range(100):
    fut = c.submit(numpy.random.random, 2**24, key=f"x{i}")  # ~128 MiB result
    wait(fut)
    futures.append(fut)

Without round-robin, all 100 tasks would complete on the same worker and stay there.

The problem is that when we've received task-finished from a worker, but not another heartbeat yet, optimistic memory may not include the nbytes we know are on the worker. So we will keep picking that worker (we picked it in the first place since its memory was the lowest) until the next heartbeat.

Correct, all memory measures except managed come from the heartbeat, and as such they suffer delays and are not a good choice for decisions that rely on split-second updates to the cluster state. When a task completes, managed will go up, but process will remain the same until the next heartbeat. This causes the size of the task that just completed to be temporarily subtracted from unmanaged_recent. This behaviour is correct for slow-ish tasks, and not so much for fast ones:

e.g. this task will produce very accurate readings on the scheduler:

import numpy
from time import sleep

def f():
    a = numpy.zeros(2**24)  # 128 MiB
    sleep(5)
    return a

  1. Between the first heartbeat after the task starts and the moment the scheduler receives {op: task-finished}, it is accounted as 128 MiB of unmanaged_recent.
  2. When the message arrives, it is accounted as 128 MiB of managed memory and 0 unmanaged_recent.
  3. When the heartbeat arrives several seconds later, it confirms what the scheduler already knows.

By contrast, a task that completes much faster than the heartbeat interval will produce misleading readings in the short term:

def f():
    return numpy.zeros(2**24)  # 128 MiB

  1. A heartbeat will likely never capture the task while it's running.
  2. When {op: task-finished} reaches the scheduler, the task is accounted as +128 MiB managed and -128 MiB unmanaged_recent (all subtotals are floored to 0).
  3. When the heartbeat finally arrives, it produces the correct numbers.
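
For concreteness, these measures can be inspected on the scheduler side. Below is a rough sketch on a local cluster; the attribute names (nbytes, memory.process, memory.unmanaged_recent, memory.optimistic) follow the discussion above and should be double-checked against the distributed version in use:

from distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=2)
client = Client(cluster)

# The LocalCluster scheduler runs in-process, so its per-worker state can be
# read directly (for illustration only).
for addr, ws in cluster.scheduler.workers.items():
    mem = ws.memory  # snapshot built from the last heartbeat plus known nbytes
    print(
        addr,
        ws.nbytes,             # managed bytes, updated as task-finished arrives
        mem.process,           # RSS as of the last heartbeat (delayed)
        mem.unmanaged_recent,  # unmanaged memory that appeared recently
        mem.optimistic,        # process minus unmanaged_recent
    )

client.close()
cluster.close()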

As you can see, there is no single measure that is right in all cases, and I don't think calling psutil every time a task finishes would be a good idea, performance-wise.

I think I should probably just change this to nbytes

This sounds like a good choice to me - better than the current round-robin, in fact.

@gjoseph92
Collaborator Author

Changed to just use ws.nbytes.

Tests will still fail until #7221.

test_decide_worker_memory_tiebreaker_idle_heterogeneous_cluster is skipped when queuing is disabled. I propose we discuss that separately in #7266.
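
Relative to the sketch earlier in this thread, the only thing that changes is the tiebreaker attribute (again illustrative, not the literal diff):

# ws.nbytes tracks the bytes of task outputs the scheduler knows are on the
# worker, so it updates as soon as task-finished is processed, with no
# heartbeat delay, unlike the heartbeat-based measures discussed above.
chosen = min(candidates, key=lambda ws: (len(ws.processing), ws.nbytes))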

gjoseph92 marked this pull request as ready for review November 7, 2022 19:11
@gjoseph92
Collaborator Author

Realized that #7221 accidentally slipped in here. With that removed, test_decide_worker_memory_tiebreaker_idle_cluster fails both here and on main, queuing on or off, because of #7274 and #7197 (comment).

gjoseph92 added a commit to gjoseph92/distributed that referenced this pull request Nov 10, 2022
Development

Successfully merging this pull request may close these issues:
  • Round-robin worker selection makes poor choices with worker-saturation > 1.0