Add configurations for rootish taskgroup threshold #8898

Merged
merged 5 commits into from
Oct 17, 2024
26 changes: 26 additions & 0 deletions distributed/distributed-schema.yaml
@@ -133,6 +133,32 @@ properties:
generally leave `worker-saturation` at 1.0, though 1.25-1.5 could slightly improve
performance if ample memory is available.

rootish-taskgroup:
type:
- integer

description: |
Controls when a specific task group is identified as rootish when
worker saturation is set.

A task group is identified as rootish if it has fewer than a certain number
of dependencies (5 by default). This heuristic can fail for very large datasets,
where the number of data tasks from xarray can be higher than 5.

Increasing this limit will capture these root tasks successfully but increase
the risk of misidentifying task groups as rootish, which can have
performance implications.

rootish-taskgroup-dependencies:
type:
- integer

description: |
Controls the number of transitive dependencies a task group can have to be considered rootish.
It checks the number of dependencies each dependency of a rootish task group has.

The same caveats as for `rootish-taskgroup` apply.

worker-ttl:
type:
- string
2 changes: 2 additions & 0 deletions distributed/distributed.yaml
@@ -23,6 +23,8 @@ distributed:
work-stealing: True # workers should steal tasks from each other
work-stealing-interval: 100ms # Callback time for work stealing
worker-saturation: 1.1 # Send this fraction of nthreads root tasks to workers
rootish-taskgroup: 5 # number of dependencies of a rootish tg
rootish-taskgroup-dependencies: 5 # number of dependencies of the dependencies of the rootish tg
worker-ttl: "5 minutes" # like '60s'. Time to live for workers. They must heartbeat faster than this
preload: [] # Run custom modules with Scheduler
preload-argv: [] # See https://docs.dask.org/en/latest/how-to/customize-initialization.html
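
The two new defaults sit under `distributed: scheduler:` in the YAML, and the scheduler reads them with dotted keys such as `distributed.scheduler.rootish-taskgroup`. The following is a minimal sketch of how such a dotted key maps onto the nested structure. `DEFAULTS` and `lookup` are hypothetical names used for illustration only; this is not dask's config implementation.

```python
from functools import reduce

# Nested mapping mirroring only the two new defaults from distributed.yaml.
DEFAULTS = {
    "distributed": {
        "scheduler": {
            "rootish-taskgroup": 5,
            "rootish-taskgroup-dependencies": 5,
        }
    }
}


def lookup(config: dict, dotted_key: str):
    """Hypothetical helper: walk a nested dict one dotted segment at a time."""
    return reduce(lambda node, part: node[part], dotted_key.split("."), config)


tg_threshold = lookup(DEFAULTS, "distributed.scheduler.rootish-taskgroup")
deps_threshold = lookup(
    DEFAULTS, "distributed.scheduler.rootish-taskgroup-dependencies"
)
print(tg_threshold, deps_threshold)
```

A missing segment raises `KeyError` in this sketch, which is one reason the defaults are added to `distributed.yaml` alongside the schema entries.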
11 changes: 9 additions & 2 deletions distributed/scheduler.py
@@ -1840,6 +1840,13 @@ def __init__(
+ repr(self.WORKER_SATURATION)
)

self.rootish_tg_threshold = dask.config.get(
"distributed.scheduler.rootish-taskgroup"
)
self.rootish_tg_dependencies_threshold = dask.config.get(
"distributed.scheduler.rootish-taskgroup-dependencies"
)

@abstractmethod
def log_event(self, topic: str | Collection[str], msg: Any) -> None: ...

@@ -3090,8 +3097,8 @@ def is_rootish(self, ts: TaskState) -> bool:
# TODO short-circuit to True if `not ts.dependencies`?
return (
len(tg) > self.total_nthreads * 2
- and len(tg.dependencies) < 5
- and sum(map(len, tg.dependencies)) < 5
+ and len(tg.dependencies) < self.rootish_tg_threshold
+ and sum(map(len, tg.dependencies)) < self.rootish_tg_dependencies_threshold
)

def check_idle_saturated(self, ws: WorkerState, occ: float = -1.0) -> None:
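
To illustrate the effect of the two thresholds, here is a minimal standalone sketch of the comparison shown in the hunk above. It covers only those three conditions, not any earlier checks in the real `is_rootish`. The `Group` class is a hypothetical stand-in for a task group, and the sketch assumes that `len()` of a group is its task count; the group names and sizes are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical stand-in for a task group (not the scheduler's class)."""

    name: str
    n_tasks: int
    dependencies: list[Group] = field(default_factory=list)

    def __len__(self) -> int:
        # Assumption: the length of a group is the number of tasks in it.
        return self.n_tasks


def is_rootish(
    tg: Group,
    total_nthreads: int,
    tg_threshold: int = 5,
    deps_threshold: int = 5,
) -> bool:
    """Mirror the three conditions from the diff with configurable limits."""
    return (
        len(tg) > total_nthreads * 2
        and len(tg.dependencies) < tg_threshold
        and sum(map(len, tg.dependencies)) < deps_threshold
    )


# Invented example: 1000 tasks that depend on 8 single-task source groups.
sources = [Group(f"source-{i}", 1) for i in range(8)]
load = Group("load", 1000, sources)

with_defaults = is_rootish(load, total_nthreads=16)
with_raised_limits = is_rootish(
    load, total_nthreads=16, tg_threshold=10, deps_threshold=15
)
print(with_defaults, with_raised_limits)
```

With the defaults the group is rejected because it has 8 dependencies, which is not fewer than 5. Raising both limits lets it pass, at the cost of also admitting other groups with that many dependencies.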
12 changes: 12 additions & 0 deletions distributed/tests/test_scheduler.py
@@ -5277,3 +5277,15 @@ async def before_close(self):
assert s.plugins["before_close"].call_count == 1
lines = caplog.getvalue().split("\n")
assert sum("Closing scheduler" in line for line in lines) == 1


@gen_cluster(
client=True,
config={
"distributed.scheduler.rootish-taskgroup": 10,
"distributed.scheduler.rootish-taskgroup-dependencies": 15,
},
)
async def test_rootish_taskgroup_configuration(c, s, *workers):
assert s.rootish_tg_threshold == 10
assert s.rootish_tg_dependencies_threshold == 15