Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use dask to run tasks #1714

Closed
wants to merge 31 commits into from
Closed

Use dask to run tasks #1714

wants to merge 31 commits into from

Conversation

bouweandela
Copy link
Member

@bouweandela bouweandela commented Sep 5, 2022

Description

Use dask distributed for data computations.

Works with the development version iris.

Use with ESMValGroup/ESMValTool#3151 to pass the cluster address on to Python diagnostics so they make use of the Dask cluster to run their array computations.

Example usage

esmvaltool run examples/recipe_python.yml --max-parallel-tasks=0 --dask='{"cluster": {"type": "dask_jobqueue.SLURMCluster", "n_workers": 2}}'

this requires installing dask_jobqueue, or locally

esmvaltool run examples/recipe_python.yml --dask='{"client": {}}"

Configuration

The settings max_parallel_tasks can be used to control the how tasks are run:

max_parallel_tasks effect
0 create tasks and compute tasks on the configured Dask cluster
1 create tasks from the esmvaltool Python process on the local machine, compute tasks on the configured Dask cluster
null or >1 create tasks from this number of processes (the number of CPU cores when null) on the local machine, compute tasks on the configured Dask cluster

The dask: cluster setting can be used to configure the Dask cluster, the argument type can be used to specify the Python cluster class (defaults to distributed.LocalCluster), any other arguments will be passed to the class.

The dask: client setting can be used to configure the distributed.Client, any arguments will be passed on.

Example config-user.yml settings for running locally using a LocalCluster:

dask:
  cluster:
    type: distributed.LocalCluster

Example settings for using an externally managed cluster (e.g. set it up from a Jupyter notebook)

dask:
  client:
    address: tcp://127.0.0.1:45695

Example settings for running on Levante:

dask:
  client: {}
  cluster:
    type: dask_jobqueue.SLURMCluster
    queue: interactive
    account: bk1088
    cores: 8
    memory: 16GiB
    local_directory: "/work/bd0854/b381141/dask-tmp"
    n_workers: 2

Example for recovering the previous behaviour: specify no dask item in config-user.yml.

TO DO

  • Record info/debug log messages when creating tasks from the cluster
  • Fix --resume-from with delayed save
  • Investigate why there is a warning about sending a large graph and solve the issue
  • Write documentation
  • Make dask configuration more user friendly

May close #430

Link to documentation:


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.


To help with the number pull requests:

@bouweandela
Copy link
Member Author

Note that this is currently slow due to SciTools/iris#4957. Workaround available in #1713.

@codecov
Copy link

codecov bot commented Sep 23, 2022

Codecov Report

Merging #1714 (6bf6328) into main (664f7c9) will decrease coverage by 0.49%.
The diff coverage is 50.00%.

@@            Coverage Diff             @@
##             main    #1714      +/-   ##
==========================================
- Coverage   92.78%   92.29%   -0.49%     
==========================================
  Files         236      236              
  Lines       12455    12573     +118     
==========================================
+ Hits        11556    11604      +48     
- Misses        899      969      +70     
Impacted Files Coverage Δ
esmvalcore/config/_config_validators.py 96.93% <ø> (ø)
esmvalcore/_task.py 64.56% <39.81%> (-7.06%) ⬇️
esmvalcore/preprocessor/_io.py 85.93% <70.00%> (-0.76%) ⬇️
esmvalcore/preprocessor/__init__.py 94.75% <87.50%> (-0.49%) ⬇️
esmvalcore/_provenance.py 97.67% <100.00%> (+0.01%) ⬆️
esmvalcore/_recipe/recipe.py 99.02% <100.00%> (+<0.01%) ⬆️

@bouweandela
Copy link
Member Author

@fnattino This should now work with the latest commit too. Please let me know if you encounter any problems.

esmvalcore/_task.py Outdated Show resolved Hide resolved
@bouweandela
Copy link
Member Author

Closing this for the reasons mentioned in #2041. See #2316 for a new attempt.

@bouweandela bouweandela deleted the dask-distributed branch February 12, 2024 11:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CMIP3 processing does not work in parallel mode
2 participants