Dask Tutorial

This tutorial was last given at SciPy 2022 in Austin, Texas. A video of the SciPy 2022 tutorial is available online.

Dask is a parallel and distributed computing library that scales the existing Python and PyData ecosystem. Dask can scale up to your full laptop capacity and out to a cloud cluster.

Prepare

1. You should clone this repository

git clone http://github.com/dask/dask-tutorial

and then install the necessary packages. There are three ways to do this; pick the one that best suits you, and pick only one. In order of preference, they are:

2a) Create a conda environment (preferred)

In the main repo directory

conda env create -f binder/environment.yml
conda activate dask-tutorial

2b) Install into an existing environment

You will need the following core libraries:

conda install -c conda-forge ipycytoscape jupyterlab python-graphviz matplotlib zarr xarray pooch pyarrow s3fs scipy dask distributed dask-labextension

Note that this option will alter your existing environment, potentially changing the versions of packages you already have installed.
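After installing with either conda option, it can help to confirm that the packages are visible to Python before launching the notebooks. The following is a minimal sketch, not part of the tutorial itself. The helper `missing_modules` and the list `TUTORIAL_MODULES` are hypothetical names, and the import names are assumptions derived from the package list above (for example, `python-graphviz` is assumed to import as `graphviz` and `dask-labextension` as `dask_labextension`).

```python
import importlib.util

# Import names assumed for the conda packages listed above. Package names and
# import names can differ: python-graphviz is assumed to import as "graphviz"
# and dask-labextension as "dask_labextension".
TUTORIAL_MODULES = [
    "ipycytoscape", "jupyterlab", "graphviz", "matplotlib", "zarr",
    "xarray", "pooch", "pyarrow", "s3fs", "scipy", "dask", "distributed",
    "dask_labextension",
]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(TUTORIAL_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All tutorial modules were found.")
```

Run it with the tutorial environment activated. Any name it reports as missing points to a package to install, or to an import name in the list that needs adjusting.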

2c) Use Dockerfile

You can build a docker image from the provided Dockerfile.

$ docker build . # This will build using the same env as in 2a)

Run a container, replacing the placeholder with the image ID printed by the previous command

$ docker run -it -p 8888:8888 -p 8787:8787 <container_id_or_tag>

The above command will print a URL (like http://(container_id or 127.0.0.1):8888/?token=<sometoken>) that you can use to open the notebook in a browser. You may need to replace the given hostname with "localhost" or "127.0.0.1".
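The hostname swap described above can also be done programmatically. The following is a minimal sketch using only the standard library. The function `with_local_host` is a hypothetical helper, and the example URL, including its hostname and token, is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def with_local_host(url, host="127.0.0.1"):
    """Swap the hostname of a Jupyter URL, keeping port, path, and token."""
    parts = urlsplit(url)
    # Rebuild the network location with the new host and the original port.
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


# Hypothetical URL of the kind printed by the container; the hostname and
# token are invented for this example.
printed = "http://3f2a1b4c5d6e:8888/lab?token=abc123"
print(with_local_host(printed))
```

Only the hostname changes. The port, path, and token query string pass through untouched, so the rewritten URL still authenticates against the same Jupyter server.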

You should follow only one of the options above!

Launch Jupyter

From the repo directory

jupyter lab

This was already done for option 2c) and does not need repeating.

You are welcome to use Jupyter Notebook if you prefer, but we'll be using JupyterLab in the live tutorial.

Links

Reference
- Docs
- Examples
- Code
- Blog
Ask for help
- dask tag on Stack Overflow, for usage questions
- GitHub issues, for bug reports and feature requests
- Discourse forum, for general, non-bug questions and discussion
- Attend a live tutorial

Outline

Overview - Dask's place in the universe.
Dataframe - parallelized operations on many pandas DataFrames spread across your cluster.
Array - blocked NumPy-like functionality with a collection of NumPy arrays spread across your cluster.
Delayed - the single-function way to parallelize general Python code.
Deployment/Distributed - Dask's scheduler for clusters, with details of how to view the UI.
Distributed Futures - non-blocking results that compute asynchronously.
Conclusion
