Refactor high-level description to more accurately reflect Dask cloud provider (#391)

* Release 2022.10.0

* Refactor high-level description and add alternatives page

* Reword readme

* Fix pre-commit url
jacobtomlinson authored Sep 16, 2024
1 parent 6399cce commit 6ca172e
Showing 3 changed files with 86 additions and 3 deletions.
8 changes: 5 additions & 3 deletions README.rst
@@ -24,6 +24,8 @@ Dask Cloud Provider
:alt: Conda Forge


-Native Cloud integration for Dask. This library intends to allow people to
-create dask clusters on a given cloud provider with no set up other than having
-credentials.
+Native Cloud integration for Dask.
+
+This library provides tools to enable Dask clusters to more natively integrate with the cloud.
+It includes cluster managers to create dask clusters on a given cloud provider using native resources,
+plugins to more closely integrate Dask components with the cloud platform they are running on and documentation to empower all folks running Dask on the cloud.
51 changes: 51 additions & 0 deletions doc/source/alternatives.rst
@@ -0,0 +1,51 @@
Alternatives
============

Many tools and services exist today for deploying Dask clusters, and a number of them are commonly used on the cloud.
This project aims to provide cloud native plugins and tools for Dask which can often complement other approaches.

Community tools
---------------

Dask has a `vibrant ecosystem of community tooling for deploying Dask <https://docs.dask.org/en/latest/ecosystem.html#deploying-dask>`_ on various platforms, much of which can be used on public cloud.

Kubernetes
^^^^^^^^^^

`Kubernetes <https://kubernetes.io/>`_ is an extremely popular project for managing cloud workloads and is part of the broader `Cloud Native Computing Foundation (CNCF) <https://www.cncf.io/>`_ ecosystem.

Dask has many options for `deploying clusters on Kubernetes <https://docs.dask.org/en/stable/deploying-kubernetes.html>`_.

HPC on Cloud
^^^^^^^^^^^^

Many popular HPC scheduling tools are used on the cloud and support features such as elastic scaling.
If you are already leveraging HPC tools like `SLURM on the cloud <https://slurm.schedmd.com/elastic_computing.html>`_ then `Dask has great integration with HPC schedulers <https://jobqueue.dask.org/en/latest/>`_.

Hadoop/Spark/Yarn
^^^^^^^^^^^^^^^^^

Many cloud platforms have popular managed services for running Apache Spark workloads.

If you're already using a managed map-reduce service like `Amazon EMR <https://aws.amazon.com/emr/>`_ then check out `dask-yarn <https://yarn.dask.org/en/latest/>`_.

Nebari
^^^^^^

`Nebari <https://www.nebari.dev/>`_ is an open source data science platform which can be run locally or on a cloud platform of your choice.
It includes a managed Dask service built on `Dask Gateway <http://gateway.dask.org/>`_ for managing Dask clusters.

Managed Services
----------------

Cloud vendors and third-party companies also offer managed Dask clusters as a service.

Coiled
^^^^^^

`Coiled <https://www.coiled.io/>`_ is a mature managed Dask service that spawns clusters in your cloud account and allows you to manage them via a central control plane.

Saturn Cloud
^^^^^^^^^^^^

`Saturn Cloud <https://saturncloud.io/>`_ is a managed data science platform with hosted Dask clusters or the option to deploy them in your own AWS account.
30 changes: 30 additions & 0 deletions doc/source/index.rst
@@ -3,6 +3,15 @@ Dask Cloud Provider

*Native Cloud integration for Dask.*

This package contains open source tools to help you deploy and operate Dask clusters on the cloud.
It contains cluster managers which can help you launch clusters using native cloud resources like VMs or containers.
It also has tools and plugins for use in *any* cluster running on the cloud, and is a great source of documentation for Dask cloud deployments.

It is by no means the "complete" or "only" way to run Dask on the cloud. Check out the :doc:`alternatives` page for more tools.

Cluster managers
----------------

This package provides classes for constructing and managing ephemeral Dask clusters on various
cloud platforms.

@@ -52,13 +61,34 @@ this code.
    with Client(cluster) as client:
        # Do some Dask things

Plugins
-------

Dask components like Schedulers and Workers can benefit from being cloud-aware.
This project has plugins and tools that extend these components.

One example is having the workers check for termination warnings when running on ephemeral/spot instances and begin migrating data to other workers.

For Azure VMs you could use the :class:`dask_cloudprovider.azure.AzurePreemptibleWorkerPlugin` to do this.
It can be used on any cluster that has workers running on Azure VMs, not just ones created with :class:`dask_cloudprovider.azure.AzureVMCluster`.

.. code-block:: python

    from distributed import Client
    client = Client("<Any Dask cluster running on Azure VMs>")

    from dask_cloudprovider.azure import AzurePreemptibleWorkerPlugin
    client.register_worker_plugin(AzurePreemptibleWorkerPlugin())

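
To illustrate the general idea of checking for termination warnings, here is a minimal sketch in plain Python. It is not the implementation of ``AzurePreemptibleWorkerPlugin``: the names ``has_termination_notice`` and ``TerminationWatcher`` are hypothetical, and the payload shape (an ``Events`` list whose entries carry an ``EventType`` of ``Preempt``) is an assumption loosely modelled on cloud instance metadata services. The metadata lookup and the reaction to a notice are passed in as callables, so the sketch makes no network calls itself.

```python
import json
import time
from typing import Callable


def has_termination_notice(payload: str) -> bool:
    """Return True if the JSON payload lists at least one 'Preempt' event.

    The payload shape is an assumption made for this sketch.
    """
    events = json.loads(payload).get("Events", [])
    return any(event.get("EventType") == "Preempt" for event in events)


class TerminationWatcher:
    """Hypothetical helper that polls for a termination notice.

    fetch_metadata: returns the latest metadata payload as a JSON string.
    on_termination: called once when a notice is seen, for example to ask
    a worker to start moving its data elsewhere.
    """

    def __init__(
        self,
        fetch_metadata: Callable[[], str],
        on_termination: Callable[[], None],
        poll_interval: float = 1.0,
    ) -> None:
        self.fetch_metadata = fetch_metadata
        self.on_termination = on_termination
        self.poll_interval = poll_interval

    def check_once(self) -> bool:
        """Fetch the payload once and react if it contains a notice."""
        if has_termination_notice(self.fetch_metadata()):
            self.on_termination()
            return True
        return False

    def watch(self, max_polls: int) -> bool:
        """Poll up to max_polls times and stop at the first notice."""
        for _ in range(max_polls):
            if self.check_once():
                return True
            time.sleep(self.poll_interval)
        return False
```

In a real deployment, ``fetch_metadata`` would presumably query the cloud platform's instance metadata service, and ``on_termination`` would ask the worker to retire gracefully so that its data can be migrated to other workers before the instance is reclaimed.
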
.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Overview

   installation.rst
   config.rst
   alternatives.rst

.. toctree::
   :maxdepth: 2