DOCS-#7287: Update Modin on Dask documentation #7288

Merged 3 commits on May 27, 2024
62 changes: 61 additions & 1 deletion docs/development/using_pandas_on_dask.rst
@@ -20,4 +20,64 @@ or turn them on in source code:

    import modin.config as cfg
    cfg.Engine.put('dask')
    cfg.StorageFormat.put('pandas')

Using Modin on Dask locally
---------------------------

To run Modin on Dask locally on a single node, set the Modin engine to ``Dask`` and
keep working with a Modin DataFrame as if it were a pandas DataFrame.
You can either initialize a Dask client yourself, in which case Modin connects to the existing Dask cluster,
or let Modin initialize a Dask client itself.

.. code-block:: python

    import modin.pandas as pd
    import modin.config as modin_cfg

    modin_cfg.Engine.put("dask")
    df = pd.DataFrame(...)
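
The first option above, creating the Dask client yourself before Modin needs it, can be sketched as follows. This is a minimal illustration, not part of the original change: the function name, the sample data, and the ``processes=False`` setting (thread-based workers, chosen only to keep the demo inside one process) are assumptions.

```python
def sum_column_with_own_client(n_workers=2):
    """Start a local Dask client, then let Modin attach to it (hypothetical helper)."""
    # Imports live inside the function so that defining it has no side effects.
    from distributed import Client
    import modin.config as modin_cfg
    import modin.pandas as pd

    # Assumption for a compact demo: thread-based workers in this process.
    # Process-based workers are the Dask default for a local client.
    client = Client(n_workers=n_workers, threads_per_worker=1, processes=False)
    try:
        # Modin is expected to pick up the client created above.
        modin_cfg.Engine.put("dask")
        df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
        return int(df["a"].sum())
    finally:
        # Release the local workers when the demo is done.
        client.close()
```

When process-based workers are used in a standalone script, the call is usually placed under an ``if __name__ == "__main__":`` guard.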

Using Modin on Dask in a Cluster
--------------------------------

To run Modin on Dask in a cluster, set up a Dask cluster and initialize a Dask client.
Once the client is initialized, Modin can connect to it and use the Dask cluster.

.. code-block:: python

    from distributed import Client
    import modin.pandas as pd
    import modin.config as modin_cfg

    # Define your cluster here
    cluster = ...
    client = Client(cluster)

    modin_cfg.Engine.put("dask")
    df = pd.DataFrame(...)

To get more information on how to deploy and run a Dask cluster, visit the `Deploy Dask Clusters`_ page.

Conversion between Modin DataFrame and Dask DataFrame
-----------------------------------------------------

A Modin DataFrame can be converted to and from a Dask DataFrame with no-copy partition conversion.
This lets you use the Modin and Dask libraries together, picking whichever performs better for a given step.

.. code-block:: python

    import modin.pandas as pd
    import modin.config as modin_cfg
    from modin.pandas.io import to_dask, from_dask

    modin_cfg.Engine.put("dask")
    df = pd.DataFrame(...)

    # Convert Modin to Dask DataFrame
    dask_df = to_dask(df)

    # Convert Dask to Modin DataFrame
    modin_df = from_dask(dask_df)
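
A round trip with concrete data might look like the sketch below. The function name, the sample frame, and the thread-based local client are assumptions made for illustration; only ``to_dask`` and ``from_dask`` come from the snippet above, and they require a Modin version that provides them.

```python
def modin_dask_round_trip():
    """Convert Modin -> Dask -> Modin and sum one column on each side (hypothetical helper)."""
    # Imports live inside the function so that defining it has no side effects.
    from distributed import Client
    import modin.config as modin_cfg
    import modin.pandas as pd
    from modin.pandas.io import from_dask, to_dask

    # Assumption for a compact demo: a thread-based local client.
    client = Client(processes=False)
    try:
        modin_cfg.Engine.put("dask")
        df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

        # Modin -> Dask: the result is lazy, so compute() triggers the work.
        dask_df = to_dask(df)
        dask_total = int(dask_df["a"].sum().compute())

        # Dask -> Modin: back to the pandas-like eager API.
        modin_df = from_dask(dask_df)
        modin_total = int(modin_df["a"].sum())

        return dask_total, modin_total
    finally:
        client.close()
```

Both totals should agree, since the two frames are meant to describe the same partitions.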

.. _Deploy Dask Clusters: https://docs.dask.org/en/stable/deploying.html