Skip to content

Commit

Permalink
Make Modin on MPI through unidist component more explicit
Browse files Browse the repository at this point in the history
Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
  • Loading branch information
YarShev committed Nov 8, 2023
1 parent 6157b96 commit 6e14a73
Show file tree
Hide file tree
Showing 14 changed files with 43 additions and 32 deletions.
12 changes: 8 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -57,17 +57,21 @@ The charts below show the speedup you get by replacing pandas with Modin based o
Modin can be installed with `pip` on Linux, Windows and MacOS:

```bash
pip install "modin[all]" # (Recommended) Install Modin with all of Modin's currently supported engines.
pip install "modin[all]" # Install Modin with Ray and Dask engines.
```

If you want to install Modin with a specific engine, we recommend:

```bash
pip install "modin[ray]" # Install Modin dependencies and Ray.
pip install "modin[dask]" # Install Modin dependencies and Dask.
pip install "modin[unidist]" # Install Modin dependencies and Unidist.
pip install "modin[mpi]" # Install Modin dependencies and MPI through Unidist.
```

To get Modin on MPI through Unidist (starting unidist 0.5.0) fully working
it is required to have a working MPI implementation installed beforehand.
Otherwise, installation of `modin[mpi]` may fail.

Modin automatically detects which engine(s) you have installed and uses that for scheduling computation.

#### From conda-forge
Expand All @@ -85,7 +89,7 @@ Each engine can also be installed individually (and also as a combination of sev
```bash
conda install -c conda-forge modin-ray # Install Modin dependencies and Ray.
conda install -c conda-forge modin-dask # Install Modin dependencies and Dask.
conda install -c conda-forge modin-unidist # Install Modin dependencies and Unidist.
conda install -c conda-forge modin-mpi # Install Modin dependencies and MPI through Unidist.
conda install -c conda-forge modin-hdk # Install Modin dependencies and HDK.
```

Expand Down Expand Up @@ -119,7 +123,7 @@ export MODIN_ENGINE=unidist # Modin will use Unidist
```

If you want to choose the Unidist engine, you should set the additional environment
variable ``UNIDIST_BACKEND``, because currently Modin only supports Unidist on MPI:
variable ``UNIDIST_BACKEND``, because currently Modin only supports MPI through Unidist:

```bash
export UNIDIST_BACKEND=mpi # Unidist will use MPI backend
Expand Down
9 changes: 5 additions & 4 deletions docs/development/architecture.rst
Original file line number Diff line number Diff line change
Expand Up @@ -216,8 +216,8 @@ documentation page on :doc:`contributing </development/contributing>`.
- Uses the `Dask Futures`_ execution framework.
- The storage format is `pandas` and the in-memory partition type is a pandas DataFrame.
- For more information on the execution path, see the :doc:`pandas on Dask </flow/modin/core/execution/dask/implementations/pandas_on_dask/index>` page.
- :doc:`pandas on Unidist </development/using_pandas_on_unidist>`
- Uses the Unidist_ execution framework.
- :doc:`pandas on MPI </development/using_pandas_on_mpi>`
- Uses MPI_ through the Unidist_ execution framework.
- The storage format is `pandas` and the in-memory partition type is a pandas DataFrame.
- For more information on the execution path, see the :doc:`pandas on Unidist </flow/modin/core/execution/unidist/implementations/pandas_on_unidist/index>` page.
- :doc:`pandas on Python </development/using_pandas_on_python>`
Expand All @@ -228,8 +228,8 @@ documentation page on :doc:`contributing </development/contributing>`.
- Uses the Ray_ execution framework.
- The storage format is `pandas` and the in-memory partition type is a pandas DataFrame.
- For more information on the execution path, see the :doc:`experimental pandas on Ray </flow/modin/experimental/core/execution/ray/implementations/pandas_on_ray/index>` page.
- pandas on Unidist (experimental)
- Uses the Unidist_ execution framework.
- pandas on MPI (experimental)
- Uses MPI_ through the Unidist_ execution framework.
- The storage format is `pandas` and the in-memory partition type is a pandas DataFrame.
- For more information on the execution path, see the :doc:`experimental pandas on Unidist </flow/modin/experimental/core/execution/unidist/implementations/pandas_on_unidist/index>` page.
- pandas on Dask (experimental)
Expand Down Expand Up @@ -375,6 +375,7 @@ details. The documentation covers most modules, with more docs being added every
.. _Arrow tables: https://arrow.apache.org/docs/python/generated/pyarrow.Table.html
.. _Ray: https://github.com/ray-project/ray
.. _Unidist: https://github.com/modin-project/unidist
.. _MPI: https://www.mpi-forum.org/
.. _code: https://github.com/modin-project/modin/blob/master/modin/core/dataframe
.. _Dask: https://github.com/dask/dask
.. _Dask Futures: https://docs.dask.org/en/latest/futures.html
Expand Down
2 changes: 1 addition & 1 deletion docs/development/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ Development
using_pandas_on_ray
using_pandas_on_dask
using_pandas_on_python
using_pandas_on_unidist
using_pandas_on_mpi
using_hdk
using_pyarrow_on_ray

Expand Down
2 changes: 1 addition & 1 deletion docs/development/partition_api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@ in the worker process that processes a function (please, refer to `Dask document

Unidist engine
--------------
Currently, Modin only supports unidist on MPI backend. There is no mentioned above issue for
Currently, Modin only supports MPI through Unidist. There is no mentioned above issue for
Modin on ``Unidist`` engine using ``MPI`` backend with ``pandas`` in-memory format
because ``Unidist`` saves any objects in the MPI worker process that processes a function
(please, refer to `Unidist documentation`_ for more information).
Expand Down
Original file line number Diff line number Diff line change
@@ -1,14 +1,14 @@
pandas on Unidist
=================
pandas on MPI through Unidist
=============================

This section describes usage related documents for the pandas on Unidist component of Modin.
This section describes usage related documents for the pandas on MPI through Unidist component of Modin.

Modin uses pandas as a primary memory format of the underlying partitions and optimizes queries
ingested from the API layer in a specific way to this format. Thus, there is no need to care of choosing it
but you can explicitly specify it anyway as shown below.

One of the execution engines that Modin uses is Unidist. Currently, Modin only supports Unidist on MPI backend.
To enable the pandas on Unidist execution using MPI backend you should set the following environment variables:
One of the execution engines that Modin uses is MPI through Unidist.
To enable the pandas on MPI through Unidist execution you should set the following environment variables:

.. code-block:: bash
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,8 @@ PandasOnUnidist Execution
Queries that perform data transformation, data ingress or data egress using the `pandas on Unidist` execution
pass through the Modin components detailed below.

To enable `pandas on Unidist` execution, please refer to the usage section in :doc:`pandas on Unidist </development/using_pandas_on_unidist>`.
To enable `pandas on MPI through Unidist` execution,
please refer to the usage section in :doc:`pandas on MPI through Unidist </development/using_pandas_on_mpi>`.

Data Transformation
'''''''''''''''''''
Expand Down
2 changes: 1 addition & 1 deletion docs/getting_started/faq.rst
Original file line number Diff line number Diff line change
Expand Up @@ -119,7 +119,7 @@ and Modin will do computation with that engine:
pip install "modin[dask]" # Install Modin dependencies and Dask to run on Dask
export MODIN_ENGINE=dask # Modin will use Dask
pip install "modin[unidist]" # Install Modin dependencies and Unidist to run on Unidist.
pip install "modin[mpi]" # Install Modin dependencies and MPI to run on MPI through Unidist.
export MODIN_ENGINE=unidist # Modin will use Unidist
export UNIDIST_BACKEND=mpi # Unidist will use MPI backend.
Expand Down
13 changes: 7 additions & 6 deletions docs/getting_started/installation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ Modin can be used with :doc:`Ray</development/using_pandas_on_ray>`, :doc:`Dask<
pip install "modin[ray]" # Install Modin dependencies and Ray to run on Ray
pip install "modin[dask]" # Install Modin dependencies and Dask to run on Dask
pip install "modin[unidist]" # Install Modin dependencies and Unidist to run on Unidist
pip install "modin[mpi]" # Install Modin dependencies and MPI to run on MPI through Unidist
pip install "modin[all]" # Install all of the above
Modin will automatically detect which engine you have installed and use that for
Expand Down Expand Up @@ -65,7 +65,7 @@ storage formats or for different functionalities of Modin. Here is a list of dep
.. code-block:: bash
pip install "modin[unidist]" # If you want to use the Unidist execution engine
pip install "modin[mpi]" # If you want to use MPI through Unidist execution engine
Installing on Google Colab
"""""""""""""""""""""""""""
Expand Down Expand Up @@ -106,7 +106,7 @@ it is possible to install modin with chosen engine(s) alongside. Current options
+---------------------------------+---------------------------+-----------------------------+
| modin-ray | Ray_ | Linux, Windows |
+---------------------------------+---------------------------+-----------------------------+
| modin-unidist | Unidist_ | Linux, Windows, MacOS |
| modin-mpi | MPI_ through Unidist_ | Linux, Windows, MacOS |
+---------------------------------+---------------------------+-----------------------------+
| modin-hdk | HDK_ | Linux |
+---------------------------------+---------------------------+-----------------------------+
Expand All @@ -117,7 +117,7 @@ For installing Dask, Ray and Unidist engines into conda environment following co

.. code-block:: bash
conda install -c conda-forge modin-ray modin-dask modin-unidist
conda install -c conda-forge modin-ray modin-dask modin-mpi
All set of engines could be available in conda environment by specifying:

Expand All @@ -129,7 +129,7 @@ or explicitly:

.. code-block:: bash
conda install -c conda-forge modin-ray modin-dask modin-unidist modin-hdk
conda install -c conda-forge modin-ray modin-dask modin-mpi modin-hdk
``conda`` may be slow installing ``modin-all`` or combitations of execution engines so we currently recommend using libmamba solver for the installation process.
To do this install it in a base environment:
Expand Down Expand Up @@ -171,7 +171,7 @@ also use ``pip``.
This will install directly from the repo without you having to manually clone it! Please be aware
that these changes have not made it into a release and may not be completely stable.

If you would like to install Modin with a specific engine, you can use ``modin[ray]`` or ``modin[dask]`` or ``modin[unidist]`` instead of ``modin[all]`` in the command above.
If you would like to install Modin with a specific engine, you can use ``modin[ray]`` or ``modin[dask]`` or ``modin[mpi]`` instead of ``modin[all]`` in the command above.

Windows
-------
Expand Down Expand Up @@ -214,6 +214,7 @@ Once cloned, ``cd`` into the ``modin`` directory and use ``pip`` to install:
.. _WSL: https://docs.microsoft.com/en-us/windows/wsl/install-win10
.. _Ray: http://ray.readthedocs.io
.. _Dask: https://github.com/dask/dask
.. _MPI: https://www.mpi-forum.org/
.. _Unidist: https://github.com/modin-project/unidist
.. _HDK: https://github.com/intel-ai/hdk
.. _`Intel Distribution of Modin`: https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/distribution-of-modin.html#gs.86stqv
Expand Down
4 changes: 2 additions & 2 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@ of the targets:
pip install "modin[ray]" # Install Modin dependencies and Ray to run on Ray
pip install "modin[dask]" # Install Modin dependencies and Dask to run on Dask
pip install "modin[unidist]" # Install Modin dependencies and Unidist to run on Unidist
pip install "modin[mpi]" # Install Modin dependencies and MPI to run on MPI through Unidist
pip install "modin[all]" # Install all of the above
Modin will automatically detect which engine you have installed and use that for
Expand All @@ -77,7 +77,7 @@ variable ``MODIN_ENGINE`` and Modin will do computation with that engine:
export MODIN_ENGINE=unidist # Modin will use Unidist
If you want to choose the Unidist engine, you should set the additional environment
variable ``UNIDIST_BACKEND``, because currently Modin only supports Unidist on MPI:
variable ``UNIDIST_BACKEND``, because currently Modin only supports MPI through Unidist:

.. code-block:: bash
Expand Down
2 changes: 1 addition & 1 deletion examples/tutorial/jupyter/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ Currently we provide tutorial notebooks for the following execution backends:

- [PandasOnRay](https://modin.readthedocs.io/en/latest/development/using_pandas_on_ray.html)
- [PandasOnDask](https://modin.readthedocs.io/en/latest/development/using_pandas_on_dask.html)
- [PandasOnUnidist](https://modin.readthedocs.io/en/latest/development/using_pandas_on_unidist.html)
- [PandasOnMPI through Unidist](https://modin.readthedocs.io/en/latest/development/using_pandas_on_mpi.html)
- [HdkOnNative](https://modin.readthedocs.io/en/latest/development/using_hdk.html)

## Creating a development environment
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,7 @@
"\n",
"Modin uses Ray as an execution engine by default so no additional action is required to start to use it. Alternatively, if you need to use another engine, it should be specified either by setting the Modin config or by setting Modin environment variable before the first operation with Modin as it is shown below. Also, note that the full list of Modin configs and corresponding environment variables can be found in the [Modin Configuration Settings](https://modin.readthedocs.io/en/stable/flow/modin/config.html#modin-configs-list) section of the Modin documentation.\n",
"\n",
"One of the execution engines that Modin uses is Unidist. Currently, Modin only supports Unidist on MPI backend, so it should be specified either by setting the Unidist config or by setting Unidist environment variable. The full list of Unidist configs and corresponding environment variables can be found in the [Unidist Configuration Settings](https://unidist.readthedocs.io/en/latest/flow/unidist/config.html#unidist-configuration-settings-list) section of the Unidist documentation."
"One of the execution engines that Modin uses is Unidist. Currently, Modin only supports MPI through Unidist, so it should be specified either by setting the Unidist config or by setting Unidist environment variable. The full list of Unidist configs and corresponding environment variables can be found in the [Unidist Configuration Settings](https://unidist.readthedocs.io/en/latest/flow/unidist/config.html#unidist-configuration-settings-list) section of the Unidist documentation."
]
},
{
Expand Down
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
fsspec>=2022.05.0
jupyterlab
ipywidgets
modin[unidist]
modin[mpi]
modin[spreadsheet]
2 changes: 1 addition & 1 deletion modin/core/execution/unidist/common/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ def initialize_unidist():

if unidist_cfg.Backend.get() != "mpi":
raise RuntimeError(
f"Modin only supports unidist on MPI for now, got unidist backend '{unidist_cfg.Backend.get()}'"
f"Modin only supports MPI through Unidist for now, got unidist backend '{unidist_cfg.Backend.get()}'"
)

if not unidist.is_initialized():
Expand Down
10 changes: 7 additions & 3 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,9 +9,13 @@
# ray==2.5.0 broken: https://github.com/conda-forge/ray-packages-feedstock/issues/100
# pydantic<2: https://github.com/modin-project/modin/issues/6336
ray_deps = ["ray[default]>=1.13.0,!=2.5.0", "pyarrow>=7.0.0", "pydantic<2"]
unidist_deps = ["unidist[mpi]>=0.2.1"]
mpi_deps = ["unidist[mpi]>=0.2.1"]
spreadsheet_deps = ["modin-spreadsheet>=0.1.0"]
all_deps = dask_deps + ray_deps + unidist_deps + spreadsheet_deps
# Currently, Modin does not include `mpi` option in `all`.
# Otherwise, installation of modin[all] would fail because
# users need to have a working MPI implementation and
# certain software installed beforehand.
all_deps = dask_deps + ray_deps + spreadsheet_deps

# Distribute 'modin-autoimport-pandas.pth' along with binary and source distributions.
# This file provides the "import pandas before Ray init" feature if specific
Expand Down Expand Up @@ -58,7 +62,7 @@ def make_distribution(self):
# can be installed by pip install modin[dask]
"dask": dask_deps,
"ray": ray_deps,
"unidist": unidist_deps,
"mpi": mpi_deps,
"spreadsheet": spreadsheet_deps,
"all": all_deps,
},
Expand Down

0 comments on commit 6e14a73

Please sign in to comment.