diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index e121396a0fa..9a7b46a8c6b 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -97,7 +97,6 @@ jobs: run: | MODIN_ENGINE=dask python -c "import modin.pandas as pd; print(pd.DataFrame([1,2,3]))" MODIN_ENGINE=ray python -c "import modin.pandas as pd; print(pd.DataFrame([1,2,3]))" - MODIN_ENGINE=unidist UNIDIST_BACKEND=mpi mpiexec -n 1 python -c "import modin.pandas as pd; print(pd.DataFrame([1,2,3]))" test-internals: needs: [lint-flake8, lint-black-isort] diff --git a/README.md b/README.md index cdfa62332e7..d50c6199822 100644 --- a/README.md +++ b/README.md @@ -57,7 +57,7 @@ The charts below show the speedup you get by replacing pandas with Modin based o Modin can be installed with `pip` on Linux, Windows and MacOS: ```bash -pip install "modin[all]" # (Recommended) Install Modin with all of Modin's currently supported engines. +pip install "modin[all]" # (Recommended) Install Modin with Ray and Dask engines. ``` If you want to install Modin with a specific engine, we recommend: @@ -65,16 +65,22 @@ If you want to install Modin with a specific engine, we recommend: ```bash pip install "modin[ray]" # Install Modin dependencies and Ray. pip install "modin[dask]" # Install Modin dependencies and Dask. -pip install "modin[unidist]" # Install Modin dependencies and Unidist. +pip install "modin[mpi]" # Install Modin dependencies and MPI through unidist. ``` +To get Modin on MPI through unidist (starting unidist 0.5.0) fully working +it is required to have a working MPI implementation installed beforehand. +Otherwise, installation of `modin[mpi]` may fail. Refer to +[Installing with pip](https://unidist.readthedocs.io/en/latest/installation.html#installing-with-pip) +section of the unidist documentation for more details about installation. + Modin automatically detects which engine(s) you have installed and uses that for scheduling computation. #### From conda-forge Installing from [conda forge](https://github.com/conda-forge/modin-feedstock) using `modin-all` will install Modin and four engines: [Ray](https://github.com/ray-project/ray), [Dask](https://github.com/dask/dask), -[Unidist](https://github.com/modin-project/unidist) and [HDK](https://github.com/intel-ai/hdk). +[MPI through unidist](https://github.com/modin-project/unidist) and [HDK](https://github.com/intel-ai/hdk). ```bash conda install -c conda-forge modin-all @@ -85,10 +91,14 @@ Each engine can also be installed individually (and also as a combination of sev ```bash conda install -c conda-forge modin-ray # Install Modin dependencies and Ray. conda install -c conda-forge modin-dask # Install Modin dependencies and Dask. -conda install -c conda-forge modin-unidist # Install Modin dependencies and Unidist. +conda install -c conda-forge modin-mpi # Install Modin dependencies and MPI through unidist. conda install -c conda-forge modin-hdk # Install Modin dependencies and HDK. ``` +Refer to +[Installing with conda](https://unidist.readthedocs.io/en/latest/installation.html#installing-with-conda) +section of the unidist documentation for more details on how to install a specific MPI implementation to run on. + To speed up conda installation we recommend using libmamba solver. To do this install it in a base environment: ```bash @@ -119,7 +129,7 @@ export MODIN_ENGINE=unidist # Modin will use Unidist ``` If you want to choose the Unidist engine, you should set the additional environment -variable ``UNIDIST_BACKEND``, because currently Modin only supports Unidist on MPI: +variable ``UNIDIST_BACKEND``. Currently, Modin only supports MPI through unidist: ```bash export UNIDIST_BACKEND=mpi # Unidist will use MPI backend diff --git a/docs/development/architecture.rst b/docs/development/architecture.rst index 22fc7f91496..61bff9c6819 100644 --- a/docs/development/architecture.rst +++ b/docs/development/architecture.rst @@ -216,8 +216,8 @@ documentation page on :doc:`contributing `. - Uses the `Dask Futures`_ execution framework. - The storage format is `pandas` and the in-memory partition type is a pandas DataFrame. - For more information on the execution path, see the :doc:`pandas on Dask ` page. -- :doc:`pandas on Unidist ` - - Uses the Unidist_ execution framework. +- :doc:`pandas on MPI ` + - Uses MPI_ through the Unidist_ execution framework. - The storage format is `pandas` and the in-memory partition type is a pandas DataFrame. - For more information on the execution path, see the :doc:`pandas on Unidist ` page. - :doc:`pandas on Python ` @@ -228,8 +228,8 @@ documentation page on :doc:`contributing `. - Uses the Ray_ execution framework. - The storage format is `pandas` and the in-memory partition type is a pandas DataFrame. - For more information on the execution path, see the :doc:`experimental pandas on Ray ` page. -- pandas on Unidist (experimental) - - Uses the Unidist_ execution framework. +- pandas on MPI (experimental) + - Uses MPI_ through the Unidist_ execution framework. - The storage format is `pandas` and the in-memory partition type is a pandas DataFrame. - For more information on the execution path, see the :doc:`experimental pandas on Unidist ` page. - pandas on Dask (experimental) @@ -375,6 +375,7 @@ details. The documentation covers most modules, with more docs being added every .. _Arrow tables: https://arrow.apache.org/docs/python/generated/pyarrow.Table.html .. _Ray: https://github.com/ray-project/ray .. _Unidist: https://github.com/modin-project/unidist +.. _MPI: https://www.mpi-forum.org/ .. _code: https://github.com/modin-project/modin/blob/master/modin/core/dataframe .. _Dask: https://github.com/dask/dask .. _Dask Futures: https://docs.dask.org/en/latest/futures.html diff --git a/docs/development/index.rst b/docs/development/index.rst index ab820cb17bd..b1e5c3f1212 100644 --- a/docs/development/index.rst +++ b/docs/development/index.rst @@ -10,7 +10,7 @@ Development using_pandas_on_ray using_pandas_on_dask using_pandas_on_python - using_pandas_on_unidist + using_pandas_on_mpi using_hdk using_pyarrow_on_ray diff --git a/docs/development/partition_api.rst b/docs/development/partition_api.rst index 40fb7532140..3844fdb5ab4 100644 --- a/docs/development/partition_api.rst +++ b/docs/development/partition_api.rst @@ -36,7 +36,7 @@ in the worker process that processes a function (please, refer to `Dask document Unidist engine -------------- -Currently, Modin only supports unidist on MPI backend. There is no mentioned above issue for +Currently, Modin only supports MPI through unidist. There is no mentioned above issue for Modin on ``Unidist`` engine using ``MPI`` backend with ``pandas`` in-memory format because ``Unidist`` saves any objects in the MPI worker process that processes a function (please, refer to `Unidist documentation`_ for more information). diff --git a/docs/development/using_pandas_on_unidist.rst b/docs/development/using_pandas_on_mpi.rst similarity index 79% rename from docs/development/using_pandas_on_unidist.rst rename to docs/development/using_pandas_on_mpi.rst index 9acc3126268..750be48a780 100644 --- a/docs/development/using_pandas_on_unidist.rst +++ b/docs/development/using_pandas_on_mpi.rst @@ -1,14 +1,14 @@ -pandas on Unidist -================= +pandas on MPI through unidist +============================= -This section describes usage related documents for the pandas on Unidist component of Modin. +This section describes usage related documents for the pandas on MPI through unidist component of Modin. Modin uses pandas as a primary memory format of the underlying partitions and optimizes queries ingested from the API layer in a specific way to this format. Thus, there is no need to care of choosing it but you can explicitly specify it anyway as shown below. -One of the execution engines that Modin uses is Unidist. Currently, Modin only supports Unidist on MPI backend. -To enable the pandas on Unidist execution using MPI backend you should set the following environment variables: +One of the execution engines that Modin uses is MPI through unidist. +To enable the pandas on MPI through unidist execution you should set the following environment variables: .. code-block:: bash diff --git a/docs/flow/modin/core/execution/unidist/implementations/pandas_on_unidist/index.rst b/docs/flow/modin/core/execution/unidist/implementations/pandas_on_unidist/index.rst index 85e6a8177e5..0a80865f376 100644 --- a/docs/flow/modin/core/execution/unidist/implementations/pandas_on_unidist/index.rst +++ b/docs/flow/modin/core/execution/unidist/implementations/pandas_on_unidist/index.rst @@ -6,7 +6,8 @@ PandasOnUnidist Execution Queries that perform data transformation, data ingress or data egress using the `pandas on Unidist` execution pass through the Modin components detailed below. -To enable `pandas on Unidist` execution, please refer to the usage section in :doc:`pandas on Unidist `. +To enable `pandas on MPI through unidist` execution, +please refer to the usage section in :doc:`pandas on MPI through unidist `. Data Transformation ''''''''''''''''''' diff --git a/docs/getting_started/faq.rst b/docs/getting_started/faq.rst index bca4e36f521..8e8257ccd09 100644 --- a/docs/getting_started/faq.rst +++ b/docs/getting_started/faq.rst @@ -119,7 +119,7 @@ and Modin will do computation with that engine: pip install "modin[dask]" # Install Modin dependencies and Dask to run on Dask export MODIN_ENGINE=dask # Modin will use Dask - pip install "modin[unidist]" # Install Modin dependencies and Unidist to run on Unidist. + pip install "modin[mpi]" # Install Modin dependencies and MPI to run on MPI through unidist. export MODIN_ENGINE=unidist # Modin will use Unidist export UNIDIST_BACKEND=mpi # Unidist will use MPI backend. diff --git a/docs/getting_started/installation.rst b/docs/getting_started/installation.rst index 90446735b6a..f140510c966 100644 --- a/docs/getting_started/installation.rst +++ b/docs/getting_started/installation.rst @@ -30,8 +30,13 @@ Modin can be used with :doc:`Ray`, :doc:`Dask< pip install "modin[ray]" # Install Modin dependencies and Ray to run on Ray pip install "modin[dask]" # Install Modin dependencies and Dask to run on Dask - pip install "modin[unidist]" # Install Modin dependencies and Unidist to run on Unidist - pip install "modin[all]" # Install all of the above + pip install "modin[mpi]" # Install Modin dependencies and MPI to run on MPI through unidist + pip install "modin[all]" # Install Ray and Dask + +To get Modin on MPI through unidist (starting unidist 0.5.0) fully working +it is required to have a working MPI implementation installed beforehand. +Otherwise, installation of ``modin[mpi]`` may fail. Refer to +`Installing with pip`_ section of the unidist documentation for more details about installation. Modin will automatically detect which engine you have installed and use that for scheduling computation! See below for HDK engine installation. @@ -65,7 +70,7 @@ storage formats or for different functionalities of Modin. Here is a list of dep .. code-block:: bash - pip install "modin[unidist]" # If you want to use the Unidist execution engine + pip install "modin[mpi]" # If you want to use MPI through unidist execution engine Installing on Google Colab """"""""""""""""""""""""""" @@ -106,18 +111,18 @@ it is possible to install modin with chosen engine(s) alongside. Current options +---------------------------------+---------------------------+-----------------------------+ | modin-ray | Ray_ | Linux, Windows | +---------------------------------+---------------------------+-----------------------------+ -| modin-unidist | Unidist_ | Linux, Windows, MacOS | +| modin-mpi | MPI_ through unidist_ | Linux, Windows, MacOS | +---------------------------------+---------------------------+-----------------------------+ | modin-hdk | HDK_ | Linux | +---------------------------------+---------------------------+-----------------------------+ | modin-all | Dask, Ray, Unidist, HDK | Linux | +---------------------------------+---------------------------+-----------------------------+ -For installing Dask, Ray and Unidist engines into conda environment following command should be used: +For installing Dask, Ray and MPI through unidist engines into conda environment following command should be used: .. code-block:: bash - conda install -c conda-forge modin-ray modin-dask modin-unidist + conda install -c conda-forge modin-ray modin-dask modin-mpi All set of engines could be available in conda environment by specifying: @@ -129,7 +134,10 @@ or explicitly: .. code-block:: bash - conda install -c conda-forge modin-ray modin-dask modin-unidist modin-hdk + conda install -c conda-forge modin-ray modin-dask modin-mpi modin-hdk + +Refer to `Installing with conda`_ section of the unidist documentation +for more details on how to install a specific MPI implementation to run on. ``conda`` may be slow installing ``modin-all`` or combitations of execution engines so we currently recommend using libmamba solver for the installation process. To do this install it in a base environment: @@ -171,7 +179,7 @@ also use ``pip``. This will install directly from the repo without you having to manually clone it! Please be aware that these changes have not made it into a release and may not be completely stable. -If you would like to install Modin with a specific engine, you can use ``modin[ray]`` or ``modin[dask]`` or ``modin[unidist]`` instead of ``modin[all]`` in the command above. +If you would like to install Modin with a specific engine, you can use ``modin[ray]`` or ``modin[dask]`` or ``modin[mpi]`` instead of ``modin[all]`` in the command above. Windows ------- @@ -214,7 +222,10 @@ Once cloned, ``cd`` into the ``modin`` directory and use ``pip`` to install: .. _WSL: https://docs.microsoft.com/en-us/windows/wsl/install-win10 .. _Ray: http://ray.readthedocs.io .. _Dask: https://github.com/dask/dask +.. _MPI: https://www.mpi-forum.org/ .. _Unidist: https://github.com/modin-project/unidist +.. _`Installing with pip`: https://unidist.readthedocs.io/en/latest/installation.html#installing-with-pip +.. _`Installing with conda`: https://unidist.readthedocs.io/en/latest/installation.html#installing-with-conda .. _HDK: https://github.com/intel-ai/hdk .. _`Intel Distribution of Modin`: https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/distribution-of-modin.html#gs.86stqv .. _`Intel Distribution of Modin Getting Started`: https://www.intel.com/content/www/us/en/developer/articles/technical/intel-distribution-of-modin-getting-started-guide.html diff --git a/docs/index.rst b/docs/index.rst index ae73833f6c6..ff5d56e476d 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -61,7 +61,7 @@ of the targets: pip install "modin[ray]" # Install Modin dependencies and Ray to run on Ray pip install "modin[dask]" # Install Modin dependencies and Dask to run on Dask - pip install "modin[unidist]" # Install Modin dependencies and Unidist to run on Unidist + pip install "modin[mpi]" # Install Modin dependencies and MPI to run on MPI through unidist pip install "modin[all]" # Install all of the above Modin will automatically detect which engine you have installed and use that for @@ -77,7 +77,7 @@ variable ``MODIN_ENGINE`` and Modin will do computation with that engine: export MODIN_ENGINE=unidist # Modin will use Unidist If you want to choose the Unidist engine, you should set the additional environment -variable ``UNIDIST_BACKEND``, because currently Modin only supports Unidist on MPI: +variable ``UNIDIST_BACKEND``, because currently Modin only supports MPI through unidist: .. code-block:: bash diff --git a/examples/tutorial/jupyter/README.md b/examples/tutorial/jupyter/README.md index 88cbb307c34..78fbb5116e1 100644 --- a/examples/tutorial/jupyter/README.md +++ b/examples/tutorial/jupyter/README.md @@ -4,7 +4,7 @@ Currently we provide tutorial notebooks for the following execution backends: - [PandasOnRay](https://modin.readthedocs.io/en/latest/development/using_pandas_on_ray.html) - [PandasOnDask](https://modin.readthedocs.io/en/latest/development/using_pandas_on_dask.html) -- [PandasOnUnidist](https://modin.readthedocs.io/en/latest/development/using_pandas_on_unidist.html) +- [PandasOnMPI through unidist](https://modin.readthedocs.io/en/latest/development/using_pandas_on_mpi.html) - [HdkOnNative](https://modin.readthedocs.io/en/latest/development/using_hdk.html) ## Creating a development environment diff --git a/examples/tutorial/jupyter/execution/pandas_on_unidist/local/exercise_1.ipynb b/examples/tutorial/jupyter/execution/pandas_on_unidist/local/exercise_1.ipynb index 1700a78b86a..cd8c1606e58 100644 --- a/examples/tutorial/jupyter/execution/pandas_on_unidist/local/exercise_1.ipynb +++ b/examples/tutorial/jupyter/execution/pandas_on_unidist/local/exercise_1.ipynb @@ -64,7 +64,7 @@ "\n", "Modin uses Ray as an execution engine by default so no additional action is required to start to use it. Alternatively, if you need to use another engine, it should be specified either by setting the Modin config or by setting Modin environment variable before the first operation with Modin as it is shown below. Also, note that the full list of Modin configs and corresponding environment variables can be found in the [Modin Configuration Settings](https://modin.readthedocs.io/en/stable/flow/modin/config.html#modin-configs-list) section of the Modin documentation.\n", "\n", - "One of the execution engines that Modin uses is Unidist. Currently, Modin only supports Unidist on MPI backend, so it should be specified either by setting the Unidist config or by setting Unidist environment variable. The full list of Unidist configs and corresponding environment variables can be found in the [Unidist Configuration Settings](https://unidist.readthedocs.io/en/latest/flow/unidist/config.html#unidist-configuration-settings-list) section of the Unidist documentation." + "One of the execution engines that Modin uses is Unidist. Currently, Modin only supports MPI through unidist, so it should be specified either by setting the Unidist config or by setting Unidist environment variable. The full list of Unidist configs and corresponding environment variables can be found in the [Unidist Configuration Settings](https://unidist.readthedocs.io/en/latest/flow/unidist/config.html#unidist-configuration-settings-list) section of the Unidist documentation." ] }, { diff --git a/examples/tutorial/jupyter/execution/pandas_on_unidist/requirements.txt b/examples/tutorial/jupyter/execution/pandas_on_unidist/requirements.txt index e093df33b8b..4973e170cc5 100644 --- a/examples/tutorial/jupyter/execution/pandas_on_unidist/requirements.txt +++ b/examples/tutorial/jupyter/execution/pandas_on_unidist/requirements.txt @@ -1,5 +1,5 @@ fsspec>=2022.05.0 jupyterlab ipywidgets -modin[unidist] +modin[mpi] modin[spreadsheet] diff --git a/modin/core/execution/unidist/common/utils.py b/modin/core/execution/unidist/common/utils.py index db9648382b5..5159e05a3c3 100644 --- a/modin/core/execution/unidist/common/utils.py +++ b/modin/core/execution/unidist/common/utils.py @@ -29,7 +29,7 @@ def initialize_unidist(): if unidist_cfg.Backend.get() != "mpi": raise RuntimeError( - f"Modin only supports unidist on MPI for now, got unidist backend '{unidist_cfg.Backend.get()}'" + f"Modin only supports MPI through unidist for now, got unidist backend '{unidist_cfg.Backend.get()}'" ) if not unidist.is_initialized(): diff --git a/setup.py b/setup.py index b1b6f39fbf0..8e8036d872e 100644 --- a/setup.py +++ b/setup.py @@ -9,9 +9,13 @@ # ray==2.5.0 broken: https://github.com/conda-forge/ray-packages-feedstock/issues/100 # pydantic<2: https://github.com/modin-project/modin/issues/6336 ray_deps = ["ray[default]>=1.13.0,!=2.5.0", "pyarrow>=7.0.0", "pydantic<2"] -unidist_deps = ["unidist[mpi]>=0.2.1"] +mpi_deps = ["unidist[mpi]>=0.2.1"] spreadsheet_deps = ["modin-spreadsheet>=0.1.0"] -all_deps = dask_deps + ray_deps + unidist_deps + spreadsheet_deps +# Currently, Modin does not include `mpi` option in `all`. +# Otherwise, installation of modin[all] would fail because +# users need to have a working MPI implementation and +# certain software installed beforehand. +all_deps = dask_deps + ray_deps + spreadsheet_deps # Distribute 'modin-autoimport-pandas.pth' along with binary and source distributions. # This file provides the "import pandas before Ray init" feature if specific @@ -58,7 +62,7 @@ def make_distribution(self): # can be installed by pip install modin[dask] "dask": dask_deps, "ray": ray_deps, - "unidist": unidist_deps, + "mpi": mpi_deps, "spreadsheet": spreadsheet_deps, "all": all_deps, },