diff --git a/README.md b/README.md index cdfa62332e7..848e6506cec 100644 --- a/README.md +++ b/README.md @@ -65,7 +65,7 @@ If you want to install Modin with a specific engine, we recommend: ```bash pip install "modin[ray]" # Install Modin dependencies and Ray. pip install "modin[dask]" # Install Modin dependencies and Dask. -pip install "modin[unidist]" # Install Modin dependencies and Unidist. +pip install "modin[mpi]" # Install Modin dependencies and MPI through unidist. ``` Modin automatically detects which engine(s) you have installed and uses that for scheduling computation. @@ -85,7 +85,7 @@ Each engine can also be installed individually (and also as a combination of sev ```bash conda install -c conda-forge modin-ray # Install Modin dependencies and Ray. conda install -c conda-forge modin-dask # Install Modin dependencies and Dask. -conda install -c conda-forge modin-unidist # Install Modin dependencies and Unidist. +conda install -c conda-forge modin-mpi # Install Modin dependencies and MPI through unidist. conda install -c conda-forge modin-hdk # Install Modin dependencies and HDK. ``` @@ -119,7 +119,7 @@ export MODIN_ENGINE=unidist # Modin will use Unidist ``` If you want to choose the Unidist engine, you should set the additional environment -variable ``UNIDIST_BACKEND``, because currently Modin only supports Unidist on MPI: +variable ``UNIDIST_BACKEND``, because currently Modin only supports MPI through Unidist: ```bash export UNIDIST_BACKEND=mpi # Unidist will use MPI backend diff --git a/docs/development/architecture.rst b/docs/development/architecture.rst index 22fc7f91496..61bff9c6819 100644 --- a/docs/development/architecture.rst +++ b/docs/development/architecture.rst @@ -216,8 +216,8 @@ documentation page on :doc:`contributing `. - Uses the `Dask Futures`_ execution framework. - The storage format is `pandas` and the in-memory partition type is a pandas DataFrame. - For more information on the execution path, see the :doc:`pandas on Dask ` page. -- :doc:`pandas on Unidist ` - - Uses the Unidist_ execution framework. +- :doc:`pandas on MPI ` + - Uses MPI_ through the Unidist_ execution framework. - The storage format is `pandas` and the in-memory partition type is a pandas DataFrame. - For more information on the execution path, see the :doc:`pandas on Unidist ` page. - :doc:`pandas on Python ` @@ -228,8 +228,8 @@ documentation page on :doc:`contributing `. - Uses the Ray_ execution framework. - The storage format is `pandas` and the in-memory partition type is a pandas DataFrame. - For more information on the execution path, see the :doc:`experimental pandas on Ray ` page. -- pandas on Unidist (experimental) - - Uses the Unidist_ execution framework. +- pandas on MPI (experimental) + - Uses MPI_ through the Unidist_ execution framework. - The storage format is `pandas` and the in-memory partition type is a pandas DataFrame. - For more information on the execution path, see the :doc:`experimental pandas on Unidist ` page. - pandas on Dask (experimental) @@ -375,6 +375,7 @@ details. The documentation covers most modules, with more docs being added every .. _Arrow tables: https://arrow.apache.org/docs/python/generated/pyarrow.Table.html .. _Ray: https://github.com/ray-project/ray .. _Unidist: https://github.com/modin-project/unidist +.. _MPI: https://www.mpi-forum.org/ .. _code: https://github.com/modin-project/modin/blob/master/modin/core/dataframe .. _Dask: https://github.com/dask/dask .. _Dask Futures: https://docs.dask.org/en/latest/futures.html diff --git a/docs/development/index.rst b/docs/development/index.rst index ab820cb17bd..b1e5c3f1212 100644 --- a/docs/development/index.rst +++ b/docs/development/index.rst @@ -10,7 +10,7 @@ Development using_pandas_on_ray using_pandas_on_dask using_pandas_on_python - using_pandas_on_unidist + using_pandas_on_mpi using_hdk using_pyarrow_on_ray diff --git a/docs/development/partition_api.rst b/docs/development/partition_api.rst index 40fb7532140..ff22a8bcc0b 100644 --- a/docs/development/partition_api.rst +++ b/docs/development/partition_api.rst @@ -36,7 +36,7 @@ in the worker process that processes a function (please, refer to `Dask document Unidist engine -------------- -Currently, Modin only supports unidist on MPI backend. There is no mentioned above issue for +Currently, Modin only supports MPI through Unidist. There is no mentioned above issue for Modin on ``Unidist`` engine using ``MPI`` backend with ``pandas`` in-memory format because ``Unidist`` saves any objects in the MPI worker process that processes a function (please, refer to `Unidist documentation`_ for more information). diff --git a/docs/development/using_pandas_on_unidist.rst b/docs/development/using_pandas_on_mpi.rst similarity index 79% rename from docs/development/using_pandas_on_unidist.rst rename to docs/development/using_pandas_on_mpi.rst index 9acc3126268..750be48a780 100644 --- a/docs/development/using_pandas_on_unidist.rst +++ b/docs/development/using_pandas_on_mpi.rst @@ -1,14 +1,14 @@ -pandas on Unidist -================= +pandas on MPI through unidist +============================= -This section describes usage related documents for the pandas on Unidist component of Modin. +This section describes usage related documents for the pandas on MPI through unidist component of Modin. Modin uses pandas as a primary memory format of the underlying partitions and optimizes queries ingested from the API layer in a specific way to this format. Thus, there is no need to care of choosing it but you can explicitly specify it anyway as shown below. -One of the execution engines that Modin uses is Unidist. Currently, Modin only supports Unidist on MPI backend. -To enable the pandas on Unidist execution using MPI backend you should set the following environment variables: +One of the execution engines that Modin uses is MPI through unidist. +To enable the pandas on MPI through unidist execution you should set the following environment variables: .. code-block:: bash diff --git a/docs/flow/modin/core/execution/unidist/implementations/pandas_on_unidist/index.rst b/docs/flow/modin/core/execution/unidist/implementations/pandas_on_unidist/index.rst index 85e6a8177e5..6f3905e67b4 100644 --- a/docs/flow/modin/core/execution/unidist/implementations/pandas_on_unidist/index.rst +++ b/docs/flow/modin/core/execution/unidist/implementations/pandas_on_unidist/index.rst @@ -6,7 +6,7 @@ PandasOnUnidist Execution Queries that perform data transformation, data ingress or data egress using the `pandas on Unidist` execution pass through the Modin components detailed below. -To enable `pandas on Unidist` execution, please refer to the usage section in :doc:`pandas on Unidist `. +To enable `pandas on MPI through unidist` execution, please refer to the usage section in :doc:`pandas on Unidist `. Data Transformation ''''''''''''''''''' diff --git a/docs/getting_started/faq.rst b/docs/getting_started/faq.rst index bca4e36f521..8e8257ccd09 100644 --- a/docs/getting_started/faq.rst +++ b/docs/getting_started/faq.rst @@ -119,7 +119,7 @@ and Modin will do computation with that engine: pip install "modin[dask]" # Install Modin dependencies and Dask to run on Dask export MODIN_ENGINE=dask # Modin will use Dask - pip install "modin[unidist]" # Install Modin dependencies and Unidist to run on Unidist. + pip install "modin[mpi]" # Install Modin dependencies and MPI to run on MPI through unidist. export MODIN_ENGINE=unidist # Modin will use Unidist export UNIDIST_BACKEND=mpi # Unidist will use MPI backend. diff --git a/docs/getting_started/installation.rst b/docs/getting_started/installation.rst index 90446735b6a..b28c9535d8e 100644 --- a/docs/getting_started/installation.rst +++ b/docs/getting_started/installation.rst @@ -30,7 +30,7 @@ Modin can be used with :doc:`Ray`, :doc:`Dask< pip install "modin[ray]" # Install Modin dependencies and Ray to run on Ray pip install "modin[dask]" # Install Modin dependencies and Dask to run on Dask - pip install "modin[unidist]" # Install Modin dependencies and Unidist to run on Unidist + pip install "modin[mpi]" # Install Modin dependencies and Unidist to run on Unidist pip install "modin[all]" # Install all of the above Modin will automatically detect which engine you have installed and use that for @@ -65,7 +65,7 @@ storage formats or for different functionalities of Modin. Here is a list of dep .. code-block:: bash - pip install "modin[unidist]" # If you want to use the Unidist execution engine + pip install "modin[mpi]" # If you want to use the Unidist execution engine Installing on Google Colab """"""""""""""""""""""""""" @@ -106,7 +106,7 @@ it is possible to install modin with chosen engine(s) alongside. Current options +---------------------------------+---------------------------+-----------------------------+ | modin-ray | Ray_ | Linux, Windows | +---------------------------------+---------------------------+-----------------------------+ -| modin-unidist | Unidist_ | Linux, Windows, MacOS | +| modin-mpi | MPI_ through Unidist_ | Linux, Windows, MacOS | +---------------------------------+---------------------------+-----------------------------+ | modin-hdk | HDK_ | Linux | +---------------------------------+---------------------------+-----------------------------+ @@ -117,7 +117,7 @@ For installing Dask, Ray and Unidist engines into conda environment following co .. code-block:: bash - conda install -c conda-forge modin-ray modin-dask modin-unidist + conda install -c conda-forge modin-ray modin-dask modin-mpi All set of engines could be available in conda environment by specifying: @@ -129,7 +129,7 @@ or explicitly: .. code-block:: bash - conda install -c conda-forge modin-ray modin-dask modin-unidist modin-hdk + conda install -c conda-forge modin-ray modin-dask modin-mpi modin-hdk ``conda`` may be slow installing ``modin-all`` or combitations of execution engines so we currently recommend using libmamba solver for the installation process. To do this install it in a base environment: @@ -171,7 +171,7 @@ also use ``pip``. This will install directly from the repo without you having to manually clone it! Please be aware that these changes have not made it into a release and may not be completely stable. -If you would like to install Modin with a specific engine, you can use ``modin[ray]`` or ``modin[dask]`` or ``modin[unidist]`` instead of ``modin[all]`` in the command above. +If you would like to install Modin with a specific engine, you can use ``modin[ray]`` or ``modin[dask]`` or ``modin[mpi]`` instead of ``modin[all]`` in the command above. Windows ------- @@ -214,6 +214,7 @@ Once cloned, ``cd`` into the ``modin`` directory and use ``pip`` to install: .. _WSL: https://docs.microsoft.com/en-us/windows/wsl/install-win10 .. _Ray: http://ray.readthedocs.io .. _Dask: https://github.com/dask/dask +.. _MPI: https://www.mpi-forum.org/ .. _Unidist: https://github.com/modin-project/unidist .. _HDK: https://github.com/intel-ai/hdk .. _`Intel Distribution of Modin`: https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/distribution-of-modin.html#gs.86stqv diff --git a/docs/index.rst b/docs/index.rst index ae73833f6c6..cefb99fe806 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -61,7 +61,7 @@ of the targets: pip install "modin[ray]" # Install Modin dependencies and Ray to run on Ray pip install "modin[dask]" # Install Modin dependencies and Dask to run on Dask - pip install "modin[unidist]" # Install Modin dependencies and Unidist to run on Unidist + pip install "modin[mpi]" # Install Modin dependencies and MPI to run on MPI through Unidist pip install "modin[all]" # Install all of the above Modin will automatically detect which engine you have installed and use that for @@ -77,7 +77,7 @@ variable ``MODIN_ENGINE`` and Modin will do computation with that engine: export MODIN_ENGINE=unidist # Modin will use Unidist If you want to choose the Unidist engine, you should set the additional environment -variable ``UNIDIST_BACKEND``, because currently Modin only supports Unidist on MPI: +variable ``UNIDIST_BACKEND``, because currently Modin only supports MPI through Unidist: .. code-block:: bash diff --git a/examples/tutorial/jupyter/README.md b/examples/tutorial/jupyter/README.md index 88cbb307c34..78fbb5116e1 100644 --- a/examples/tutorial/jupyter/README.md +++ b/examples/tutorial/jupyter/README.md @@ -4,7 +4,7 @@ Currently we provide tutorial notebooks for the following execution backends: - [PandasOnRay](https://modin.readthedocs.io/en/latest/development/using_pandas_on_ray.html) - [PandasOnDask](https://modin.readthedocs.io/en/latest/development/using_pandas_on_dask.html) -- [PandasOnUnidist](https://modin.readthedocs.io/en/latest/development/using_pandas_on_unidist.html) +- [PandasOnMPI through unidist](https://modin.readthedocs.io/en/latest/development/using_pandas_on_mpi.html) - [HdkOnNative](https://modin.readthedocs.io/en/latest/development/using_hdk.html) ## Creating a development environment diff --git a/examples/tutorial/jupyter/execution/pandas_on_unidist/local/exercise_1.ipynb b/examples/tutorial/jupyter/execution/pandas_on_unidist/local/exercise_1.ipynb index 1700a78b86a..130083762fb 100644 --- a/examples/tutorial/jupyter/execution/pandas_on_unidist/local/exercise_1.ipynb +++ b/examples/tutorial/jupyter/execution/pandas_on_unidist/local/exercise_1.ipynb @@ -64,7 +64,7 @@ "\n", "Modin uses Ray as an execution engine by default so no additional action is required to start to use it. Alternatively, if you need to use another engine, it should be specified either by setting the Modin config or by setting Modin environment variable before the first operation with Modin as it is shown below. Also, note that the full list of Modin configs and corresponding environment variables can be found in the [Modin Configuration Settings](https://modin.readthedocs.io/en/stable/flow/modin/config.html#modin-configs-list) section of the Modin documentation.\n", "\n", - "One of the execution engines that Modin uses is Unidist. Currently, Modin only supports Unidist on MPI backend, so it should be specified either by setting the Unidist config or by setting Unidist environment variable. The full list of Unidist configs and corresponding environment variables can be found in the [Unidist Configuration Settings](https://unidist.readthedocs.io/en/latest/flow/unidist/config.html#unidist-configuration-settings-list) section of the Unidist documentation." + "One of the execution engines that Modin uses is Unidist. Currently, Modin only supports MPI through Unidist, so it should be specified either by setting the Unidist config or by setting Unidist environment variable. The full list of Unidist configs and corresponding environment variables can be found in the [Unidist Configuration Settings](https://unidist.readthedocs.io/en/latest/flow/unidist/config.html#unidist-configuration-settings-list) section of the Unidist documentation." ] }, { diff --git a/examples/tutorial/jupyter/execution/pandas_on_unidist/requirements.txt b/examples/tutorial/jupyter/execution/pandas_on_unidist/requirements.txt index e093df33b8b..4973e170cc5 100644 --- a/examples/tutorial/jupyter/execution/pandas_on_unidist/requirements.txt +++ b/examples/tutorial/jupyter/execution/pandas_on_unidist/requirements.txt @@ -1,5 +1,5 @@ fsspec>=2022.05.0 jupyterlab ipywidgets -modin[unidist] +modin[mpi] modin[spreadsheet] diff --git a/modin/core/execution/unidist/common/utils.py b/modin/core/execution/unidist/common/utils.py index db9648382b5..5159e05a3c3 100644 --- a/modin/core/execution/unidist/common/utils.py +++ b/modin/core/execution/unidist/common/utils.py @@ -29,7 +29,7 @@ def initialize_unidist(): if unidist_cfg.Backend.get() != "mpi": raise RuntimeError( - f"Modin only supports unidist on MPI for now, got unidist backend '{unidist_cfg.Backend.get()}'" + f"Modin only supports MPI through unidist for now, got unidist backend '{unidist_cfg.Backend.get()}'" ) if not unidist.is_initialized(): diff --git a/setup.py b/setup.py index b1b6f39fbf0..8e8036d872e 100644 --- a/setup.py +++ b/setup.py @@ -9,9 +9,13 @@ # ray==2.5.0 broken: https://github.com/conda-forge/ray-packages-feedstock/issues/100 # pydantic<2: https://github.com/modin-project/modin/issues/6336 ray_deps = ["ray[default]>=1.13.0,!=2.5.0", "pyarrow>=7.0.0", "pydantic<2"] -unidist_deps = ["unidist[mpi]>=0.2.1"] +mpi_deps = ["unidist[mpi]>=0.2.1"] spreadsheet_deps = ["modin-spreadsheet>=0.1.0"] -all_deps = dask_deps + ray_deps + unidist_deps + spreadsheet_deps +# Currently, Modin does not include `mpi` option in `all`. +# Otherwise, installation of modin[all] would fail because +# users need to have a working MPI implementation and +# certain software installed beforehand. +all_deps = dask_deps + ray_deps + spreadsheet_deps # Distribute 'modin-autoimport-pandas.pth' along with binary and source distributions. # This file provides the "import pandas before Ray init" feature if specific @@ -58,7 +62,7 @@ def make_distribution(self): # can be installed by pip install modin[dask] "dask": dask_deps, "ray": ray_deps, - "unidist": unidist_deps, + "mpi": mpi_deps, "spreadsheet": spreadsheet_deps, "all": all_deps, },