REFACTOR-modin-project#6012: move experimental dispatchers under `modin/experimental/...` folder (modin-project#6011)

Co-authored-by: Iaroslav Igoshev <Poolliver868@mail.ru>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
anmyachev and YarShev committed Apr 18, 2023
1 parent 6fd9b8f commit bbc7697
Showing 17 changed files with 149 additions and 41 deletions.
21 changes: 15 additions & 6 deletions docs/development/architecture.rst
@@ -232,6 +232,10 @@ documentation page on :doc:`contributing </development/contributing>`.
 - Uses the Unidist_ execution framework.
 - The storage format is `pandas` and the in-memory partition type is a pandas DataFrame.
 - For more information on the execution path, see the :doc:`experimental pandas on Unidist </flow/modin/experimental/core/execution/unidist/implementations/pandas_on_unidist/index>` page.
+- pandas on Dask (experimental)
+- Uses the Dask_ execution framework.
+- The storage format is `pandas` and the in-memory partition type is a pandas DataFrame.
+- For more information on the execution path, see the :doc:`experimental pandas on Dask </flow/modin/experimental/core/execution/dask/implementations/pandas_on_dask/index>` page.
 - :doc:`HDK on Native </development/using_hdk>` (experimental)
 - Uses HDK as an engine.
 - The storage format is `hdk` and the in-memory partition type is a pyarrow Table. When defaulting to pandas, the pandas DataFrame is used.
@@ -344,12 +348,16 @@ details. The documentation covers most modules, with more docs being added every
 │ │ │ │ │ └───implementations
 │ │ │ │ │ ├─── :doc:`pandas_on_ray </flow/modin/experimental/core/execution/ray/implementations/pandas_on_ray/index>`
 │ │ │ │ │ └─── :doc:`pyarrow_on_ray </flow/modin/experimental/core/execution/ray/implementations/pyarrow_on_ray>`
-│ │ │ │ └───unidist
-│ │ │ │ └───implementations
-│ │ │ │ └─── :doc:`pandas_on_unidist </flow/modin/experimental/core/execution/unidist/implementations/pandas_on_unidist/index>`
-│ │ │ └─── :doc:`storage_formats </flow/modin/experimental/core/storage_formats/index>`
-| │ │ ├─── :doc:`hdk </flow/modin/experimental/core/storage_formats/hdk/index>`
-│ │ │ └─── :doc:`pyarrow </flow/modin/experimental/core/storage_formats/pyarrow/index>`
+│ │ │ │ ├───unidist
+│ │ │ │ | └───implementations
+│ │ │ │ | └─── :doc:`pandas_on_unidist </flow/modin/experimental/core/execution/unidist/implementations/pandas_on_unidist/index>`
+| │ | | └───dask
+| | | | └───implementations
+│ │ │ │ └─── :doc:`pandas_on_dask </flow/modin/experimental/core/execution/dask/implementations/pandas_on_dask/index>`
+│ │ │ ├─── :doc:`storage_formats </flow/modin/experimental/core/storage_formats/index>`
+| │ │ | ├─── :doc:`hdk </flow/modin/experimental/core/storage_formats/hdk/index>`
+│ │ │ | └─── :doc:`pyarrow </flow/modin/experimental/core/storage_formats/pyarrow/index>`
+| | | └─── :doc:`io </flow/modin/experimental/core/io/index>`
 │ │ ├─── :doc:`pandas </flow/modin/experimental/pandas>`
 │ │ ├─── :doc:`sklearn </flow/modin/experimental/sklearn>`
 │ │ ├───spreadsheet
@@ -368,6 +376,7 @@ details. The documentation covers most modules, with more docs being added every
 .. _Ray: https://github.com/ray-project/ray
 .. _Unidist: https://github.com/modin-project/unidist
 .. _code: https://github.com/modin-project/modin/blob/master/modin/core/dataframe
+.. _Dask: https://github.com/dask/dask
 .. _Dask Futures: https://docs.dask.org/en/latest/futures.html
 .. _issue: https://github.com/modin-project/modin/issues
 .. _Discourse: https://discuss.modin.org
3 changes: 1 addition & 2 deletions docs/flow/modin/core/io/index.rst
@@ -59,8 +59,7 @@ classes for reading files of different formats.
 ``_read_rows`` function for moving file offset at the specified amount of rows
 and many other functions.

-* format/feature specific dispatchers: ``csv_dispatcher.py``, ``csv_glob_dispatcher.py``
-(reading multiple files simultaneously, experimental feature), ``excel_dispatcher.py``,
+* format/feature specific dispatchers: ``csv_dispatcher.py``, ``excel_dispatcher.py``,
 ``fwf_dispatcher.py`` and ``json_dispatcher.py``.

 * column_stores - directory for storing all columnar store file format dispatcher classes

@@ -0,0 +1,25 @@
+:orphan:
+
+ExperimentalPandasOnDask Execution
+==================================
+
+`ExperimentalPandasOnDask` execution keeps the underlying mechanisms of :doc:`PandasOnDask </flow/modin/core/execution/dask/implementations/pandas_on_dask/index>`
+execution architecturally unchanged and adds experimental features of ``Data Transformation``, ``Data Ingress`` and ``Data Egress`` (e.g. :py:func:`~modin.experimental.pandas.read_pickle_distributed`).
+
+PandasOnDask and ExperimentalPandasOnDask differences
+-----------------------------------------------------
+
+- another Factory ``PandasOnDaskFactory`` -> ``ExperimentalPandasOnDaskFactory``
+- another IO class ``PandasOnDaskIO`` -> ``ExperimentalPandasOnDaskIO``
+
+ExperimentalPandasOnDaskIO classes and modules
+----------------------------------------------
+
+- :py:class:`~modin.experimental.core.execution.dask.implementations.pandas_on_dask.io.io.ExperimentalPandasOnDaskIO`
+- :py:class:`~modin.core.execution.dispatching.factories.factories.ExperimentalPandasOnDaskFactory`
+- :py:class:`~modin.experimental.core.io.text.csv_glob_dispatcher.ExperimentalCSVGlobDispatcher`
+- :py:class:`~modin.experimental.core.io.sql.sql_dispatcher.ExperimentalSQLDispatcher`
+- :py:class:`~modin.experimental.core.io.pickle.pickle_dispatcher.ExperimentalPickleDispatcher`
+- :py:class:`~modin.experimental.core.io.text.custom_text_dispatcher.ExperimentalCustomTextDispatcher`
+- :py:class:`~modin.core.storage_formats.pandas.parsers.PandasCSVGlobParser`
+- :doc:`ExperimentalPandasOnDask IO module </flow/modin/experimental/core/execution/dask/implementations/pandas_on_dask/io/index>`

@@ -0,0 +1,38 @@
+:orphan:
+
+IO module Description For ExperimentalPandasOnDask Execution
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+
+High-Level Module Overview
+''''''''''''''''''''''''''
+
+This module houses experimental functionality with pandas storage format and Dask
+engine. This functionality is concentrated in the :py:class:`~modin.experimental.core.execution.dask.implementations.pandas_on_dask.io.io.ExperimentalPandasOnDaskIO`
+class, that contains methods, which extend typical pandas API to give user
+more flexibility with IO operations.
+
+Usage Guide
+'''''''''''
+
+In order to use the experimental features, just modify standard Modin import
+statement as follows:
+
+.. code-block:: python
+
+    # import modin.pandas as pd
+    import modin.experimental.pandas as pd
+
+Submodules Description
+''''''''''''''''''''''
+
+The ``modin.experimental.core.execution.dask.implementations.pandas_on_dask`` module primarily houses utils and
+functions for the experimental IO class:
+
+* ``io.py`` - submodule containing IO class and parse functions, which are responsible
+for data processing on the workers.
+
+Public API
+''''''''''
+
+.. autoclass:: modin.experimental.core.execution.dask.implementations.pandas_on_dask.io.io.ExperimentalPandasOnDaskIO
+    :members:

@@ -17,6 +17,9 @@ ExperimentalPandasOnRayIO classes and modules

 - :py:class:`~modin.experimental.core.execution.ray.implementations.pandas_on_ray.io.io.ExperimentalPandasOnRayIO`
 - :py:class:`~modin.core.execution.dispatching.factories.factories.ExperimentalPandasOnRayFactory`
-- :py:class:`~modin.core.io.text.csv_glob_dispatcher.CSVGlobDispatcher`
+- :py:class:`~modin.experimental.core.io.text.csv_glob_dispatcher.ExperimentalCSVGlobDispatcher`
+- :py:class:`~modin.experimental.core.io.sql.sql_dispatcher.ExperimentalSQLDispatcher`
+- :py:class:`~modin.experimental.core.io.pickle.pickle_dispatcher.ExperimentalPickleDispatcher`
+- :py:class:`~modin.experimental.core.io.text.custom_text_dispatcher.ExperimentalCustomTextDispatcher`
 - :py:class:`~modin.core.storage_formats.pandas.parsers.PandasCSVGlobParser`
 - :doc:`ExperimentalPandasOnRay IO module </flow/modin/experimental/core/execution/ray/implementations/pandas_on_ray/io/index>`

@@ -1,7 +1,7 @@
 :orphan:

-IO module Description For Pandas-on-Ray Execution
-"""""""""""""""""""""""""""""""""""""""""""""""""
+IO module Description For ExperimentalPandasOnRay Execution
+"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

 High-Level Module Overview
 ''''''''''''''''''''''''''
@@ -31,8 +31,6 @@ functions for the experimental IO class:
 * ``io.py`` - submodule containing IO class and parse functions, which are responsible
 for data processing on the workers.

-* ``sql.py`` - submodule with util functions for experimental ``read_sql`` function.
-
 Public API
 ''''''''''


@@ -17,6 +17,9 @@ ExperimentalPandasOnUnidistIO classes and modules

 - :py:class:`~modin.experimental.core.execution.unidist.implementations.pandas_on_unidist.io.io.ExperimentalPandasOnUnidistIO`
 - :py:class:`~modin.core.execution.dispatching.factories.factories.ExperimentalPandasOnUnidistFactory`
-- :py:class:`~modin.core.io.text.csv_glob_dispatcher.CSVGlobDispatcher`
+- :py:class:`~modin.experimental.core.io.text.csv_glob_dispatcher.ExperimentalCSVGlobDispatcher`
+- :py:class:`~modin.experimental.core.io.sql.sql_dispatcher.ExperimentalSQLDispatcher`
+- :py:class:`~modin.experimental.core.io.pickle.pickle_dispatcher.ExperimentalPickleDispatcher`
+- :py:class:`~modin.experimental.core.io.text.custom_text_dispatcher.ExperimentalCustomTextDispatcher`
 - :py:class:`~modin.core.storage_formats.pandas.parsers.PandasCSVGlobParser`
 - :doc:`ExperimentalPandasOnUnidist IO module </flow/modin/experimental/core/execution/unidist/implementations/pandas_on_unidist/io/index>`

@@ -1,7 +1,7 @@
 :orphan:

-IO module Description For Pandas-on-Unidist Execution
-"""""""""""""""""""""""""""""""""""""""""""""""""""""
+IO module Description For ExperimentalPandasOnUnidist Execution
+"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

 High-Level Module Overview
 ''''''''''''''''''''''''''
@@ -31,8 +31,6 @@ functions for the experimental IO class:
 * ``io.py`` - submodule containing IO class and parse functions, which are responsible
 for data processing on the workers.

-* ``sql.py`` - submodule with util functions for experimental ``read_sql`` function.
-
 Public API
 ''''''''''

29 changes: 29 additions & 0 deletions docs/flow/modin/experimental/core/io/index.rst
@@ -0,0 +1,29 @@
+:orphan:
+
+Experimental IO Module Description
+""""""""""""""""""""""""""""""""""
+
+The module is used mostly for storing experimental utils and
+dispatcher classes for reading/writing files of different formats.
+
+Submodules Description
+''''''''''''''''''''''
+
+* text - directory for storing all text file format dispatcher classes
+
+* format/feature specific dispatchers: ``csv_glob_dispatcher.py``,
+``custom_text_dispatcher.py``.
+
+* sql - directory for storing SQL dispatcher class
+
+* format/feature specific dispatchers: ``sql_dispatcher.py``
+
+* pickle - directory for storing Pickle dispatcher class
+
+* format/feature specific dispatchers: ``pickle_dispatcher.py``
+
+Public API
+''''''''''
+
+.. automodule:: modin.experimental.core.io
+    :members:
6 changes: 0 additions & 6 deletions modin/core/io/__init__.py
@@ -15,12 +15,8 @@

 from .io import BaseIO
 from .text.csv_dispatcher import CSVDispatcher
-from .text.csv_glob_dispatcher import CSVGlobDispatcher
 from .text.fwf_dispatcher import FWFDispatcher
 from .text.json_dispatcher import JSONDispatcher
-from .text.custom_text_dispatcher import (
-    ExperimentalCustomTextDispatcher,
-)
 from .text.excel_dispatcher import ExcelDispatcher
 from .file_dispatcher import FileDispatcher
 from .text.text_file_dispatcher import TextFileDispatcher
@@ -32,7 +28,6 @@
 __all__ = [
     "BaseIO",
     "CSVDispatcher",
-    "CSVGlobDispatcher",
     "FWFDispatcher",
     "JSONDispatcher",
     "FileDispatcher",
@@ -42,5 +37,4 @@
     "FeatherDispatcher",
     "SQLDispatcher",
     "ExcelDispatcher",
-    "ExperimentalCustomTextDispatcher",
 ]
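
Note for downstream code: the hunks above drop `CSVGlobDispatcher` and `ExperimentalCustomTextDispatcher` from `modin.core.io`, and the `modin/experimental/core/io/__init__.py` diff further down exports them from `modin.experimental.core.io` instead (the former renamed to `ExperimentalCSVGlobDispatcher`). Code that has to work on both sides of this commit could resolve the class defensively. The sketch below is an illustration only: `resolve_dispatcher` and `CSV_GLOB_CANDIDATES` are hypothetical names, not part of Modin, and the lookup yields `None` when neither path can be imported.

```python
import importlib


def resolve_dispatcher(candidates):
    """Return the first importable attribute from (module, attribute) pairs, else None."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module is missing (or Modin is not installed); try the next path.
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
    return None


# Hypothetical lookup order: new location first, pre-commit location second.
CSV_GLOB_CANDIDATES = [
    ("modin.experimental.core.io", "ExperimentalCSVGlobDispatcher"),  # after this commit
    ("modin.core.io", "CSVGlobDispatcher"),  # before this commit
]

csv_glob_dispatcher = resolve_dispatcher(CSV_GLOB_CANDIDATES)
```

Trying the new location first keeps the shim cheap once the old path is gone; callers still need to handle the `None` case themselves.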

@@ -25,13 +25,11 @@
 )
 from modin.core.storage_formats.pandas.query_compiler import PandasQueryCompiler
 from modin.core.execution.dask.implementations.pandas_on_dask.io import PandasOnDaskIO
-from modin.core.io import (
-    CSVGlobDispatcher,
-    ExperimentalCustomTextDispatcher,
-)
 from modin.experimental.core.io import (
+    ExperimentalCSVGlobDispatcher,
     ExperimentalSQLDispatcher,
     ExperimentalPickleDispatcher,
+    ExperimentalCustomTextDispatcher,
 )

 from modin.core.execution.dask.implementations.pandas_on_dask.dataframe import (
@@ -66,7 +64,7 @@ def __make_write(*classes, build_args=build_args):
         # used to reduce code duplication
         return type("", (DaskWrapper, *classes), build_args).write

-    read_csv_glob = __make_read(PandasCSVGlobParser, CSVGlobDispatcher)
+    read_csv_glob = __make_read(PandasCSVGlobParser, ExperimentalCSVGlobDispatcher)
     read_pickle_distributed = __make_read(
         ExperimentalPandasPickleParser, ExperimentalPickleDispatcher
     )

@@ -25,13 +25,11 @@
 )
 from modin.core.storage_formats.pandas.query_compiler import PandasQueryCompiler
 from modin.core.execution.ray.implementations.pandas_on_ray.io import PandasOnRayIO
-from modin.core.io import (
-    CSVGlobDispatcher,
-    ExperimentalCustomTextDispatcher,
-)
 from modin.experimental.core.io import (
+    ExperimentalCSVGlobDispatcher,
     ExperimentalSQLDispatcher,
     ExperimentalPickleDispatcher,
+    ExperimentalCustomTextDispatcher,
 )
 from modin.core.execution.ray.implementations.pandas_on_ray.dataframe import (
     PandasOnRayDataframe,
@@ -65,7 +63,7 @@ def __make_write(*classes, build_args=build_args):
         # used to reduce code duplication
         return type("", (RayWrapper, *classes), build_args).write

-    read_csv_glob = __make_read(PandasCSVGlobParser, CSVGlobDispatcher)
+    read_csv_glob = __make_read(PandasCSVGlobParser, ExperimentalCSVGlobDispatcher)
     read_pickle_distributed = __make_read(
         ExperimentalPandasPickleParser, ExperimentalPickleDispatcher
     )

@@ -27,13 +27,11 @@
 from modin.core.execution.unidist.implementations.pandas_on_unidist.io import (
     PandasOnUnidistIO,
 )
-from modin.core.io import (
-    CSVGlobDispatcher,
-    ExperimentalCustomTextDispatcher,
-)
 from modin.experimental.core.io import (
+    ExperimentalCSVGlobDispatcher,
     ExperimentalSQLDispatcher,
     ExperimentalPickleDispatcher,
+    ExperimentalCustomTextDispatcher,
 )
 from modin.core.execution.unidist.implementations.pandas_on_unidist.dataframe import (
     PandasOnUnidistDataframe,
@@ -67,7 +65,7 @@ def __make_write(*classes, build_args=build_args):
         # used to reduce code duplication
         return type("", (UnidistWrapper, *classes), build_args).write

-    read_csv_glob = __make_read(PandasCSVGlobParser, CSVGlobDispatcher)
+    read_csv_glob = __make_read(PandasCSVGlobParser, ExperimentalCSVGlobDispatcher)
     read_pickle_distributed = __make_read(
         ExperimentalPandasPickleParser, ExperimentalPickleDispatcher
     )
4 changes: 4 additions & 0 deletions modin/experimental/core/io/__init__.py
@@ -13,10 +13,14 @@

 """Experimental IO functions implementations."""

+from .text.csv_glob_dispatcher import ExperimentalCSVGlobDispatcher
 from .sql.sql_dispatcher import ExperimentalSQLDispatcher
 from .pickle.pickle_dispatcher import ExperimentalPickleDispatcher
+from .text.custom_text_dispatcher import ExperimentalCustomTextDispatcher

 __all__ = [
+    "ExperimentalCSVGlobDispatcher",
     "ExperimentalSQLDispatcher",
     "ExperimentalPickleDispatcher",
+    "ExperimentalCustomTextDispatcher",
 ]
14 changes: 14 additions & 0 deletions modin/experimental/core/io/text/__init__.py
@@ -0,0 +1,14 @@
+# Licensed to Modin Development Team under one or more contributor license agreements.
+# See the NOTICE file distributed with this work for additional information regarding
+# copyright ownership. The Modin Development Team licenses this file to you under the
+# Apache License, Version 2.0 (the "License"); you may not use this file except in
+# compliance with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software distributed under
+# the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
+# ANY KIND, either express or implied. See the License for the specific language
+# governing permissions and limitations under the License.
+
+"""Experimental text format type IO functions implementations."""

@@ -11,7 +11,7 @@
 # ANY KIND, either express or implied. See the License for the specific language
 # governing permissions and limitations under the License.

-"""Module houses `CSVGlobDispatcher` class, that is used for reading multiple `.csv` files simultaneously."""
+"""Module houses `ExperimentalCSVGlobDispatcher` class, that is used for reading multiple `.csv` files simultaneously."""

 from contextlib import ExitStack
 import csv
@@ -30,7 +30,7 @@
 from modin.core.io.text.csv_dispatcher import CSVDispatcher


-class CSVGlobDispatcher(CSVDispatcher):
+class ExperimentalCSVGlobDispatcher(CSVDispatcher):
     """Class contains utils for reading multiple `.csv` files simultaneously."""

     @classmethod
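
For readers unfamiliar with the `type("", (Wrapper, *classes), build_args)` idiom that the three engine IO files rely on, here is a minimal, Modin-free sketch of the pattern. It assumes `__make_read` mirrors the `__make_write` helper visible in the hunk headers (the diff does not show its body). All class names below (`EngineWrapper`, `UpperCaseParser`, `ChunkDispatcher`) are hypothetical stand-ins, not Modin classes.

```python
class EngineWrapper:
    """Stands in for DaskWrapper / RayWrapper / UnidistWrapper."""

    @classmethod
    def deploy(cls, func, *args):
        # A real engine would schedule this remotely; here it runs inline.
        return func(*args)


class UpperCaseParser:
    """Stands in for a parser class such as PandasCSVGlobParser."""

    @staticmethod
    def parse(chunk):
        return chunk.upper()


class ChunkDispatcher:
    """Stands in for a dispatcher such as ExperimentalCSVGlobDispatcher."""

    @classmethod
    def read(cls, chunks):
        # `deploy` and `parse` live on sibling base classes; they are found
        # through the MRO of the class that `type(...)` builds below.
        return [cls.deploy(cls.parse, chunk) for chunk in chunks]


build_args = {"label": "demo"}


def make_read(*classes, build_args=build_args):
    # Build an anonymous class from the engine wrapper plus the given classes,
    # then hand back its bound `read` classmethod.
    return type("", (EngineWrapper, *classes), build_args).read


read_chunks = make_read(UpperCaseParser, ChunkDispatcher)
print(read_chunks(["a,b", "c,d"]))
```

Because the dispatcher is just one of the base classes passed in, swapping `CSVGlobDispatcher` for `ExperimentalCSVGlobDispatcher` at the call site, as this commit does, changes nothing else about how the reader is assembled.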
