ray 2.10.0 breaks import of awswrangler #2740

Closed
tleonhardt opened this issue Mar 22, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@tleonhardt

Describe the bug

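Ray 2.10 was released on March 21. With it installed, awswrangler 3.7.1 fails to import: awswrangler instantiates `DefaultBlockWritePathProvider` while defining a class, and in Ray 2.10 the deprecated `BlockWritePathProvider` constructor raises a `DeprecationWarning`.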

How to Reproduce

import awswrangler as wr

Output:

---------------------------------------------------------------------------
DeprecationWarning                        Traceback (most recent call last)
Cell In[4], line 1
----> 1 import awswrangler as wr

File /usr/local/lib/python3.9/site-packages/awswrangler/__init__.py:40
     37 from awswrangler._config import config
     38 from awswrangler._distributed import EngineEnum, MemoryFormatEnum, engine, memory_format
---> 40 engine.register()
     42 __all__ = [
     43     "athena",
     44     "catalog",
   (...)
     75     "MemoryFormatEnum",
     76 ]
     79 _logging.getLogger("awswrangler").addHandler(_logging.NullHandler())

File /usr/local/lib/python3.9/site-packages/awswrangler/_distributed.py:121, in Engine.register(cls, name)
    118 if engine_name == EngineEnum.RAY.value:
    119     from awswrangler.distributed.ray._register import register_ray
--> 121     register_ray()

File /usr/local/lib/python3.9/site-packages/awswrangler/distributed/ray/_register.py:73, in register_ray()
     66 from awswrangler.distributed.ray.modin._data_types import pyarrow_types_from_pandas_distributed
     67 from awswrangler.distributed.ray.modin._utils import (
     68     _arrow_refs_to_df,
     69     _copy_modin_df_shallow,
     70     _is_pandas_or_modin_frame,
     71     _split_modin_frame,
     72 )
---> 73 from awswrangler.distributed.ray.modin.s3._read_orc import _read_orc_distributed
     74 from awswrangler.distributed.ray.modin.s3._read_parquet import _read_parquet_distributed
     75 from awswrangler.distributed.ray.modin.s3._read_text import _read_text_distributed

File /usr/local/lib/python3.9/site-packages/awswrangler/distributed/ray/modin/s3/_read_orc.py:13
     10 from ray.data.datasource import FastFileMetadataProvider
     12 from awswrangler import _data_types
---> 13 from awswrangler.distributed.ray.datasources import ArrowORCDatasource
     14 from awswrangler.distributed.ray.modin._utils import _to_modin
     16 if TYPE_CHECKING:

File /usr/local/lib/python3.9/site-packages/awswrangler/distributed/ray/datasources/__init__.py:3
      1 """Ray Datasources Module."""
----> 3 from awswrangler.distributed.ray.datasources.arrow_csv_datasink import ArrowCSVDatasink
      4 from awswrangler.distributed.ray.datasources.arrow_csv_datasource import ArrowCSVDatasource
      5 from awswrangler.distributed.ray.datasources.arrow_json_datasource import ArrowJSONDatasource

File /usr/local/lib/python3.9/site-packages/awswrangler/distributed/ray/datasources/arrow_csv_datasink.py:12
      9 from ray.data.block import BlockAccessor
     10 from ray.data.datasource.block_path_provider import BlockWritePathProvider
---> 12 from awswrangler.distributed.ray.datasources.file_datasink import _BlockFileDatasink
     14 _logger: logging.Logger = logging.getLogger(__name__)
     17 class ArrowCSVDatasink(_BlockFileDatasink):

File /usr/local/lib/python3.9/site-packages/awswrangler/distributed/ray/datasources/file_datasink.py:22
     17 from awswrangler.s3._write import _COMPRESSION_2_EXT
     19 _logger: logging.Logger = logging.getLogger(__name__)
---> 22 class _BlockFileDatasink(Datasink):
     23     def __init__(
     24         self,
     25         path: str,
   (...)
     32         **write_args: Any,
     33     ):
     34         self.path = path

File /usr/local/lib/python3.9/site-packages/awswrangler/distributed/ray/datasources/file_datasink.py:28, in _BlockFileDatasink()
     22 class _BlockFileDatasink(Datasink):
     23     def __init__(
     24         self,
     25         path: str,
     26         file_format: str,
     27         *,
---> 28         block_path_provider: BlockWritePathProvider | None = DefaultBlockWritePathProvider(),
     29         dataset_uuid: str | None = None,
     30         open_s3_object_args: dict[str, Any] | None = None,
     31         pandas_kwargs: dict[str, Any] | None = None,
     32         **write_args: Any,
     33     ):
     34         self.path = path
     35         self.file_format = file_format

File /usr/local/lib64/python3.9/site-packages/ray/data/datasource/block_path_provider.py:14, in BlockWritePathProvider.__init__(self)
     13 def __init__(self) -> None:
---> 14     raise DeprecationWarning(
     15         "`BlockWritePathProvider` has been deprecated in favor of "
     16         "`FilenameProvider`. For more information, see "
     17         "https://docs.ray.io/en/master/data/api/doc/ray.data.datasource.FilenameProvider.html",  # noqa: E501
     18     )

DeprecationWarning: `BlockWritePathProvider` has been deprecated in favor of `FilenameProvider`. For more information, see https://docs.ray.io/en/master/data/api/doc/ray.data.datasource.FilenameProvider.html
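
For anyone wondering why a plain import fails rather than a later write call: the traceback points at a default argument, and Python evaluates default arguments when the `def` statement runs, which for a method is while the class body executes at import time. Below is a minimal sketch of that mechanism. It does not import Ray or awswrangler; the two provider classes are stand-ins that only mimic the raising constructor shown above, and `define_sink` and `Sink` are hypothetical names used for illustration.

```python
class BlockWritePathProvider:
    """Stand-in for the Ray 2.10 class: the constructor raises immediately."""

    def __init__(self) -> None:
        raise DeprecationWarning(
            "`BlockWritePathProvider` has been deprecated in favor of "
            "`FilenameProvider`."
        )


class DefaultBlockWritePathProvider(BlockWritePathProvider):
    """Stand-in subclass that inherits the raising constructor."""


def define_sink():
    # The class statement is wrapped in a function only so the failure can be
    # caught here; in a real module it would run at import time.
    class Sink:
        def __init__(
            self,
            path,
            *,
            # Evaluated once, when this `def` executes, not when Sink() is called.
            block_path_provider=DefaultBlockWritePathProvider(),
        ):
            self.path = path

    return Sink


try:
    define_sink()
    failed_at_definition = False
except DeprecationWarning as exc:  # DeprecationWarning is an Exception subclass
    failed_at_definition = True
    message = str(exc)

print(failed_at_definition)
```

No `Sink` instance is ever created in this sketch; defining the class is enough to trigger the exception, which matches the traceback stopping at the `class _BlockFileDatasink(Datasink):` statement.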

Expected behavior

Importing awswrangler succeeds with ray 2.10.0 installed.

Your project

No response

Screenshots

No response

OS

Linux

Python version

3.9

AWS SDK for pandas version

3.7.1

Additional context

No response

@tleonhardt added the bug label on Mar 22, 2024
@kukushking
Contributor

Thanks @tleonhardt, it looks like BlockWritePathProvider was deprecated in favour of FilenameProvider in Ray 2.10. Working on it.
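
As a general illustration only, and not the fix the maintainers shipped, one common way to keep a deprecated default from breaking imports is to default the parameter to `None` and construct the object lazily. The sketch below uses hypothetical stand-in names (`LegacyProvider`, `Sink`, `get_provider`) and does not touch Ray or awswrangler; migrating to `FilenameProvider` is the direction the deprecation message itself recommends.

```python
from typing import Optional


class LegacyProvider:
    """Hypothetical stand-in for a class whose constructor now raises."""

    def __init__(self) -> None:
        raise DeprecationWarning("legacy provider is deprecated")


class Sink:
    """Hypothetical sink that no longer builds its default at definition time."""

    def __init__(self, path: str, *, provider: Optional[object] = None) -> None:
        self.path = path
        self._provider = provider  # None means "decide later"

    def get_provider(self) -> object:
        # The deprecated class is only constructed if a caller actually needs
        # the default, so defining and importing Sink stays safe.
        if self._provider is None:
            self._provider = LegacyProvider()
        return self._provider


sink = Sink("s3://bucket/prefix")  # succeeds: nothing deprecated was constructed
```

With this shape the error surfaces only on the code path that still relies on the deprecated default, instead of on every import.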
