Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX-#4658: Expand exception handling for read_* functions from s3 storages #4659

Merged
merged 3 commits into from
Jul 21, 2022

Conversation

prutskov
Copy link
Contributor

@prutskov prutskov commented Jul 8, 2022

Signed-off-by: Alexey Prutskov lehaprutskov@gmail.com

What do these changes do?

The PR expand the set of handled exceptions in read_* functions from s3 storages by EndpointConnectionError exception

  • commit message follows format outlined here
  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves read_csv throws error during reading from s3 #4658
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date
  • added (Issue Number: PR title (PR Number)) and github username to release notes for next major release

@prutskov prutskov requested a review from a team as a code owner July 8, 2022 15:29
@prutskov prutskov changed the title FIX-#4658: Expand exception handling for read_csv from s3 storages FIX-#4658: Expand exception handling for read_* functions from s3 storages Jul 8, 2022
@codecov
Copy link

codecov bot commented Jul 8, 2022

Codecov Report

Merging #4659 (9d7ba13) into master (da7bd1a) will increase coverage by 3.11%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4659      +/-   ##
==========================================
+ Coverage   86.56%   89.68%   +3.11%     
==========================================
  Files         230      231       +1     
  Lines       18581    18857     +276     
==========================================
+ Hits        16085    16911     +826     
+ Misses       2496     1946     -550     
Impacted Files Coverage Δ
modin/core/io/file_dispatcher.py 94.44% <100.00%> (ø)
modin/core/io/text/csv_glob_dispatcher.py 84.79% <100.00%> (ø)
modin/experimental/batch/test/test_pipeline.py 100.00% <0.00%> (ø)
...ns/pandas_on_ray/partitioning/partition_manager.py 82.19% <0.00%> (+1.36%) ⬆️
...tations/pandas_on_python/partitioning/partition.py 93.75% <0.00%> (+2.08%) ⬆️
modin/pandas/__init__.py 69.69% <0.00%> (+3.03%) ⬆️
modin/config/envvars.py 89.10% <0.00%> (+3.46%) ⬆️
...dataframe/pandas/partitioning/partition_manager.py 90.06% <0.00%> (+3.72%) ⬆️
...s/pandas_on_dask/partitioning/partition_manager.py 100.00% <0.00%> (+6.25%) ⬆️
modin/error_message.py 89.36% <0.00%> (+6.38%) ⬆️
... and 13 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@anmyachev
Copy link
Collaborator

@prutskov How does pandas handle EndpointConnectionError exception? A simple search for this exception in its source code turned up nothing.

@prutskov
Copy link
Contributor Author

@prutskov How does pandas handle EndpointConnectionError exception? A simple search for this exception in its source code turned up nothing.

I assume this exception is processed internally in fsspec module which pandas/modin use for getting of file objects. But we also use s3fs.S3FileSystem which can't handle this exception internally. That's reason why I added handling of this exception only for places of code where s3fs.S3FileSystem is used.

@prutskov
Copy link
Contributor Author

@anmyachev, I assume, that #4430 will resolve the problem because #4430 will get rid of using of s3fs.S3FileSystem and fsspec will be used.

@anmyachev
Copy link
Collaborator

@anmyachev, I assume, that #4430 will resolve the problem because #4430 will get rid of using of s3fs.S3FileSystem and fsspec will be used.

Maybe, but it's not ready for now.

mvashishtha
mvashishtha previously approved these changes Jul 18, 2022
Copy link
Collaborator

@mvashishtha mvashishtha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prutskov thanks, LGTM!

…ons from s3 storages

Signed-off-by: Alexey Prutskov <lehaprutskov@gmail.com>
mvashishtha
mvashishtha previously approved these changes Jul 21, 2022
@prutskov
Copy link
Contributor Author

@modin-project/modin-core This PR has green CI and approvements from modin-core. Probably, you can merge this

@mvashishtha mvashishtha merged commit 9e30419 into modin-project:master Jul 21, 2022
YarShev added a commit that referenced this pull request Sep 6, 2022
…torages (#4659)

Signed-off-by: Alexey Prutskov <lehaprutskov@gmail.com>

Co-authored-by: Yaroslav Igoshev <Poolliver868@mail.ru>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

read_csv throws error during reading from s3
4 participants