FIX-#4756: Correctly propagate storage_options in read_parquet #4764

Merged (3 commits) on Aug 4, 2022

Collaborator

@pyrito pyrito commented Aug 3, 2022

Signed-off-by: Karthik Velayutham <vkarthik@ponder.io>

What do these changes do?

This PR fixes a bug where storage_options was not propagated to the read_table call in build_index. None of our CI tests caught it because they did not strictly check the options passed to read_parquet. I'm working on getting access to a bucket so we can verify that the fix works as intended.
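The bug pattern described above can be sketched in isolation. This is a minimal illustration, not Modin's actual code: the `read_table` below is a hypothetical stand-in that only records its arguments (it is not pyarrow's function and its signature is an assumption), the two `build_index_*` helpers are illustrative names, and the bucket path, credentials, and endpoint are made-up values.

```python
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for the low-level reader. It only records what it was
# given, so the difference between the two helpers is visible without a bucket.
calls: List[Dict[str, Any]] = []


def read_table(
    path: str,
    columns: Optional[List[str]] = None,
    storage_options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    call = {"path": path, "columns": columns, "storage_options": storage_options}
    calls.append(call)
    return call


def build_index_buggy(path, index_columns, storage_options=None):
    # Bug pattern: the helper accepts storage_options but never forwards them,
    # so the nested read never sees the caller's credentials or endpoint.
    return read_table(path, columns=index_columns)


def build_index_fixed(path, index_columns, storage_options=None):
    # Fix pattern: forward the caller's options to the nested read.
    return read_table(path, columns=index_columns, storage_options=storage_options)


# Assumed, illustrative options for an S3-compatible store such as MinIO.
opts = {
    "key": "example-user",
    "secret": "example-pass",
    "client_kwargs": {"endpoint_url": "http://localhost:9000"},
}

buggy = build_index_buggy("s3://bucket/data.parquet", ["idx"], storage_options=opts)
fixed = build_index_fixed("s3://bucket/data.parquet", ["idx"], storage_options=opts)

print(buggy["storage_options"])          # None: the options were dropped
print(fixed["storage_options"] == opts)  # True: the options reached the reader
```

Because the top-level read can succeed while a nested helper silently drops the options, this kind of bug tends to surface only against storage that actually requires them.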

  • commit message follows format outlined here
  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves #4756 (Modin issue to cause the read_parquet file from Minio with error)
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date
  • added (Issue Number: PR title (PR Number)) and github username to release notes for next major release

…arquet

Signed-off-by: Karthik Velayutham <vkarthik@ponder.io>
@pyrito pyrito requested a review from a team as a code owner August 3, 2022 14:30
@pyrito pyrito force-pushed the fix/FIX-4756 branch 2 times, most recently from 7c90d3d to f91864c Compare August 3, 2022 14:33

codecov bot commented Aug 3, 2022

Codecov Report

Merging #4764 (0def3fe) into master (ccda567) will increase coverage by 6.55%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4764      +/-   ##
==========================================
+ Coverage   83.32%   89.88%   +6.55%     
==========================================
  Files         259      260       +1     
  Lines       19339    19635     +296     
==========================================
+ Hits        16114    17648    +1534     
+ Misses       3225     1987    -1238     
Impacted Files                                         Coverage Δ
modin/core/io/column_stores/parquet_dispatcher.py      97.14% <100.00%> (+3.76%) ⬆️
modin/logging/config.py                                94.59% <0.00%> (-1.30%) ⬇️
...entations/pandas_on_dask/partitioning/partition.py  89.02% <0.00%> (-1.22%) ⬇️
modin/core/dataframe/pandas/dataframe/dataframe.py     94.33% <0.00%> (-0.58%) ⬇️
modin/experimental/batch/test/test_pipeline.py         100.00% <0.00%> (ø)
modin/pandas/series.py                                 94.33% <0.00%> (+0.24%) ⬆️
modin/pandas/series_utils.py                           99.43% <0.00%> (+0.56%) ⬆️
modin/core/io/text/excel_dispatcher.py                 94.01% <0.00%> (+0.85%) ⬆️
...ns/pandas_on_ray/partitioning/partition_manager.py  82.19% <0.00%> (+1.36%) ⬆️
...tations/pandas_on_python/partitioning/partition.py  93.75% <0.00%> (+2.08%) ⬆️
... and 42 more


@pyrito pyrito force-pushed the fix/FIX-4756 branch 2 times, most recently from 718d7fe to 5baadbd Compare August 3, 2022 14:59
Collaborator

YarShev commented Aug 3, 2022

@pyrito, do the changes address the issue?

Collaborator Author

pyrito commented Aug 3, 2022

@YarShev yes, they do address the issue. I just need to add a dataset to one of the S3 buckets hosted by the Intel folks, with some storage options set, so that I can confirm the options are being passed through properly. The existing tests didn't catch this issue.
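One way to check that options are passed through without a remote bucket is to substitute a mock for the low-level reader and assert on the arguments it receives. The sketch below assumes a hypothetical `load_index` helper with an injectable `reader` parameter; neither is part of Modin's API, and the path and options are made-up values.

```python
from unittest import mock


def load_index(path, index_columns, storage_options=None, reader=None):
    # Hypothetical helper: `reader` is injected so that a test can substitute
    # a mock for the real parquet reader.
    return reader(path, columns=index_columns, storage_options=storage_options)


def check_storage_options_are_forwarded():
    reader = mock.Mock(return_value="table")
    opts = {"anon": False, "client_kwargs": {"endpoint_url": "http://localhost:9000"}}

    result = load_index(
        "s3://bucket/data.parquet", ["idx"], storage_options=opts, reader=reader
    )

    # Strict check: exactly one call, with exactly these arguments.
    reader.assert_called_once_with(
        "s3://bucket/data.parquet", columns=["idx"], storage_options=opts
    )
    return result


outcome = check_storage_options_are_forwarded()
```

A check like this only shows that the options reach the reader, not that the remote store accepts them, so it would complement the bucket-based test rather than replace it.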

Collaborator

YarShev commented Aug 3, 2022

@pyrito, cool. I left a comment, otherwise LGTM. Please let me know when the changes are ready to be merged.

Contributor

@prutskov prutskov left a comment

LGTM, just a minor suggestion

modin/core/io/column_stores/parquet_dispatcher.py (outdated, resolved)
Contributor

prutskov commented Aug 3, 2022

@pyrito, please rebase onto the current master so that the Dask tests pass.

Co-authored-by: Alexey Prutskov <lehaprutskov@gmail.com>
Collaborator

@YarShev YarShev left a comment

@pyrito, thanks!

@YarShev YarShev merged commit 4548012 into modin-project:master Aug 4, 2022