Add support for reading ORC file with no row group index #9060

Merged
30 commits merged into rapidsai:branch-21.10 on Aug 24, 2021

Conversation

rgsl888prabhu
Contributor

The ORC reader in cuIO assumed that the row group index is always available, so reading a file without one failed.
This PR changes the reader so that it can read ORC files even when the row group index stream is not available.

closes #8878
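
For illustration only (this is not the code changed in this PR), one way a reader can cope with a missing row group index is to treat the whole stripe as a single row group instead of failing. The sketch below shows that idea in plain Python; `RowGroup`, `plan_row_groups`, and `index_stride` are hypothetical names, not cuIO APIs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowGroup:
    start_row: int  # first row of the group, relative to the stripe
    num_rows: int   # number of rows in the group


def plan_row_groups(stripe_rows: int, index_stride: Optional[int]) -> List[RowGroup]:
    """Hypothetical helper: split a stripe into row groups.

    If no row group index is available (index_stride is None or 0),
    fall back to one row group that spans the whole stripe.
    """
    if stripe_rows <= 0:
        return []
    if not index_stride:
        # Assumed fallback: no index, so the stripe is one row group.
        return [RowGroup(0, stripe_rows)]
    return [
        RowGroup(start, min(index_stride, stripe_rows - start))
        for start in range(0, stripe_rows, index_stride)
    ]
```

With an index stride, the stripe is split into fixed-size groups plus a shorter tail; without one, the same call returns a single group covering every row, so downstream decoding can proceed either way.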

rgsl888prabhu requested review from a team as code owners on August 18, 2021 07:11
rgsl888prabhu self-assigned this on Aug 18, 2021
github-actions bot added the "Python" (Affects Python cuDF API.) and "libcudf" (Affects libcudf (C++/CUDA) code.) labels on Aug 18, 2021
rgsl888prabhu removed the "libcudf" (Affects libcudf (C++/CUDA) code.) label on Aug 18, 2021
@vuule
Contributor

vuule commented Aug 18, 2021

@rgsl888prabhu, the test
cudf.tests.test_orc.test_no_row_group_index_orc_read[TestOrcFile.NoIndStrm.StructWithNoNulls.orc]
is failing with
RuntimeError: cuDF failure at: ../src/column/column_view.cpp:59: Invalid null mask for non-zero null count.
It looks like we're trying to create a column that has no null mask but a positive null count.
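
The invariant behind that error message can be sketched in plain Python. This is only an illustration of the rule, not the libcudf check itself; `check_null_state` is a hypothetical name, and a `None` mask stands in for "no null mask allocated".

```python
from typing import Optional


def check_null_state(null_mask: Optional[bytes], null_count: int) -> None:
    """Hypothetical check: a column without a null mask cannot report nulls."""
    if null_mask is None and null_count > 0:
        raise ValueError("Invalid null mask for non-zero null count.")


# Valid: no mask and no nulls, or a mask together with a null count.
check_null_state(None, 0)
check_null_state(b"\x0f", 4)

# Invalid: a positive null count with no mask to say which rows are null.
try:
    check_null_state(None, 3)
except ValueError as err:
    print(err)
```

A reader that computes a null count for a column but never allocates the matching mask would trip exactly this kind of check.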

@rgsl888prabhu
Contributor Author

rerun tests

@codecov

codecov bot commented Aug 19, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@417b34d).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-21.10    #9060   +/-   ##
===============================================
  Coverage                ?   10.73%           
===============================================
  Files                   ?      114           
  Lines                   ?    19058           
  Branches                ?        0           
===============================================
  Hits                    ?     2046           
  Misses                  ?    17012           
  Partials                ?        0           

Last update 417b34d...3c51111.

Contributor

@devavret devavret left a comment

Minor nitpicks only. Feel free to ignore them.

cpp/src/io/orc/reader_impl.cu (outdated, resolved)
cpp/src/io/orc/reader_impl.cu (outdated, resolved)
cpp/src/column/column_view.cpp (outdated, resolved)
cpp/src/io/orc/reader_impl.cu (outdated, resolved)
cpp/src/io/orc/reader_impl.cu (outdated, resolved)
@rgsl888prabhu
Contributor Author

rerun tests

@rgsl888prabhu
Contributor Author

rerun tests

Contributor

@vuule vuule left a comment

Couple of dubious suggestions :)

cpp/src/io/orc/reader_impl.cu (outdated, resolved)
cpp/src/io/orc/reader_impl.cu (outdated, resolved)
cpp/src/io/orc/reader_impl.cu (resolved)
cpp/src/io/orc/reader_impl.cu (outdated, resolved)
vuule requested a review from nvdbaranec on August 23, 2021 22:38
@vuule
Contributor

vuule commented Aug 24, 2021

@gpucibot merge

rapids-bot bot merged commit a153493 into rapidsai:branch-21.10 on Aug 24, 2021
firestarman pushed a commit to firestarman/cudf that referenced this pull request Sep 1, 2021
The ORC reader in cuIO was designed thinking row group index is always available, which resulted in the failure.
Changes have been made to read ORC files even in case group index stream is not available. 

closes rapidsai#8878

Authors:
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Devavret Makkar (https://github.com/devavret)
  - Vukasin Milovanovic (https://github.com/vuule)
  - https://github.com/nvdbaranec

URL: rapidsai#9060
vyasr added the "4 - Needs Review" (Waiting for reviewer to review or respond) label and removed the "4 - Needs cuIO Reviewer" label on Feb 23, 2024
Labels
  - 3 - Ready for Review: Ready for review by team
  - 4 - Needs Review: Waiting for reviewer to review or respond
  - bug: Something isn't working
  - libcudf: Affects libcudf (C++/CUDA) code.
  - non-breaking: Non-breaking change
  - Python: Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] cudf reads orc file failed.
6 participants