Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-enable the struct support for the ORC reader. #3262

Merged
merged 6 commits into from
Aug 27, 2021

Conversation

firestarman
Copy link
Collaborator

@firestarman firestarman commented Aug 20, 2021

It adds the struct support in ORC reader, along with the tests for the nested predicate pushdown, and the support for nested column pruning.

fixes #2879
fixes #1481
fixes #463

The old relevant PRs:
#3079
#2887

Signed-off-by: Firestarman firestarmanllc@gmail.com

Also add tests for the nested predicate pushdown, and
the support for nested column pruning.

Relevant PRs:
  NVIDIA#3079
  NVIDIA#2887

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Copy link
Collaborator Author

build

@firestarman firestarman added the cudf_dependency An issue or PR with this label depends on a new feature in cudf label Aug 20, 2021
@firestarman
Copy link
Collaborator Author

There is still an issue in CUDF for a coner case when reading a struct column, and here is the fix rapidsai/cudf#9060.

integration_tests/src/main/python/orc_test.py Outdated Show resolved Hide resolved
integration_tests/src/main/python/orc_test.py Outdated Show resolved Hide resolved
integration_tests/src/main/python/orc_test.py Outdated Show resolved Hide resolved
@firestarman firestarman marked this pull request as draft August 23, 2021 02:15
@firestarman
Copy link
Collaborator Author

Move to draft because of the open blocking issue rapidsai/cudf#9060.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Copy link
Collaborator Author

build

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Copy link
Collaborator Author

build

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
signed-off-by: Firestarman <firestarmanllc@gmail.com>
Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, just waiting on the cudf issue.

@firestarman firestarman marked this pull request as ready for review August 25, 2021 03:20
@firestarman
Copy link
Collaborator Author

build

@firestarman
Copy link
Collaborator Author

The fix rapidsai/cudf#9060 has been merged.

@firestarman firestarman requested review from jlowe and wbo4958 August 25, 2021 05:28
@firestarman firestarman linked an issue Aug 25, 2021 that may be closed by this pull request
jlowe
jlowe previously approved these changes Aug 25, 2021
revans2
revans2 previously approved these changes Aug 25, 2021
@revans2
Copy link
Collaborator

revans2 commented Aug 25, 2021

All the code looks fine, just a few minor nits that I can live without.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman firestarman dismissed stale reviews from revans2 and jlowe via e336429 August 26, 2021 04:46
@firestarman
Copy link
Collaborator Author

build

@firestarman
Copy link
Collaborator Author

All the code looks fine, just a few minor nits that I can live without.

Thanks for reivew. I decided to address the nits in this PR because there is one for the code change.

@firestarman firestarman requested a review from revans2 August 26, 2021 04:52
Copy link
Collaborator

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nit

Comment on lines +1228 to +1234
fileSchema.getCategory == readSchema.getCategory && {
if (readSchema.getChildren != null) {
readSchema.getChildren.asScala.forall(rc =>
fileSchema.getChildren.asScala.exists(fc => isSchemaCompatible(fc, rc)))
} else {
false
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
fileSchema.getCategory == readSchema.getCategory && {
if (readSchema.getChildren != null) {
readSchema.getChildren.asScala.forall(rc =>
fileSchema.getChildren.asScala.exists(fc => isSchemaCompatible(fc, rc)))
} else {
false
}
fileSchema.getCategory == readSchema.getCategory &&
readSchema.getChildren != null) &&
readSchema.getChildren.asScala.forall(rc =>
fileSchema.getChildren.asScala.exists(fc => isSchemaCompatible(fc, rc)))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reivew. I will update this in antoher PR.

@firestarman firestarman merged commit c5ab2d6 into NVIDIA:branch-21.10 Aug 27, 2021
@firestarman firestarman deleted the orc-read branch August 27, 2021 01:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cudf_dependency An issue or PR with this label depends on a new feature in cudf
Projects
None yet
4 participants