[FEA] Support nested column pruning in ORC reader when reading a struct column. #8848
Comments
You can get the test file in issue #8704.
CC: @rgsl888prabhu @vuule I think we also need to support this in the Parquet reader to address #7248 (comment).
One more thing: I hope cuDF can return the output columns in the same order as the column names passed in the parameter.
Actually, that's easier to support. It's what we do for non-nested column selection.
Closes #8848

- Allows caller to specify nested column paths, so that the fields not listed in the `columns` parameter are excluded.
- The order of fields/columns in the output table is consistent with the order of paths/names in the `columns` parameter.
- Moved `aggregate_orc_metadata` implementation to a separate file (can be `.cpp`!)
- Add tests to cover different cases with a mix of nested and parent columns selection.
- Changed a few fields from `uint32_t` to `int32_t` to avoid unsigned arithmetic.

Authors:
- Vukasin Milovanovic (https://github.com/vuule)

Approvers:
- Robert Maynard (https://github.com/robertmaynard)
- Vyas Ramasubramani (https://github.com/vyasr)
- Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)

URL: #9496
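To make the selection semantics described above concrete, here is a minimal pure-Python sketch. It is not cuDF's implementation: the schema is modeled as a plain nested dict, the helper `prune_schema` is hypothetical, the dotted path syntax is an assumption for illustration, and the field names `a`, `b`, `c` and the column `_c1` are made up (only `_c0` comes from this issue).

```python
import copy


def prune_schema(schema, columns):
    """Keep only the selected column paths, in the order they are listed.

    schema  : dict of column name -> type string, or a nested dict for a struct.
    columns : list of dotted paths such as "_c0.b" (assumed syntax).
    """
    pruned = {}
    for path in columns:
        src, dst = schema, pruned
        parts = path.split(".")
        for depth, name in enumerate(parts):
            if not isinstance(src, dict) or name not in src:
                raise KeyError(f"unknown column path: {path!r}")
            if depth == len(parts) - 1:
                # Selecting a parent keeps all of its children.
                dst[name] = copy.deepcopy(src[name])
            else:
                # Descend into the struct, creating the pruned parent on demand.
                dst = dst.setdefault(name, {})
                src = src[name]
    return pruned


# Hypothetical schema: one struct column "_c0" plus a flat column "_c1".
schema = {"_c0": {"a": "int64", "b": "string", "c": "float64"}, "_c1": "int32"}

one_field = prune_schema(schema, ["_c0.b"])
two_fields = prune_schema(schema, ["_c0.c", "_c0.a"])
mixed = prune_schema(schema, ["_c1", "_c0.a"])
```

Because Python dicts preserve insertion order, the pruned result follows the order of the requested paths rather than the order in the file, which matches the second bullet of the PR description.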
Assume there is an ORC file containing one struct column, "_c0", as below.
We can read just one nested column with the pandas ORC reader, e.g.
or two of the nested columns,
However, cuDF raises an error in all of the cases above.
Depends on #7830