JSON - Parse mixed types as string in JSON reader (#14572)
Addresses #14239

This PR adds an option to read mixed types as string columns. It also adds related functional changes to the nested JSON reader (libcudf, cuDF-python, Java).

Details:
- Added a new bool option `mixed_types_as_string` in `json_reader_options`.
- This feature requires two things: finding the end of struct/list nodes, and parsing struct/list types as strings.
- For Struct and List nodes, node_range_end was previously node_range_begin+1 (since it was not used anywhere). It is now calculated properly by copying only the struct and list tokens and computing their node_range_end. (Since the end token is a child of the begin token, scattering the end token's index to the node_range_end of the node corresponding to its parent token yields the node_range_end of List and Struct nodes.)
- In `reduce_to_column_tree()` (which infers the schema), the list and struct node_range_end are changed to node_begin+1 so that it does not copy entire list/struct strings to host for column names.
- `reinitialize_as_string` reinitializes an initialized column as string.
- Mixed type columns are parsed as strings since their column category is changed to `NC_STR`.
- Added tests.

Authors:
- Karthikeyan (https://github.com/karthikeyann)
- Andy Grove (https://github.com/andygrove)

Approvers:
- Andy Grove (https://github.com/andygrove)
- Jason Lowe (https://github.com/jlowe)
- Elias Stehle (https://github.com/elstehle)
- Bradley Dice (https://github.com/bdice)
- Shruti Shivakumar (https://github.com/shrshi)

URL: #14572
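The node_range_end calculation described in the details can be illustrated with a small host-side sketch. This is not the libcudf implementation, which works on GPU token streams; it is a sequential pure-Python analogue in which each struct/list begin records a placeholder end of begin+1 and the matching end token later writes its position back to its parent node. The function name `struct_list_ranges` and the sample `row` are hypothetical and exist only for illustration.

```python
def struct_list_ranges(text):
    """Return (begin, end) character ranges for every struct/list in text.

    `end` is exclusive, so text[begin:end] is the raw JSON of that node.
    Hypothetical helper: a sequential analogue of the idea, not libcudf code.
    """
    ranges = []        # one [begin, end] entry per struct/list node
    stack = []         # indices into `ranges` of the currently open nodes
    in_string = False
    escaped = False
    for pos, ch in enumerate(text):
        if in_string:
            # Brackets inside string values must not open or close nodes.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append(len(ranges))
            # Placeholder end (begin + 1), like the earlier behaviour.
            ranges.append([pos, pos + 1])
        elif ch in "}]":
            # The end token belongs to the innermost open begin token,
            # so its position is written to that parent node's range end.
            node = stack.pop()
            ranges[node][1] = pos + 1
    return [tuple(r) for r in ranges]


row = '{"a": [1, {"b": "x]"}], "c": 2}'
for begin, end in struct_list_ranges(row):
    # Slicing with the computed range yields the node's raw JSON text,
    # which is what a string-typed column would hold for that value.
    print(row[begin:end])
```

With the full range available, a struct or list value can be copied out verbatim as a string, which is what parsing a mixed-type column as strings needs. The `]` inside the string value `"x]"` is skipped and does not close the list.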
1 parent d1c0e25, commit 8fdc62b
Showing 15 changed files with 472 additions and 47 deletions.