[FEA] support MapType in JSON parsing by recursively parsing child columns. #9450

Open
revans2 opened this issue Oct 16, 2023 · 0 comments
Labels
feature request New feature or request

Comments

Collaborator

revans2 commented Oct 16, 2023

Is your feature request related to a problem? Please describe.
To be clear, this does not have to be done right away. This issue mostly documents a plan for how we could support Map types when parsing JSON, including in from_json, when it is needed.

We currently have some support for parsing JSON to a Map<String,String> at the top level in from_json. CUDF is adding support for returning a complex JSON object as a string (rapidsai/cudf#14239). We have also filed an issue with CUDF to try to get MAP functionality directly into CUDF (rapidsai/cudf#14288). The idea here is similar to what we described in that MAP issue, but instead of having CUDF parse a MAP as part of parsing other types, we will ask CUDF to just return a string for any Map type. Then we can use from_json to parse that text into a Map<String,String>. Once we have that, we can recursively parse the keys and values into whatever types we need to match what Spark is doing.
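
To make the recursion concrete, here is a minimal CPU-side sketch in Python. It is only an illustration of the control flow, not the GPU implementation and not any CUDF or plugin API. The schema encoding (`"string"`, `"long"`, `("map", key_type, value_type)`) and the helper names `to_string_map` and `parse_as` are hypothetical; `to_string_map` stands in for the existing Map<String,String> parse, and the null-on-bad-input behavior is an assumption rather than a statement of what Spark does in every mode.

```python
import json


def to_string_map(text):
    """Stand-in for the Map<String,String> parse: each value is returned
    as a string (nested objects are kept as raw JSON text)."""
    try:
        obj = json.loads(text)
    except ValueError:
        return None  # assumption: invalid JSON becomes a null map
    if not isinstance(obj, dict):
        return None
    out = {}
    for key, value in obj.items():
        if value is None or isinstance(value, str):
            out[key] = value
        else:
            # Re-serialize anything else so the next level can parse it.
            out[key] = json.dumps(value, separators=(",", ":"))
    return out


def parse_as(raw, dtype):
    """Recursively convert a string into the requested (hypothetical) type."""
    if raw is None:
        return None
    if dtype == "string":
        return raw
    if dtype == "long":
        try:
            return int(raw)
        except ValueError:
            return None  # assumption: unparsable numbers become null
    if isinstance(dtype, tuple) and dtype[0] == "map":
        _, key_type, value_type = dtype
        flat = to_string_map(raw)
        if flat is None:
            return None
        # Keys and values are each parsed again with their own target type.
        return {parse_as(k, key_type): parse_as(v, value_type)
                for k, v in flat.items()}
    raise ValueError(f"unsupported type in this sketch: {dtype!r}")


schema = ("map", "string", ("map", "string", "long"))
result = parse_as('{"a":{"x":"1","y":2},"b":{}}', schema)
```

Each level of Map nesting costs one more string-to-map pass over the data, which is why the depth of the schema matters for performance.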

This is not likely to be super fast, but it should let us do the processing on the GPU and avoid going back to the CPU for any of the parsing. It should hopefully be fast enough, but we should benchmark any change that we make. If the performance turns out to be horrible for deeply nested maps, we might want to fall back to the CPU in those cases.
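
A fallback decision like that could be driven by the Map nesting depth of the requested schema. The Python sketch below shows one way to compute it. The schema encoding, the function names, and the threshold of 3 are all hypothetical; a real cutoff would have to come from benchmarks.

```python
def map_depth(dtype):
    """Count how many Map levels are nested inside a (hypothetical) schema.
    Scalars are plain strings; maps are ("map", key_type, value_type)."""
    if isinstance(dtype, tuple) and dtype[0] == "map":
        _, key_type, value_type = dtype
        return 1 + max(map_depth(key_type), map_depth(value_type))
    return 0


def should_fall_back_to_cpu(dtype, max_gpu_depth=3):
    """max_gpu_depth is a placeholder, not a measured value."""
    return map_depth(dtype) > max_gpu_depth


shallow = ("map", "string", ("map", "string", "long"))
deep = ("map", "string", ("map", "string", ("map", "string", ("map", "string", "long"))))
```

Here `shallow` has depth 2 and would stay on the GPU under the placeholder threshold, while `deep` has depth 4 and would fall back.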

revans2 added the feature request (New feature or request) and ? - Needs Triage (Need team to review and classify) labels on Oct 16, 2023
mattahrens removed the ? - Needs Triage (Need team to review and classify) label on Oct 17, 2023