Is your feature request related to a problem? Please describe.
To be clear, this does not have to be done right away. This issue is here mostly to document the plan for how we could support Map types when parsing JSON, including in from_json, when it is needed.
We currently have some support for parsing JSON to a Map<String,String> at the top level in from_json. CUDF is adding support for returning a complex JSON object as a string (rapidsai/cudf#14239). We have also filed an issue with CUDF to try to get MAP functionality directly into CUDF (rapidsai/cudf#14288). The idea here is to do something similar to what we described in the MAP issue for CUDF, but instead of having CUDF parse a MAP as part of parsing other types, we will ask CUDF to just return a string for any Map type. Then we can use from_json to parse that text into a Map<String,String>. Once we have that, we can recursively parse the keys and values into whatever types we need to match what Spark is doing.
This is not likely to be super fast, but it should let us do the processing on the GPU and avoid going back to the CPU for any of the parsing. Hopefully it will be fast enough, but we should benchmark any change that we make. If we see that the performance is horrible for deeply nested maps, we might want to fall back to the CPU in those cases.
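To make the two-stage plan above concrete, here is a rough CPU-only sketch in Python. It is not the plugin or CUDF implementation and uses no GPU APIs. The helper names (`parse_to_string_map`, `convert`) and the tiny type descriptors (`"string"`, `"int"`, `("map", key_type, value_type)`) are hypothetical, made up only for illustration, and the null handling for bad input is an assumption rather than a statement of Spark's exact behavior.

```python
import json


def parse_to_string_map(text):
    """Stage 1: parse a JSON object into a str -> str mapping.

    Nested values are kept as JSON text, standing in for the step where
    the map column comes back as a plain string. Returning None for
    invalid input is an assumption of this sketch.
    """
    if text is None:
        return None
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    out = {}
    for key, value in obj.items():
        if value is None or isinstance(value, str):
            out[key] = value
        else:
            # Re-serialize anything that is not already a string.
            out[key] = json.dumps(value, separators=(",", ":"))
    return out


def convert(text, dtype):
    """Stage 2: recursively turn string keys/values into the target type."""
    if text is None:
        return None
    if dtype == "string":
        return text
    if dtype == "int":
        try:
            return int(text)
        except ValueError:
            return None
    if isinstance(dtype, tuple) and dtype[0] == "map":
        _, key_type, value_type = dtype
        flat = parse_to_string_map(text)
        if flat is None:
            return None
        # Each level is parsed to Map<String,String> first, then recursed into.
        return {
            convert(k, key_type): convert(v, value_type)
            for k, v in flat.items()
        }
    raise ValueError(f"unsupported type in this sketch: {dtype!r}")


nested = '{"a": {"1": 10, "2": 20}, "b": {}}'
stage1 = parse_to_string_map(nested)
result = convert(nested, ("map", "string", ("map", "int", "int")))
```

Each level of map nesting costs one more string-map parse pass over the text, which is why deeply nested maps are the case worth benchmarking.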