[BUG] GPU get_json_object does incompatible escaping and error checking #12483
Labels
0 - Backlog: In queue waiting for assignment
bug: Something isn't working
libcudf: Affects libcudf (C++/CUDA) code.
Spark: Functionality that helps Spark RAPIDS
Describe the bug
The GPU implementation of get_json_object parses the JSON path in a way that is incompatible with Spark's parsing and is not documented. We also throw a number of exceptions on invalid JSON paths, and these are not documented either. For some of these, I am fine if we just document the incompatibility. For others, we might want to look into fixing them.
Steps/Code to reproduce bug
expected:
Actual (on the GPU)
Expected behavior
At a minimum we need to document these differences. Ideally we catch any parsing errors from the JSON path parser and return a null in those cases, just like Spark does. We might even want to look into falling back to the CPU if we see a single quote `'` in the path.

The single quote escaping, at least for Spark, appears to only happen inside of `[]` operations, like `$['A.B']`, which returns 2 for both the GPU and the CPU.
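To make the requested behavior concrete, here is a minimal Python sketch of a path parser that reports a malformed path as a null result instead of raising, and that only treats single quotes as delimiters inside `[]`. It is not the libcudf or Spark implementation. The names `parse_path` and `get_json_object` are hypothetical, the grammar is deliberately reduced to `.name`, `['name']`, and `[index]`, and treating a quote after a dot as an ordinary character is an assumption of this sketch rather than confirmed Spark behavior.

```python
import json
import re

# Hypothetical, simplified path grammar: "$" followed by any number of steps.
_STEP = re.compile(r"""
    \.(?P<dot>[^.\[\]]+)       # .name   : a quote here is just a character (assumption)
  | \['(?P<quoted>[^']*)'\]    # ['name']: quotes delimit the name, dots are literal
  | \[(?P<index>\d+)\]         # [0]     : array index
""", re.VERBOSE)


def parse_path(path):
    """Return the list of steps in the path, or None if the path is malformed."""
    if not isinstance(path, str) or not path.startswith("$"):
        return None
    steps, pos = [], 1
    while pos < len(path):
        m = _STEP.match(path, pos)
        if m is None:
            return None  # malformed path: signal failure instead of raising
        if m.group("index") is not None:
            steps.append(int(m.group("index")))
        elif m.group("quoted") is not None:
            steps.append(m.group("quoted"))
        else:
            steps.append(m.group("dot"))
        pos = m.end()
    return steps


def get_json_object(doc, path):
    """Return the selected value as a string, or None on any failure."""
    steps = parse_path(path)
    if steps is None:
        return None  # invalid path yields null, not an exception
    try:
        node = json.loads(doc)
    except (TypeError, ValueError):
        return None  # invalid or missing JSON also yields null
    for step in steps:
        if isinstance(step, int) and isinstance(node, list) and step < len(node):
            node = node[step]
        elif isinstance(step, str) and isinstance(node, dict) and step in node:
            node = node[step]
        else:
            return None  # the path does not match the document
    return node if isinstance(node, str) else json.dumps(node)


print(get_json_object('{"A.B": 2}', "$['A.B']"))  # 2
print(get_json_object('{"A.B": 2}', "$.A.B"))     # None
print(get_json_object('{"A": 1}', "$['A"))        # None
```

In this sketch every failure (bad path, bad JSON, no match) collapses to `None`, which mirrors the "return a null" request above. A real fix would still need to decide which malformed paths should return null and which should trigger a CPU fallback.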