[BUG] GetJsonObject does not validate the input is JSON in the same way as Spark #10194
Labels: bug, cudf_dependency
Describe the bug
The current GPU implementation of GetJsonObject does not check whether its input is valid JSON. The CPU version uses a Jackson parser configured to allow single quotes and unescaped control characters.
https://github.com/apache/spark/blob/a3266b411723310ec10fc1843ddababc15249db0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L108-L114
If any error occurs while parsing the data, the result is converted to null.
https://github.com/apache/spark/blob/a3266b411723310ec10fc1843ddababc15249db0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L267
The GPU implementation probably needs to do at least some validation that the input is well-formed JSON.
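To make the CPU behavior concrete, here is a minimal sketch, assuming Jackson is on the classpath, of a parser built with the same two leniency options and a check that treats any parse error as invalid. The object name JsonValidationSketch and the helper parsesCleanly are hypothetical and exist only for illustration. The sketch drains the whole document, so it is not claimed to match Spark row for row: the CPU path only fails on errors the parser hits while evaluating the JSON path.

```scala
import com.fasterxml.jackson.core.{JsonFactory, JsonFactoryBuilder, JsonProcessingException}
import com.fasterxml.jackson.core.json.JsonReadFeature

object JsonValidationSketch {
  // Same two leniency options that the linked Spark code enables.
  private val factory: JsonFactory = new JsonFactoryBuilder()
    .enable(JsonReadFeature.ALLOW_UNESCAPED_CONTROL_CHARS)
    .enable(JsonReadFeature.ALLOW_SINGLE_QUOTES)
    .build()

  // Hypothetical helper: returns true when the lenient parser can read every
  // token without raising a parse error. A row for which this returns false
  // is one where the CPU would hit a parse error and produce null.
  def parsesCleanly(json: String): Boolean = {
    val parser = factory.createParser(json)
    try {
      // nextToken() returns null once the end of input is reached.
      while (parser.nextToken() != null) {}
      true
    } catch {
      case _: JsonProcessingException => false
    } finally {
      parser.close()
    }
  }
}
```

With this sketch, JsonValidationSketch.parsesCleanly("{'a': 1}") is accepted because single quotes are allowed, while a truncated document such as {"a": is rejected. A GPU-side validation pass would need to agree with the parser on both kinds of input.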
Steps/Code to reproduce bug
Expected behavior
We produce the same results as Spark on the CPU.