[FEA][JSON reader] to support parsing with single quotes #10004
Comments
Spark allows single quotes by default, so this blocks us from enabling JSON parsing in Spark by default.
Update: our approach here will be to introduce a quote-normalizing preprocessing step based on a new finite state transducer. Also see #13525
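To illustrate the idea of a quote-normalizing pass, here is a minimal sequential sketch in Python. It is not cuDF's implementation (the real transducer runs on the GPU and its exact rules may differ). The function name `normalize_quotes`, the state names, and the escape-handling choices are assumptions made for illustration only.

```python
import json


def normalize_quotes(text: str) -> str:
    """Rewrite single-quoted JSON strings as double-quoted ones.

    Hypothetical helper: a tiny state machine that emits output per input
    character, in the spirit of a finite state transducer.
    """
    OUTSIDE, IN_DQ, IN_DQ_ESC, IN_SQ, IN_SQ_ESC = range(5)
    state = OUTSIDE
    out = []
    for ch in text:
        if state == OUTSIDE:
            if ch == "'":
                out.append('"')          # opening single quote becomes "
                state = IN_SQ
            else:
                out.append(ch)
                if ch == '"':
                    state = IN_DQ
        elif state == IN_DQ:
            out.append(ch)               # double-quoted strings pass through
            if ch == "\\":
                state = IN_DQ_ESC
            elif ch == '"':
                state = OUTSIDE
        elif state == IN_DQ_ESC:
            out.append(ch)               # escaped character, copied as-is
            state = IN_DQ
        elif state == IN_SQ:
            if ch == "\\":
                state = IN_SQ_ESC        # decide once the next char is seen
            elif ch == "'":
                out.append('"')          # closing single quote becomes "
                state = OUTSIDE
            elif ch == '"':
                out.append('\\"')        # a bare " must now be escaped
            else:
                out.append(ch)
        else:  # IN_SQ_ESC
            # \' no longer needs escaping; other escapes are kept unchanged.
            out.append("'" if ch == "'" else "\\" + ch)
            state = IN_SQ
    return "".join(out)


normalized = normalize_quotes("{'name': 'Reynold Xin'}")
parsed = json.loads(normalized)  # a strict JSON parser now accepts the input
```

Because the pass only rewrites quote characters, the downstream parser can stay strict. Input that already uses double quotes passes through unchanged.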
The goal of this PR is to address [PR 10004](#10004) by supporting parsing of JSON files containing single quotes for field/value strings.

Authors:
- Shruti Shivakumar (https://github.com/shrshi)
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Mike Wilson (https://github.com/hyperbolic2346)
- Elias Stehle (https://github.com/elstehle)

URL: #14545
The goal of this PR is to address [10004](#10004) by supporting parsing of JSON files containing single quotes for field/value strings. This is follow-up work to the POC [PR 14545](#14545).

Authors:
- Shruti Shivakumar (https://github.com/shrshi)

Approvers:
- Andy Grove (https://github.com/andygrove)
- Vyas Ramasubramani (https://github.com/vyasr)
- Vukasin Milovanovic (https://github.com/vuule)
- Elias Stehle (https://github.com/elstehle)
- Robert (Bobby) Evans (https://github.com/revans2)

URL: #14729
I think this is done now. @GregoryKimball @andygrove do you both agree?
This is part of the FEA NVIDIA/spark-rapids#9.
We have a JSON file:
{'name': 'Reynold Xin'}
cuDF can't parse this file. It fails with:
ai.rapids.cudf.CudfException: cuDF failure at: /home/bobwang/work.d/nvspark/cudf/cpp/src/io/json/reader_impl.cu:609: Error determining column names.
Spark can parse it by default.
We expect a configuration option, allowSingleQuotes, to control this behavior.