[BUG] GPU JSON reader fails to read the JSON string of an empty body #7616
Comments
We should switch to the new JSON reader per issue #7518.
I just re-tested this, and it is still an issue even after switching to the new engine.
This only seems to be an issue for a JSON file that only contains empty entries. If there is at least one non-empty row, then we match Spark.
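To make the distinction above concrete, here is a minimal sketch that generates the two kinds of input: a file containing only empty entries and a file with one non-empty row mixed in. The file names, the column name `a`, and the helper functions are hypothetical, chosen only for illustration. The sketch writes the files and leaves reading them with the GPU reader to the existing test setup.

```python
import json
import os
import tempfile


def write_json_lines(path, rows):
    """Write one JSON object per line (the JSON Lines layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    return path


def make_repro_files(directory):
    # Hypothetical file names, used only for this illustration.
    # Case reported as failing: every entry is an empty object.
    only_empty = write_json_lines(
        os.path.join(directory, "only_empty.json"), [{}, {}, {}])
    # Case reported as matching Spark: at least one non-empty row.
    mixed = write_json_lines(
        os.path.join(directory, "mixed.json"), [{}, {"a": 1}, {}])
    return only_empty, mixed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in make_repro_files(tmp):
            with open(path, encoding="utf-8") as f:
                print(os.path.basename(path), repr(f.read()))
```

Pointing the reader at each file in turn should isolate whether a failure depends on the file containing nothing but empty entries.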
@res-life are you still planning on working on this? The failures are happening in two places. If you don't provide a schema, schema discovery returns an empty schema. cuDF does not accept an empty schema, so we try to make one up by pulling something out of the dataSchema, which is also empty, and that results in a crash. If we do provide a schema, we run into a null pointer exception when trying to read the data.
We should not be trying to use the dataSchema when the readDataSchema is empty. Even if it succeeded, it might read the wrong data, because the only time the readDataSchema is empty but the dataSchema is not is when we have partition columns. In the short term I think we just need to fall back to the CPU if the readDataSchema is empty, and we should concentrate on fixing the null pointer exception.
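The short-term rule proposed above can be sketched as a small decision function. This is only an illustration in Python, not the plugin's actual code: the `ScanSchemas` container, its field names, and `should_fall_back_to_cpu` are all hypothetical, and schemas are modeled as plain lists of column names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScanSchemas:
    # Hypothetical container; the fields mirror the terms used in the comment.
    data_schema: List[str]        # columns stored in the files
    read_data_schema: List[str]   # file columns the query actually reads
    partition_schema: List[str]   # columns derived from the directory layout


def should_fall_back_to_cpu(schemas: ScanSchemas) -> bool:
    """Return True when the scan should stay on the CPU.

    Assumed rule: an empty read data schema always triggers the fallback,
    and the data schema is never substituted for it.
    """
    return len(schemas.read_data_schema) == 0


if __name__ == "__main__":
    # Partition columns present: data schema is non-empty, read schema is empty.
    print(should_fall_back_to_cpu(ScanSchemas(["a"], [], ["p"])))
    # Ordinary read of column "a".
    print(should_fall_back_to_cpu(ScanSchemas(["a"], ["a"], [])))
```

The check deliberately ignores the data schema, so the partition-column case cannot end up reading columns the query never asked for.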
No, I'm now focusing on |
Describe the bug
The GPU JSON reader cannot read a JSON string with an empty body, {}, but Spark can read it successfully.
Steps/Code to reproduce bug
There are two sub-cases, and the GPU read fails with a different error in each.
Expected behavior
The GPU JSON reader should handle this input the same way Spark does.
Additional context
cudf Python has fixed the second sub issue by switching the JSON engine to the new reader, so JNI should also make the same switch when creating the read option.
We need to test this thoroughly to make sure the new JSON reader does not introduce any regressions.
After fixing this, we need to enable the tests xfailed in #7447.