-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add json reader support #4485
Add json reader support #4485
Conversation
Signed-off-by: Bobby Wang <wbo4958@gmail.com>
build |
sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala
Outdated
Show resolved
Hide resolved
build |
build |
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is looking good. My main concerns at this point is adding some documentation to docs/compatibility.md. Even if that is just that this is a very experimental feature and has a lot of problems. More specifics about what does not work is better.
I also wanted to make sure that @tgravescs and @nartal1 are okay with JSON reads not being flagged in the qualification tool any more. We generate that file automatically, but I am not 100% sure with our current level of support that we want JSON to be on the truly supported list.
since the supported data sources file has it as CO the qualification tool should still report it as not supported properly, but we should test to make sure @nartal1 can you test? |
sorry, I didn't fully look at the PR, looks like the qualification results were update, only diff is it reports each type. They are properly listed under not supported still: |
Thanks @tgravescs for looking into it. Yes, Instead of json[*], it's now reporting the types which are not supported which is good. |
sql-plugin/src/main/scala/org/apache/spark/sql/catalyst/json/rapids/GpuJsonScan.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/org/apache/spark/sql/catalyst/json/rapids/GpuJsonScan.scala
Outdated
Show resolved
Hide resolved
build |
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments about the docs.
build |
build |
|
||
### JSON supporting types | ||
|
||
The nested types(array, map and struct) are not supported yet in current version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that it is not possible to access primitive types stored within an array, map, and struct, or just that we cannot return a column of type array, map, or struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that it is not possible to access primitive types stored within an array, map, and struct,
Yes
or just that we cannot return a column of type array, map, or struct?
Spark does not treat them differently as far as I can tell. CUDF sort of does but not in any way that is compatible with what Spark does and it is only for top level arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I would like others to look at this too
@tgravescs Could you help to review this PR ? |
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuTextBasedPartitionReader.scala
Show resolved
Hide resolved
sql-plugin/src/main/scala/org/apache/spark/sql/catalyst/json/rapids/GpuJsonScan.scala
Outdated
Show resolved
Hide resolved
build |
This PR is trying to support the json reader on basic type support.