Adopt JSONUtils.concatenateJsonStrings for concatenating JSON strings #11549

Open: wants to merge 19 commits into base branch-24.12

ttnghia (Collaborator) commented Sep 30, 2024

This adopts the newly implemented JSONUtils.concatenateJsonStrings from spark-rapids-jni, which concatenates JSON strings into a single string that can then be read by cudf's JSON reader.

Depends on:

This also closes #10922.
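
To illustrate the general idea of concatenating JSON rows into one buffer, here is a minimal Python sketch. It is not the spark-rapids-jni implementation: the function name `concatenate_json_strings`, the candidate delimiter list, and the choice to substitute `{}` for null or whitespace-only rows are all assumptions made for illustration. The point it shows is that the delimiter has to be a character that does not occur in any row, otherwise a row containing a line separator would be split in two.

```python
JSON_WHITESPACE = " \t\r\n"


def concatenate_json_strings(rows, candidates="\n\x1e\x1f\x00"):
    """Join JSON rows into one buffer.

    Returns (buffer, delimiter, is_null_or_empty). Hypothetical helper,
    not the API of JSONUtils.concatenateJsonStrings.
    """
    # Rows that are None or whitespace-only hold no JSON value; substitute
    # an empty object so every row still occupies exactly one slot.
    is_null_or_empty = [
        r is None or r.strip(JSON_WHITESPACE) == "" for r in rows
    ]
    cleaned = ["{}" if flag else r for r, flag in zip(rows, is_null_or_empty)]

    # Pick the first candidate delimiter that appears in none of the rows.
    for delim in candidates:
        if all(delim not in r for r in cleaned):
            return delim.join(cleaned), delim, is_null_or_empty
    raise ValueError("no unused delimiter available")
```

With an input such as `['{"a":\n1}', None, '  ', '{"b":2}']`, the newline is rejected as a delimiter because the first row contains it, and the next candidate is used instead. The returned flags let the caller turn the placeholder rows back into nulls after parsing.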

Signed-off-by: Nghia Truong <nghiat@nvidia.com>
# Conflicts:
#	sql-plugin/src/main/scala/org/apache/spark/sql/rapids/GpuJsonReadCommon.scala
#	sql-plugin/src/main/scala/org/apache/spark/sql/rapids/GpuJsonToStructs.scala
# Conflicts:
#	sql-plugin/src/main/scala/org/apache/spark/sql/rapids/GpuJsonReadCommon.scala
@ttnghia ttnghia added the performance A performance related task/issue label Sep 30, 2024
@ttnghia ttnghia self-assigned this Sep 30, 2024

@allow_non_gpu(*non_utc_allow)
def test_from_json_input_wrapped_in_whitespaces():
    json_string_gen = StringGen(r'[ \r\n\t]{0,5}({"key":( |\r|\n|\t|)"[A-z]{0,5}"}|null|invalid|)[ \r\n\t]{0,5}')
ttnghia (Collaborator, Author) commented:

This will generate strings whose body is one of:

  • '{"key":( |\r|\n|\t|)"[A-z]{0,5}"}'
  • 'null'
  • 'invalid'
  • Empty string

Each of these strings is surrounded on both sides by zero to five whitespace characters, [ \r\n\t]{0,5}.
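
The four cases above can be reproduced with a small standard-library sketch. It does not use StringGen from the test framework; the `sample` helper is hypothetical and only mirrors the structure of the pattern, which makes it easy to check against the same regular expression with `re.fullmatch`.

```python
import random
import re

PATTERN = r'[ \r\n\t]{0,5}({"key":( |\r|\n|\t|)"[A-z]{0,5}"}|null|invalid|)[ \r\n\t]{0,5}'
WHITESPACE = " \r\n\t"


def sample(rng):
    """Build one string with the same shape as the pattern (hypothetical helper)."""

    def pad():
        # Zero to five whitespace characters, as in [ \r\n\t]{0,5}
        return "".join(rng.choice(WHITESPACE) for _ in range(rng.randint(0, 5)))

    # [A-z] is the ASCII range 65..122, so besides letters it also
    # covers the six characters between 'Z' and 'a': [ \ ] ^ _ `
    value = "".join(
        chr(rng.randint(ord("A"), ord("z"))) for _ in range(rng.randint(0, 5))
    )
    # Optional single whitespace character between the colon and the value
    gap = rng.choice([" ", "\r", "\n", "\t", ""])
    body = rng.choice(['{"key":' + gap + '"' + value + '"}', "null", "invalid", ""])
    return pad() + body + pad()


rng = random.Random(0)
samples = [sample(rng) for _ in range(5)]
```

Every string produced this way matches `PATTERN`, so the sketch can serve as a quick way to eyeball the kinds of inputs the test feeds to from_json, including rows that contain line separators.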

ttnghia commented Oct 11, 2024

build

ttnghia commented Oct 14, 2024

build

@ttnghia ttnghia requested a review from revans2 October 14, 2024 20:22
ttnghia commented Oct 15, 2024

build

Successfully merging this pull request may close these issues.

from_json cannot support line separator in the input string.