Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add tests to getJSONObject for queries involving single quotes #10476

Conversation

SurajAralihalli
Copy link
Collaborator

@SurajAralihalli SurajAralihalli commented Feb 22, 2024

Depends on PR 15082

Cudf PR 15082 fixes issues with get_json_object when single quotes are used in the JSONPath query. Additionally, it now throws a std::invalid_argument error for invalid queries. The JNI function catches this exception and produces a string column consisting of null values.

This PR verifies the output of get_json_object between the cudf implementation and the standard Spark version.

For example in { "A.B": 2, "'A": { "B'": 3 } }

JSONPath Before CUDF Fix Spark After Fix
$.'A.B' 2 3 3
$.'A CUDF_FAIL("Encountered invalid JSONPath input string") {"B'": 3} {"B'": 3}
${A} CUDF_FAIL("Unrecognized JSONPath operator") null null

Signed-off-by: Suraj Aralihalli <suraj.ara16@gmail.com>
@SurajAralihalli SurajAralihalli added the cudf_dependency An issue or PR with this label depends on a new feature in cudf label Feb 22, 2024
@sameerz sameerz added the bug Something isn't working label Feb 24, 2024
Signed-off-by: Suraj Aralihalli <suraj.ara16@gmail.com>
Signed-off-by: Suraj Aralihalli <suraj.ara16@gmail.com>
Signed-off-by: Suraj Aralihalli <suraj.ara16@gmail.com>
@revans2
Copy link
Collaborator

revans2 commented Feb 27, 2024

How is this different from #10466? If we are fixing some things in CUDF, that is great, but I think the other PR has duplicated a lot of the work that would be needed here.

Signed-off-by: Suraj Aralihalli <suraj.ara16@gmail.com>
@SurajAralihalli SurajAralihalli force-pushed the get_json_fix_invalid_queries branch from 2ba05ce to da352e8 Compare February 28, 2024 01:21
@SurajAralihalli SurajAralihalli changed the title [WIP] Address issues with getJSONObject for queries involving single quotes and invalid syntax [WIP] Address issues with getJSONObject for queries involving single quotes Feb 28, 2024
@SurajAralihalli SurajAralihalli changed the title [WIP] Address issues with getJSONObject for queries involving single quotes [WIP] Add tests to getJSONObject for queries involving single quotes Feb 28, 2024
@SurajAralihalli
Copy link
Collaborator Author

How is this different from #10466? If we are fixing some things in CUDF, that is great, but I think the other PR has duplicated a lot of the work that would be needed here.

Thanks for pointing it out! I have limited the scope of this PR to only add tests for the valid queries that include single quotes and currently fail due to the bugs in cudf. I believe it is not covered in the scope of #10466 as it examines invalid queries.

@revans2
Copy link
Collaborator

revans2 commented Feb 28, 2024

Sounds good thanks for adding these tests.

@sameerz sameerz added test Only impacts tests and removed bug Something isn't working labels Mar 4, 2024
Copy link
Collaborator

@thirtiseven thirtiseven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a cudf_dependency now. Will be good to add this case.

f.get_json_object('jsonStr',"$.A.B").alias('sub_d'),
f.get_json_object('jsonStr',"$.'A").alias('sub_e')
),
conf={'spark.rapids.sql.expression.GetJsonObject': 'true'})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This config is now true by default so it's safe to remove it here.

@thirtiseven thirtiseven removed the cudf_dependency An issue or PR with this label depends on a new feature in cudf label Apr 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test Only impacts tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants