[BUG] JSON Scan and JsonToStruct should invalidate an array on out of bounds #11491

Open
revans2 opened this issue Sep 23, 2024 · 0 comments
Labels
bug Something isn't working

revans2 commented Sep 23, 2024

Describe the bug
cuDF is in the middle of fixing a lot of the issues with nested JSON parsing (rapidsai/cudf#16545).

Once that goes in, we can start to see that we are not covering some corner cases in JSON as fully as we would like.

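When parsing numbers out of an array, if any number in the array overflows the element type, then the entire array needs to be null. We currently do it on a per-item basis, so only the overflowing element becomes null. The following builds arrays of increasing powers of ten, such as {"a":[10,100,1000]}, and parses them as several integral and decimal types: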

(1 to 20).map(upper => (1 to upper).map(i => "1" + ("0" * i)).mkString("""{"a":[""",",","]}")).toDF("json").repartition(1).selectExpr("from_json(json, 'a ARRAY<BYTE>') as a_byte", "from_json(json, 'a ARRAY<SHORT>') as a_short", "from_json(json, 'a ARRAY<INT>') as a_int", "from_json(json, 'a ARRAY<LONG>') as a_long", "from_json(json, 'a ARRAY<DECIMAL(21,0)>') as a_decimal").show(false)
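To make the difference concrete, here is a minimal plain-Scala sketch of the two behaviors for a BYTE array. It only models the semantics described above and is not the plugin's or Spark's actual code; `ArrayOverflowSketch`, `parseByte`, `perItem`, and `wholeArray` are hypothetical names, and `None` stands in for a SQL null.

```scala
// Hypothetical model of the semantics only; not the plugin's or Spark's code.
object ArrayOverflowSketch {
  // Parse one JSON number token as a BYTE; None models a null caused by overflow.
  def parseByte(token: String): Option[Byte] =
    scala.util.Try(token.toByte).toOption

  // Per-item handling: only the overflowing elements become null.
  def perItem(tokens: Seq[String]): Option[Seq[Option[Byte]]] =
    Some(tokens.map(parseByte))

  // Whole-array handling: a single overflow invalidates the entire array.
  def wholeArray(tokens: Seq[String]): Option[Seq[Option[Byte]]] = {
    val parsed = tokens.map(parseByte)
    if (parsed.exists(_.isEmpty)) None else Some(parsed)
  }
}
```

With the tokens 10, 100, 1000, `perItem` keeps the first two values and nulls only the third, while `wholeArray` returns a null array, which is the result this issue asks for.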

The whole-array invalidation does not apply to an array of structs. There, an overflowing value does not null the entire array:

(1 to 20).map(upper => (1 to upper).map(i => "1" + ("0" * i)).mkString("""{"a":[{"b":""","""},{"b":""","}]}")).toDF("json").repartition(1).selectExpr("from_json(json, 'a ARRAY<STRUCT<b:BYTE>>') as a_byte", "from_json(json, 'a ARRAY<STRUCT<b:SHORT>>') as a_short", "from_json(json, 'a ARRAY<STRUCT<b:INT>>') as a_int", "from_json(json, 'a ARRAY<STRUCT<b:LONG>>') as a_long", "from_json(json, 'a ARRAY<STRUCT<b:DECIMAL(21,0)>>') as a_decimal").show(false)

Not sure why that is, but it is.
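For anyone who wants to inspect the inputs without a Spark session, the JSON documents built by the two snippets above can be reproduced in plain Scala. This is a sketch that reuses the same `mkString` calls; `ReproInputs`, `numbers`, `arrayDoc`, and `structDoc` are hypothetical names introduced here for illustration.

```scala
// Hypothetical helpers that rebuild the repro inputs without Spark.
object ReproInputs {
  // The tokens 10, 100, ..., up to a 1 followed by `upper` zeros.
  def numbers(upper: Int): Seq[String] =
    (1 to upper).map(i => "1" + ("0" * i))

  // One document from the plain-array repro.
  def arrayDoc(upper: Int): String =
    numbers(upper).mkString("""{"a":[""", ",", "]}")

  // One document from the array-of-structs repro.
  def structDoc(upper: Int): String =
    numbers(upper).mkString("""{"a":[{"b":""", """},{"b":""", "}]}")
}
```

For example, `ReproInputs.arrayDoc(3)` yields {"a":[10,100,1000]} and `ReproInputs.structDoc(2)` yields {"a":[{"b":10},{"b":100}]}. Since 1000 does not fit in a BYTE, every row with `upper` of 3 or more contains at least one overflowing value for the byte columns.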

@revans2 added the "? - Needs Triage" and "bug" labels Sep 23, 2024
@mattahrens removed the "? - Needs Triage" label Sep 24, 2024
@revans2 changed the title from "[BUG] JSON Scan and StructToJson should invalidate an array on out of bounds" to "[BUG] JSON Scan and JsonToStruct should invalidate an array on out of bounds" Sep 26, 2024