
[FEA] JSON number normalization when returned as a string #15318

Open
Tracked by #4609 ...
revans2 opened this issue Mar 15, 2024 · 1 comment
Labels
cuIO · feature request · Spark

Comments

@revans2
Contributor

revans2 commented Mar 15, 2024

Is your feature request related to a problem? Please describe.
I am filing this to capture what Spark does, but it feels very Spark-specific, which could be problematic. The solution here might be related to a solution for #15222, so that we don't cause too much of a performance impact for others.

NVIDIA/spark-rapids#10458 is the corresponding issue in the Spark Plugin and NVIDIA/spark-rapids#10218 is related to it.

I don't fully know what solution I would like; this is where it gets kind of ugly/difficult.

When Spark processes JSON, it parses the input into tokens and then converts those tokens back to a String when it is done. As a result, numbers are converted to integers, doubles, or Java BigDecimal values, and then converted back to a String. For integers and BigDecimal values (numbers that do not include a decimal point or scientific notation) the processing is mostly a noop.
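Just to illustrate the general mechanism, here is a minimal sketch of that parse-and-reprint round trip using Jackson (which Spark uses under the hood for JSON). This is not Spark's actual code path, only a way to show where numeric tokens get materialized and re-serialized:

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;

import java.io.StringWriter;

public class JsonRoundTrip {
    // Parse a JSON snippet into tokens and write them straight back out.
    // Numeric tokens are materialized (int/long/double/BigDecimal) by the
    // parser and re-serialized by the generator, which is where the
    // normalization described in this issue silently happens.
    public static String roundTrip(String json) throws Exception {
        JsonFactory factory = new JsonFactory();
        StringWriter out = new StringWriter();
        try (JsonParser parser = factory.createParser(json);
             JsonGenerator gen = factory.createGenerator(out)) {
            while (parser.nextToken() != null) {
                gen.copyCurrentEvent(parser); // re-emits the token, converting numbers
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Prints something like {"a":0,"b":100.0}: -0 collapses to 0 and the
        // scientific-notation double is reprinted in plain form.
        System.out.println(roundTrip("{\"a\": -0, \"b\": 1.0E2}"));
    }
}
```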

-0 becomes just 0, and any leading zeros on the number are removed (but only if validation didn't already flag them as a problem, see #15222).
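A tiny sketch of what that looks like for integral numbers, if you simply materialize the text and print it back (the helper name is made up, not an existing API):

```java
import java.math.BigInteger;

public class IntegerNormalization {
    // Normalize an integral JSON number the way a parse-then-reprint pass
    // would: "-0" collapses to "0" and leading zeros disappear, because the
    // text is materialized as a BigInteger and then printed back out.
    static String normalizeIntegral(String raw) {
        return new BigInteger(raw).toString();
    }

    public static void main(String[] args) {
        System.out.println(normalizeIntegral("-0"));    // -> 0
        System.out.println(normalizeIntegral("007"));   // -> 7 (leading zeros dropped,
                                                        //    if validation allowed them)
        System.out.println(normalizeIntegral("12345")); // -> 12345 (effectively a noop)
    }
}
```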

For floating point numbers it is more complicated, and I need to gather more specifics to put in here. The hard parts are detecting overflow and converting the number to +/- Infinity, converting from scientific notation to regular floating point notation and back, and making sure that the number actually fits the floating point representation.
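These are not Spark's exact rules (the specifics above still need to be filled in), but a rough sketch of what a plain double round trip in Java does to these cases:

```java
public class FloatNormalization {
    // Roughly what happens when a floating point JSON number is materialized
    // as a double and printed back: values beyond the double range collapse
    // to +/-Infinity, and Double.toString picks its own notation, which may
    // or may not match the original scientific/plain form.
    static String normalizeFloat(String raw) {
        return Double.toString(Double.parseDouble(raw));
    }

    public static void main(String[] args) {
        System.out.println(normalizeFloat("1e400"));     // -> Infinity (overflow)
        System.out.println(normalizeFloat("-1e400"));    // -> -Infinity
        System.out.println(normalizeFloat("1.0E2"));     // -> 100.0 (scientific -> plain)
        System.out.println(normalizeFloat("0.0000001")); // -> 1.0E-7 (plain -> scientific)
        System.out.println(normalizeFloat("3.141592653589793238"));
                                                         // -> 3.141592653589793 (rounded to fit)
    }
}
```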

I almost want a way to plug in my own code for this, but I'm not sure there is any good way to do that, because I am nervous that Spark will change some of these behaviors over time.
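To illustrate what such a hook might look like, here is a purely hypothetical interface; nothing like this exists in cuDF/libcudf or the plugin today, it is only meant to show the shape of the customization I have in mind:

```java
// Hypothetical interface only: not an existing cuDF or spark-rapids API.
// The caller would hand over the raw number text exactly as it appeared in
// the JSON document and get back the string to place in the output column.
public interface JsonNumberNormalizer {
    String normalize(String rawNumberText, boolean isFloatingPoint);
}
```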

@ttnghia
Contributor

ttnghia commented Mar 27, 2024
