
Fix up incorrect results of rounding past the max digits of data type [databricks] #4420

Merged
merged 9 commits into NVIDIA:branch-22.02
Jan 12, 2022

Conversation

sperlingxx
Collaborator

Signed-off-by: sperlingxx lovedreamf@gmail.com

Fixes #4273

This PR fixes integral values rounded by a scale that reaches or exceeds the maximum number of digits of the data type. Under this circumstance, cuDF may produce results that differ from Spark's. The general strategy applied in this PR:

  • For scales exceeding the max digits, we can simply return zero values.
  • For scales equal to the max digits, we need to perform the round. Fortunately, round-up will NOT occur on the max digit of numeric types except LongType. Therefore, we only need to handle round-down for most types, by returning zero values. For LongType rounded by scale -19, we handle it in a specialized way.
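The case analysis above can be sketched in Python for Spark's HALF_UP mode (the function name and dict are hypothetical illustrations, not the PR's actual Scala API; the overflow constants are the ones quoted in the review below):

```python
# Max decimal digits each Spark integral type can hold (Byte=3 ... Long=19).
MAX_DIGITS = {"byte": 3, "short": 5, "int": 10, "long": 19}

def round_past_bounds(value, scale, dtype):
    """HALF_UP result when -scale reaches or exceeds the type's max digits;
    returns None when ordinary rounding applies."""
    digits = MAX_DIGITS[dtype]
    if -scale > digits:
        return 0  # rounding at more digits than the type holds: always zero
    if -scale == digits:
        if dtype != "long":
            # abs(max value) < 10**digits / 2, so the round always goes down
            return 0
        # Long only: abs(value) >= 5e18 rounds HALF_UP to +/-1e19,
        # which wraps around in 64-bit two's complement
        if value >= 5 * 10**18:
            return -8446744073709551616
        if value <= -5 * 10**18:
            return 8446744073709551616
        return 0
    return None  # scale within bounds: delegate to the normal round
```

For example, an Int can hold at most 10 digits and its max value 2147483647 is below 5e9, so rounding at scale -10 always produces 0; only Long values can reach the halfway point of their own max digit.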

@sperlingxx sperlingxx changed the title Fix up incorrect results of rounding past the max digits of data type Fix up incorrect results of rounding past the max digits of data type [databricks] Dec 22, 2021
@sperlingxx
Collaborator Author

build

// For scales equaling to max digits, we need to perform round. Fortunately, round up
// will NOT occur on the max digits of numeric types except LongType. Therefore, we only
// need to handle round down for most of types, through returning zero values.
def fixUpOverflowInts(zeroFn: () => Scalar): ColumnVector = {
Collaborator
I would prefer defining these functions outside of this doColumnar rather than making them internal functions.

Collaborator Author
Done.

// Since LongType can not hold these two values, the 1e19 overflows as -8446744073709551616L,
// and the -1e19 overflows as 8446744073709551616L. The overflow happens in the same way for
// HALF_UP (round) and HALF_EVEN (bround).
def fixUpInt64OnBounds(zeroFn: () => Scalar): ColumnVector = {
Collaborator

This only handles the long type, so this zeroFn seems unnecessary.

It should always be a zero column of the long type, right?

BTW, do we really need to take care of this round up/down for long? Can cudf round handle it the same way as what you are doing here?

Collaborator Author

You are right. I removed the argument zeroFn.
When rounding by scales exceeding the max digits, the results of cudf::round are inconsistent with Spark's. In terms of cuDF, it is undefined behavior, so technically cuDF can return any value in this situation. That's why we need to override the round for extremely large scales.

infFn: () => Scalar,
negInfFn: () => Scalar): ColumnVector = {
val scaleVal = scale.getValue.asInstanceOf[Int]
val maxDigits = if (dataType == FloatType) 39 else 309
Collaborator

I prefer to have a similar API to DecimalUtil.getPrecisionForIntegralType, e.g. DecimalUtil.getPrecisionForFloatType

Collaborator Author

Strictly speaking, 39 and 309 are numeric bounds rather than the precision of the float types. These numeric bounds are rarely used in spark-rapids, and they have little relation to DecimalUtil. So I don't think we need to make this a common method.
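For reference, the 39 and 309 correspond to the decimal digit counts of the largest finite IEEE-754 single- and double-precision values (a quick sanity check, not code from the PR):

```python
import math
import sys

FLOAT32_MAX = 3.4028234663852886e38   # largest finite IEEE-754 binary32 value
FLOAT64_MAX = sys.float_info.max      # largest finite binary64, ~1.798e308

# Number of decimal digits in the integral part of the largest finite value.
float_digits = math.floor(math.log10(FLOAT32_MAX)) + 1   # 39
double_digits = math.floor(math.log10(FLOAT64_MAX)) + 1  # 309
```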

@sameerz sameerz added the bug Something isn't working label Dec 29, 2021
sperlingxx and others added 3 commits January 10, 2022 13:40
Co-authored-by: Liangcai Li <firestarmanllc@gmail.com>
Signed-off-by: sperlingxx <lovedreamf@gmail.com>
@sperlingxx
Collaborator Author

build

@sperlingxx sperlingxx merged commit 460461e into NVIDIA:branch-22.02 Jan 12, 2022
@sperlingxx sperlingxx deleted the fix_round_cornor branch January 12, 2022 02:06
Development

Successfully merging this pull request may close these issues.

[BUG] Rounding past the size that can be stored in a type produces incorrect results
4 participants