Fix up incorrect results of rounding past the max digits of data type [databricks] #4420
Conversation
Signed-off-by: sperlingxx <lovedreamf@gmail.com>
…k-rapids into fix_round_cornor
Signed-off-by: sperlingxx <lovedreamf@gmail.com>
build
// For scales equal to the max digits, we need to perform rounding. Fortunately, round up
// will NOT occur on the max digits of numeric types except LongType. Therefore, we only
// need to handle round down for most types, by returning zero values.
def fixUpOverflowInts(zeroFn: () => Scalar): ColumnVector = {
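A quick way to see why round up cannot occur at the max digit count for IntType but can for LongType (a standalone JVM check, not spark-rapids code; the object name is made up):

object MaxDigitsRoundCheck {
  def main(args: Array[String]): Unit = {
    // IntType: max digits = 10. The largest magnitude, ~2.1e9, sits below the
    // 5e9 midpoint, so HALF_UP rounding at scale -10 always yields zero.
    println(BigDecimal(Int.MaxValue).setScale(-10, BigDecimal.RoundingMode.HALF_UP)) // 0E+10
    // LongType: max digits = 19. Values at or beyond 5e18 round up to 1e19,
    // which does not fit in a Long; that is the case fixUpInt64OnBounds handles.
    println(BigDecimal(Long.MaxValue).setScale(-19, BigDecimal.RoundingMode.HALF_UP)) // 1E+19
  }
}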
I would prefer defining these functions outside of this doColumnar rather than making them internal functions.
Done.
// Since LongType cannot hold these two values, the 1e19 overflows as -8446744073709551616L,
// and the -1e19 overflows as 8446744073709551616L. The overflow happens in the same way for
// HALF_UP (round) and HALF_EVEN (bround).
def fixUpInt64OnBounds(zeroFn: () => Scalar): ColumnVector = {
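The two wrapped values quoted in the comment above can be reproduced with a plain narrowing conversion on the JVM (a standalone check, not spark-rapids code):

object Int64BoundsCheck {
  def main(args: Array[String]): Unit = {
    // Only the low-order 64 bits survive the conversion to Long, which is the
    // same two's-complement wrap-around the comment above describes.
    println(BigDecimal("1e19").longValue)  // -8446744073709551616
    println(BigDecimal("-1e19").longValue) // 8446744073709551616
  }
}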
This only handles the long type, so it seems this zeroFn is not necessary. It should always be a zero column of the long type, right?
BTW, do we really need to take care of this round up/down of long? Can cudf round handle it the same way as what you are doing here?
You are right. I removed the argument zeroFn.
When rounding by scales exceeding the max digits, the results of cudf::round are inconsistent with Spark. In terms of cuDF, it is undefined behavior, so technically cuDF can return any value in this situation. That's why we need to override the rounding for extremely large scales.
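For reference, the CPU Spark behavior the plugin needs to match can be checked with a small job like this (assumes a local SparkSession; expected results shown in comments):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.round

object SparkRoundReference {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    import spark.implicits._
    // Scale -20 is past the 19 max digits of LongType: Spark rounds every
    // value, including Long.MaxValue and Long.MinValue, down to 0.
    Seq(Long.MaxValue, Long.MinValue, 1L).toDF("v")
      .select(round($"v", -20))
      .show() // all rows are 0
    spark.stop()
  }
}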
infFn: () => Scalar,
negInfFn: () => Scalar): ColumnVector = {
  val scaleVal = scale.getValue.asInstanceOf[Int]
  val maxDigits = if (dataType == FloatType) 39 else 309
I prefer to have an API similar to DecimalUtil.getPrecisionForIntegralType, e.g. DecimalUtil.getPrecisionForFloatType.
Strictly speaking, the 39 and 309 are numeric bounds rather than the precision of float types. These numeric bounds are rarely used in spark-rapids, and they have little relation to DecimalUtil. So, I don't think we need to make it a common method.
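Those two bounds are the integral digit counts of the largest finite float and double values, which can be verified standalone (made-up object name, not spark-rapids code):

object FloatBoundsCheck {
  def main(args: Array[String]): Unit = {
    // Float.MaxValue is about 3.40e38, giving 39 integral digits;
    // Double.MaxValue is about 1.80e308, giving 309 integral digits.
    println(math.ceil(math.log10(Float.MaxValue.toDouble)).toInt) // 39
    println(math.ceil(math.log10(Double.MaxValue)).toInt)         // 309
  }
}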
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/mathExpressions.scala
Co-authored-by: Liangcai Li <firestarmanllc@gmail.com>
build
Signed-off-by: sperlingxx <lovedreamf@gmail.com>
Fixes #4273
This PR fixes up integral values rounded by a scale reaching or exceeding the max digits of the data type. Under this circumstance, cuDF may produce results that differ from Spark. The general strategy applied in this PR:
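As a rough illustration of that strategy, pieced together from the review snippets above (hypothetical and simplified: it operates on a single Long value instead of a cuDF ColumnVector, and only covers LongType):

object RoundFixUpSketch {
  private val MaxLongDigits = 19

  // Mirrors the behavior described in the review threads: zeros when the scale
  // exceeds the max digits, wrapped overflow values on the 1e19 bounds, and
  // ordinary HALF_UP rounding otherwise.
  def fixedUpRound(value: Long, scale: Int): Long = {
    if (-scale > MaxLongDigits) {
      0L // every long rounds down to zero past the max digits
    } else if (-scale == MaxLongDigits) {
      // Values at or beyond +/-5e18 round to +/-1e19, which overflows Long the
      // same way Spark's narrowing conversion does.
      if (value >= 5000000000000000000L) -8446744073709551616L
      else if (value <= -5000000000000000000L) 8446744073709551616L
      else 0L
    } else {
      BigDecimal(value).setScale(scale, BigDecimal.RoundingMode.HALF_UP).toLong
    }
  }
}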