
Fix up incorrect results of rounding past the max digits of data type [databricks] #4420

Merged
merged 9 commits into NVIDIA:branch-22.02
Jan 12, 2022

Conversation

sperlingxx
Collaborator

Signed-off-by: sperlingxx lovedreamf@gmail.com

Fixes #4273

This PR fixes integral values rounded by a scale that reaches or exceeds the maximum number of digits of the data type. Under this circumstance, cuDF may produce results that differ from Spark's. The general strategy applied in this PR:

  • For scales exceeding the max digits, we can simply return zero values.
  • For scales equal to the max digits, we need to perform the round. Fortunately, round-up will NOT occur on the max digit of numeric types except LongType. Therefore, we only need to handle round-down for most types, by returning zero values. For LongType rounded by scale -19, we handle it in a specialized way.
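The case analysis above can be sketched in Python for Spark's HALF_UP mode (the function name and dict are hypothetical illustrations, not the PR's actual Scala API; the overflow constants are the ones quoted in the review below):

```python
# Max decimal digits each Spark integral type can hold (Byte=3 ... Long=19).
MAX_DIGITS = {"byte": 3, "short": 5, "int": 10, "long": 19}

def round_past_bounds(value, scale, dtype):
    """HALF_UP result when -scale reaches or exceeds the type's max digits;
    returns None when ordinary rounding applies."""
    digits = MAX_DIGITS[dtype]
    if -scale > digits:
        return 0  # rounding at more digits than the type holds: always zero
    if -scale == digits:
        if dtype != "long":
            # abs(max value) < 10**digits / 2, so the round always goes down
            return 0
        # Long only: abs(value) >= 5e18 rounds HALF_UP to +/-1e19,
        # which wraps around in 64-bit two's complement
        if value >= 5 * 10**18:
            return -8446744073709551616
        if value <= -5 * 10**18:
            return 8446744073709551616
        return 0
    return None  # scale within bounds: delegate to the normal round
```

For example, an Int can hold at most 10 digits and its max value 2147483647 is below 5e9, so rounding at scale -10 always produces 0; only Long values can reach the halfway point of their own max digit.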

@sperlingxx sperlingxx changed the title Fix up incorrect results of rounding past the max digits of data type Fix up incorrect results of rounding past the max digits of data type [databricks] Dec 22, 2021
@sperlingxx
Collaborator Author

build

// For scales equaling to max digits, we need to perform round. Fortunately, round up
// will NOT occur on the max digits of numeric types except LongType. Therefore, we only
// need to handle round down for most of types, through returning zero values.
def fixUpOverflowInts(zeroFn: () => Scalar): ColumnVector = {
Collaborator
I would prefer defining these functions outside of this doColumnar rather than making them internal functions.

Collaborator Author
Done.

// Since LongType can not hold these two values, the 1e19 overflows as -8446744073709551616L,
// and the -1e19 overflows as 8446744073709551616L. The overflow happens in the same way for
// HALF_UP (round) and HALF_EVEN (bround).
def fixUpInt64OnBounds(zeroFn: () => Scalar): ColumnVector = {
Collaborator

This only handles the long type, so this zeroFn seems unnecessary.

It should always be a zero column of the long type, right?

BTW, do we really need to take care of this round up/down for long? Can cudf round handle it the same way as what you are doing here?

Collaborator Author

You are right. I removed the argument zeroFn.
When rounding by scales exceeding the max digits, the results of cudf::round are inconsistent with Spark's. In terms of cuDF, it is undefined behavior, so technically cuDF can return any value in this situation. That's why we need to override the round for extremely large scales.

infFn: () => Scalar,
negInfFn: () => Scalar): ColumnVector = {
val scaleVal = scale.getValue.asInstanceOf[Int]
val maxDigits = if (dataType == FloatType) 39 else 309
Collaborator

I prefer to have a similar API to DecimalUtil.getPrecisionForIntegralType, e.g. DecimalUtil.getPrecisionForFloatType

Collaborator Author

Strictly speaking, 39 and 309 are numeric bounds rather than the precision of the float types. These numeric bounds are rarely used in spark-rapids, and they have little relation to DecimalUtil. So I don't think we need to make this a common method.
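For reference, the 39 and 309 correspond to the decimal digit counts of the largest finite IEEE-754 single- and double-precision values (a quick sanity check, not code from the PR):

```python
import math
import sys

FLOAT32_MAX = 3.4028234663852886e38   # largest finite IEEE-754 binary32 value
FLOAT64_MAX = sys.float_info.max      # largest finite binary64, ~1.798e308

# Number of decimal digits in the integral part of the largest finite value.
float_digits = math.floor(math.log10(FLOAT32_MAX)) + 1   # 39
double_digits = math.floor(math.log10(FLOAT64_MAX)) + 1  # 309
```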

@sameerz sameerz added the bug Something isn't working label Dec 29, 2021
sperlingxx and others added 3 commits January 10, 2022 13:40
Co-authored-by: Liangcai Li <firestarmanllc@gmail.com>
Signed-off-by: sperlingxx <lovedreamf@gmail.com>
@sperlingxx
Collaborator Author

build

@sperlingxx sperlingxx merged commit 460461e into NVIDIA:branch-22.02 Jan 12, 2022
@sperlingxx sperlingxx deleted the fix_round_cornor branch January 12, 2022 02:06
Development

Successfully merging this pull request may close these issues.

[BUG] Rounding past the size that can be stored in a type produces incorrect results
4 participants