[FEA] Plugin should throw same arithmetic exceptions as Spark #5196

Closed
14 tasks done
andygrove opened this issue Apr 11, 2022 · 9 comments
Labels: good first issue, task

@andygrove
Contributor

andygrove commented Apr 11, 2022

Is your feature request related to a problem? Please describe.
This is a follow-on to #5182, where we made some exception checks less specific because changes in Spark 3.3 were causing test failures.

We really should throw the same arithmetic exceptions as Spark and these vary by Spark version.

Describe the solution you'd like
Update arithmetic_ops_test.py and logic_test.py to check for the specific exception type that Spark throws for each case: java.lang.ArithmeticException or org.apache.spark.SparkArithmeticException.
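
A minimal sketch of how a test helper could pick the expected exception name by Spark version. The helper names (`parse_spark_version`, `expected_arithmetic_exception`, `error_matches`) are hypothetical and are not taken from the plugin's test suite. The 3.3.0 cutover is an assumption based on the Spark 3.3 changes mentioned above; individual checks may need a different cutover or no switch at all.

```python
import re

# Assumption: 3.3.0 is the first release where a given check surfaces as
# SparkArithmeticException. Adjust per check; not every error switches type.
ASSUMED_CUTOVER = (3, 3, 0)


def parse_spark_version(version):
    """Extract (major, minor, patch) from strings like '3.3.0' or '3.3.0-SNAPSHOT'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Spark version: {version!r}")
    return tuple(int(group) for group in match.groups())


def expected_arithmetic_exception(spark_version, cutover=ASSUMED_CUTOVER):
    """Return the fully qualified exception name a test should look for."""
    if parse_spark_version(spark_version) < cutover:
        return "java.lang.ArithmeticException"
    return "org.apache.spark.SparkArithmeticException"


def error_matches(error_text, spark_version, cutover=ASSUMED_CUTOVER):
    """True when the captured error text names the expected exception class."""
    return expected_arithmetic_exception(spark_version, cutover) in error_text
```

Comparing version tuples rather than strings keeps a release such as 3.10.0 ordered after 3.3.0, and matching on the fully qualified name distinguishes the two classes even though both names end in `ArithmeticException`.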

Describe alternatives you've considered
None

Additional context
This is a checklist of all the tests we need to update.

andygrove added the "feature request" and "? - Needs Triage" labels on Apr 11, 2022
sameerz added the "good first issue" and "task" labels and removed the "feature request" and "? - Needs Triage" labels on Apr 12, 2022
@HaoYang670
Collaborator

HaoYang670 commented Apr 26, 2022

Our plugin currently throws java.lang.ArithmeticException. Should we update it to throw SparkArithmeticException?

  def divByZeroError(): Nothing = {
    throw new ArithmeticException("divide by zero")
  }

  def divOverflowError(): Nothing = {
    throw new ArithmeticException("Overflow in integral divide.")
  }

https://github.com/NVIDIA/spark-rapids/blob/branch-22.06/sql-plugin/src/main/scala/org/apache/spark/sql/rapids/arithmetic.scala#L704-L710

@jlowe
Member

jlowe commented Apr 26, 2022

> Should we update it to throw SparkArithmeticException?

Yes. As noted in the issue description:

> We really should throw the same arithmetic exceptions as Spark and these vary by Spark version.

There's already a RapidsErrorUtils class that is shimmed by Spark version to throw the proper exception type for various errors. I would expect these exceptions to be added to that class or maybe a similar shim'd class added to handle arithmetic exceptions.

@HaoYang670
Collaborator

HaoYang670 commented Apr 27, 2022

Could we use QueryExecutionErrors in Spark instead of implementing our own?

@jlowe
Member

jlowe commented Apr 27, 2022

> Could we use QueryExecutionErrors in Spark instead of implementing our own?

We already do, see RapidsErrorUtils. We cannot use QueryExecutionErrors directly in common code because it wasn't added until Spark 3.2.0. That's why we use a shim'd class like RapidsErrorUtils to handle throwing the appropriate error, whether that's a standard Java exception or leveraging Spark's QueryExecutionErrors to throw a Spark-specific exception.
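
The shim idea described above can be sketched roughly as follows. This is a Python analogy only: the plugin itself is Scala, and every class and function name here is a hypothetical stand-in rather than the real RapidsErrorUtils or QueryExecutionErrors API. The only detail taken from the discussion is that Spark's helper is unavailable before 3.2.0.

```python
class JavaArithmeticException(Exception):
    """Stand-in for java.lang.ArithmeticException."""


class SparkArithmeticException(Exception):
    """Stand-in for org.apache.spark.SparkArithmeticException."""


def spark_divide_by_zero_error():
    """Stand-in for Spark's own error helper.

    What the real helper raises depends on the Spark release; for simplicity
    this stand-in always raises SparkArithmeticException.
    """
    raise SparkArithmeticException("Division by zero.")


class ErrorUtilsWithoutSparkHelper:
    """Hypothetical shim for releases before 3.2.0: raise the plain Java-style error."""

    @staticmethod
    def div_by_zero_error():
        raise JavaArithmeticException("divide by zero")


class ErrorUtilsWithSparkHelper:
    """Hypothetical shim for 3.2.0 and later: defer to Spark's helper."""

    @staticmethod
    def div_by_zero_error():
        spark_divide_by_zero_error()


def load_error_utils(spark_version):
    """Pick the shim for a (major, minor, patch) version tuple."""
    if spark_version < (3, 2, 0):
        return ErrorUtilsWithoutSparkHelper
    return ErrorUtilsWithSparkHelper


def checked_divide(dividend, divisor, error_utils):
    """Common code: it never names a version-specific exception type itself."""
    if divisor == 0:
        error_utils.div_by_zero_error()
    return dividend / divisor
```

The point of the design is that `checked_divide` stays identical across versions; only the shim selected at load time decides which exception type escapes.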

@HaoYang670

This comment was marked as resolved.

@revans2
Collaborator

revans2 commented Jul 12, 2022

test_day_time_interval_division_overflow

Could you please get the exceptions that Spark is throwing that we want to match, including the stack trace?

From reading the code, it looks like Spark's error checking for NaN and Inf happens inside roundToInt:

https://github.com/apache/spark/blob/3fde0ba6e67ca45e25e8a19e9bc8f8371f12cb71/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala#L519

It appears that it will throw a regular ArithmeticException, not a SparkArithmeticException, so it would be nice to understand exactly what Spark is doing.

@HaoYang670
Collaborator

HaoYang670 commented Jul 13, 2022

> Could you please get the exceptions that Spark is throwing that we want to match, including the stack trace?

Tested on branch-22.08 using Spark 3.3.0.

The difference occurs in these two test cases:

    (FloatType(), [timedelta(seconds=0), 0.0]),   # 0 / 0 = NaN
    (DoubleType(), [timedelta(seconds=0), 0.0]),  # 0 / 0 = NaN

Spark throws a divide-by-zero exception (org.apache.spark.SparkArithmeticException), while spark-rapids throws a java.lang.ArithmeticException with the message "Has NaN".
Stack trace of Spark:

22/07/13 08:54:44 WARN TaskSetManager: Lost task 3.0 in stage 30.0 (TID 123) (remzi-desktop executor driver): org.apache.spark.SparkArithmeticException: Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" (except for ANSI interval type) to bypass this error.
== SQL(line 1, position 1) ==
a / b
^^^^^

        at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:184)
        at org.apache.spark.sql.errors.QueryExecutionErrors.divideByZeroError(QueryExecutionErrors.scala)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:136)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Stack trace of spark-rapids:

java.lang.ArithmeticException: Has NaN
        at org.apache.spark.sql.rapids.shims.IntervalUtils$.$anonfun$checkDoubleInfNan$5(intervalExpressions.scala:119)
        at org.apache.spark.sql.rapids.shims.IntervalUtils$.$anonfun$checkDoubleInfNan$5$adapted(intervalExpressions.scala:117)
        at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
        at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
        at org.apache.spark.sql.rapids.shims.IntervalUtils$.withResource(intervalExpressions.scala:27)
        at org.apache.spark.sql.rapids.shims.IntervalUtils$.checkDoubleInfNan(intervalExpressions.scala:117)
        at org.apache.spark.sql.rapids.shims.IntervalUtils$.roundDoubleToLongWithOverflowCheck(intervalExpressions.scala:148)
        at org.apache.spark.sql.rapids.shims.GpuDivideDTInterval.$anonfun$doColumnar$8(intervalExpressions.scala:555)
        at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
        at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
        at org.apache.spark.sql.rapids.shims.GpuDivideDTInterval.withResource(intervalExpressions.scala:512)
        at org.apache.spark.sql.rapids.shims.GpuDivideDTInterval.doColumnar(intervalExpressions.scala:553)
        at org.apache.spark.sql.rapids.shims.GpuDivideDTInterval.doColumnar(intervalExpressions.scala:526)
        at com.nvidia.spark.rapids.GpuBinaryExpression.$anonfun$columnarEval$3(GpuExpressions.scala:256)
        at com.nvidia.spark.rapids.Arm.withResourceIfAllowed(Arm.scala:73)
        at com.nvidia.spark.rapids.Arm.withResourceIfAllowed$(Arm.scala:71)
        at org.apache.spark.sql.rapids.shims.GpuDivideDTInterval.withResourceIfAllowed(intervalExpressions.scala:512)
        at com.nvidia.spark.rapids.GpuBinaryExpression.$anonfun$columnarEval$2(GpuExpressions.scala:253)
        at com.nvidia.spark.rapids.Arm.withResourceIfAllowed(Arm.scala:73)
        at com.nvidia.spark.rapids.Arm.withResourceIfAllowed$(Arm.scala:71)
        at org.apache.spark.sql.rapids.shims.GpuDivideDTInterval.withResourceIfAllowed(intervalExpressions.scala:512)
        at com.nvidia.spark.rapids.GpuBinaryExpression.columnarEval(GpuExpressions.scala:252)
        at com.nvidia.spark.rapids.GpuBinaryExpression.columnarEval$(GpuExpressions.scala:251)
        at org.apache.spark.sql.rapids.shims.GpuDivideDTInterval.columnarEval(intervalExpressions.scala:512)
        at com.nvidia.spark.rapids.RapidsPluginImplicits$ReallyAGpuExpression.columnarEval(implicits.scala:34)
        at com.nvidia.spark.rapids.GpuAlias.columnarEval(namedExpressions.scala:109)
        at com.nvidia.spark.rapids.RapidsPluginImplicits$ReallyAGpuExpression.columnarEval(implicits.scala:34)
        at com.nvidia.spark.rapids.GpuExpressionsUtils$.columnarEvalToColumn(GpuExpressions.scala:93)
        at com.nvidia.spark.rapids.GpuProjectExec$.projectSingle(basicPhysicalOperators.scala:102)
        at com.nvidia.spark.rapids.GpuProjectExec$.$anonfun$project$1(basicPhysicalOperators.scala:109)
        at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.$anonfun$safeMap$1(implicits.scala:216)
        at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.$anonfun$safeMap$1$adapted(implicits.scala:213)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.safeMap(implicits.scala:213)
        at com.nvidia.spark.rapids.RapidsPluginImplicits$AutoCloseableProducingSeq.safeMap(implicits.scala:248)
        at com.nvidia.spark.rapids.GpuProjectExec$.project(basicPhysicalOperators.scala:109)
        at com.nvidia.spark.rapids.GpuProjectExec$.projectAndClose(basicPhysicalOperators.scala:73)
        at com.nvidia.spark.rapids.GpuProjectExec.$anonfun$doExecuteColumnar$1(basicPhysicalOperators.scala:149)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
        at com.nvidia.spark.rapids.ColumnarToRowIterator.$anonfun$fetchNextBatch$2(GpuColumnarToRowExec.scala:241)
        at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
        at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
        at com.nvidia.spark.rapids.ColumnarToRowIterator.withResource(GpuColumnarToRowExec.scala:187)
        at com.nvidia.spark.rapids.ColumnarToRowIterator.fetchNextBatch(GpuColumnarToRowExec.scala:238)
        at com.nvidia.spark.rapids.ColumnarToRowIterator.loadNextBatch(GpuColumnarToRowExec.scala:215)
        at com.nvidia.spark.rapids.ColumnarToRowIterator.hasNext(GpuColumnarToRowExec.scala:255)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:136)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

@HaoYang670
Collaborator

For these 2 test cases:

    (FloatType(), [timedelta(seconds=1), float('NaN')]),
    (DoubleType(), [timedelta(seconds=1), float('NaN')]),

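Both Spark and spark-rapids throw a java.lang.ArithmeticException for NaN input, although the messages differ ("input is infinite or NaN" vs. "Has NaN").
Stack trace of Spark: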

22/07/13 09:08:22 WARN TaskSetManager: Lost task 3.0 in stage 2.0 (TID 11) (remzi-desktop executor driver): java.lang.ArithmeticException: input is infinite or NaN
        at org.sparkproject.guava.math.DoubleMath.roundIntermediate(DoubleMath.java:54)
        at org.sparkproject.guava.math.DoubleMath.roundToLong(DoubleMath.java:149)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:136)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Stack trace of spark-rapids:

22/07/13 09:08:22 ERROR Executor: Exception in task 3.0 in stage 3.0 (TID 15)
java.lang.ArithmeticException: Has NaN
        at org.apache.spark.sql.rapids.shims.IntervalUtils$.$anonfun$checkDoubleInfNan$5(intervalExpressions.scala:119)
        at org.apache.spark.sql.rapids.shims.IntervalUtils$.$anonfun$checkDoubleInfNan$5$adapted(intervalExpressions.scala:117)
        at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
        at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
        at org.apache.spark.sql.rapids.shims.IntervalUtils$.withResource(intervalExpressions.scala:27)
        at org.apache.spark.sql.rapids.shims.IntervalUtils$.checkDoubleInfNan(intervalExpressions.scala:117)
        at org.apache.spark.sql.rapids.shims.IntervalUtils$.roundDoubleToLongWithOverflowCheck(intervalExpressions.scala:148)
        at org.apache.spark.sql.rapids.shims.GpuDivideDTInterval.$anonfun$doColumnar$8(intervalExpressions.scala:555)
        at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
        at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
        at org.apache.spark.sql.rapids.shims.GpuDivideDTInterval.withResource(intervalExpressions.scala:512)
        at org.apache.spark.sql.rapids.shims.GpuDivideDTInterval.doColumnar(intervalExpressions.scala:553)
        at org.apache.spark.sql.rapids.shims.GpuDivideDTInterval.doColumnar(intervalExpressions.scala:526)
        at com.nvidia.spark.rapids.GpuBinaryExpression.$anonfun$columnarEval$3(GpuExpressions.scala:256)
        at com.nvidia.spark.rapids.Arm.withResourceIfAllowed(Arm.scala:73)
        at com.nvidia.spark.rapids.Arm.withResourceIfAllowed$(Arm.scala:71)
        at org.apache.spark.sql.rapids.shims.GpuDivideDTInterval.withResourceIfAllowed(intervalExpressions.scala:512)
        at com.nvidia.spark.rapids.GpuBinaryExpression.$anonfun$columnarEval$2(GpuExpressions.scala:253)
        at com.nvidia.spark.rapids.Arm.withResourceIfAllowed(Arm.scala:73)
        at com.nvidia.spark.rapids.Arm.withResourceIfAllowed$(Arm.scala:71)
        at org.apache.spark.sql.rapids.shims.GpuDivideDTInterval.withResourceIfAllowed(intervalExpressions.scala:512)
        at com.nvidia.spark.rapids.GpuBinaryExpression.columnarEval(GpuExpressions.scala:252)
        at com.nvidia.spark.rapids.GpuBinaryExpression.columnarEval$(GpuExpressions.scala:251)
        at org.apache.spark.sql.rapids.shims.GpuDivideDTInterval.columnarEval(intervalExpressions.scala:512)
        at com.nvidia.spark.rapids.RapidsPluginImplicits$ReallyAGpuExpression.columnarEval(implicits.scala:34)
        at com.nvidia.spark.rapids.GpuAlias.columnarEval(namedExpressions.scala:109)
        at com.nvidia.spark.rapids.RapidsPluginImplicits$ReallyAGpuExpression.columnarEval(implicits.scala:34)
        at com.nvidia.spark.rapids.GpuExpressionsUtils$.columnarEvalToColumn(GpuExpressions.scala:93)
        at com.nvidia.spark.rapids.GpuProjectExec$.projectSingle(basicPhysicalOperators.scala:102)
        at com.nvidia.spark.rapids.GpuProjectExec$.$anonfun$project$1(basicPhysicalOperators.scala:109)
        at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.$anonfun$safeMap$1(implicits.scala:216)
        at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.$anonfun$safeMap$1$adapted(implicits.scala:213)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.safeMap(implicits.scala:213)
        at com.nvidia.spark.rapids.RapidsPluginImplicits$AutoCloseableProducingSeq.safeMap(implicits.scala:248)
        at com.nvidia.spark.rapids.GpuProjectExec$.project(basicPhysicalOperators.scala:109)
        at com.nvidia.spark.rapids.GpuProjectExec$.projectAndClose(basicPhysicalOperators.scala:73)
        at com.nvidia.spark.rapids.GpuProjectExec.$anonfun$doExecuteColumnar$1(basicPhysicalOperators.scala:149)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
        at com.nvidia.spark.rapids.ColumnarToRowIterator.$anonfun$fetchNextBatch$2(GpuColumnarToRowExec.scala:241)
        at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
        at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
        at com.nvidia.spark.rapids.ColumnarToRowIterator.withResource(GpuColumnarToRowExec.scala:187)
        at com.nvidia.spark.rapids.ColumnarToRowIterator.fetchNextBatch(GpuColumnarToRowExec.scala:238)
        at com.nvidia.spark.rapids.ColumnarToRowIterator.loadNextBatch(GpuColumnarToRowExec.scala:215)
        at com.nvidia.spark.rapids.ColumnarToRowIterator.hasNext(GpuColumnarToRowExec.scala:255)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:136)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
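
One plausible reading of the traces above is that the two implementations run their checks in a different order: the Spark trace fails in a divide-by-zero check before any rounding, while the spark-rapids trace fails in a NaN scan of the already computed quotient. The sketch below models that ordering in plain Python. All function names and exception classes are hypothetical stand-ins, the "Has Inf" message is invented for completeness, and the real rounding mode is not modeled.

```python
import math


class JavaArithmeticException(Exception):
    """Stand-in for java.lang.ArithmeticException."""


class SparkArithmeticException(Exception):
    """Stand-in for org.apache.spark.SparkArithmeticException."""


def ieee_divide(numerator, denominator):
    """Divide with IEEE-754 results instead of Python's ZeroDivisionError."""
    if denominator != 0.0:
        return numerator / denominator  # also yields NaN when either side is NaN
    if numerator == 0.0 or math.isnan(numerator):
        return math.nan  # 0 / 0 = NaN
    sign = math.copysign(1.0, numerator) * math.copysign(1.0, denominator)
    return sign * math.inf


def round_to_long(value):
    """Model of a rounding step that rejects non-finite input."""
    if not math.isfinite(value):
        raise JavaArithmeticException("input is infinite or NaN")
    return round(value)  # Python rounds half to even; rounding mode is not the point


def zero_check_first(micros, divisor):
    """Order suggested by the Spark trace: test the divisor, then divide and round."""
    if divisor == 0.0:
        raise SparkArithmeticException("Division by zero.")
    return round_to_long(ieee_divide(float(micros), divisor))


def nan_check_after(micros, divisor):
    """Order suggested by the spark-rapids trace: divide, then scan the result."""
    quotient = ieee_divide(float(micros), divisor)
    if math.isnan(quotient):
        raise JavaArithmeticException("Has NaN")
    if math.isinf(quotient):
        raise JavaArithmeticException("Has Inf")  # hypothetical message
    return round(quotient)


def outcome(divide, micros, divisor):
    """Run one strategy and report either the result or the error raised."""
    try:
        return divide(micros, divisor)
    except (JavaArithmeticException, SparkArithmeticException) as err:
        return f"{type(err).__name__}: {err}"
```

Under this model, `outcome(zero_check_first, 0, 0.0)` and `outcome(nan_check_after, 0, 0.0)` report different exception types, whereas a NaN divisor yields a `JavaArithmeticException` from both, which mirrors the two comparisons reported above.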

@HaoYang670
Collaborator

HaoYang670 commented Aug 1, 2022

Closing this issue as all subtasks have been done. Feel free to reopen it if anyone has other opinions.
