
[BUG] Integration cache_test failures - ArrayIndexOutOfBoundsException #3999

Closed
tgravescs opened this issue Nov 2, 2021 · 4 comments · Fixed by #4021
Labels: bug (Something isn't working), P0 (Must have for release)

Comments

@tgravescs (Collaborator):

Describe the bug
The integration cache_test is failing:

[2021-11-02T14:30:15.572Z] FAILED ../../src/main/python/cache_test.py::test_cache_join[{'spark.sql.inMemoryColumnarStorage.enableVectorizedReader': 'true'}-Left-String][IGNORE_ORDER]
(lots of other cache_test cases fail in the same way)

[2021-11-02T14:01:19.239Z] 21/11/02 14:01:19 WARN TaskSetManager: Lost task 4.0 in stage 211.0 (TID 1138) (10.233.92.210 executor 0): java.lang.ArrayIndexOutOfBoundsException: 0
[2021-11-02T14:01:19.239Z]  at com.nvidia.spark.rapids.HostToGpuCoalesceIterator.$anonfun$addBatchToConcat$1(HostColumnarToGpu.scala:330)
[2021-11-02T14:01:19.239Z]  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
[2021-11-02T14:01:19.239Z]  at com.nvidia.spark.rapids.HostToGpuCoalesceIterator.addBatchToConcat(HostColumnarToGpu.scala:329)
[2021-11-02T14:01:19.239Z]  at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.addBatch(GpuCoalesceBatches.scala:408)
[2021-11-02T14:01:19.239Z]  at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.$anonfun$next$1(GpuCoalesceBatches.scala:336)
[2021-11-02T14:01:19.239Z]  at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
[2021-11-02T14:01:19.239Z]  at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
[2021-11-02T14:01:19.239Z]  at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.withResource(GpuCoalesceBatches.scala:202)
[2021-11-02T14:01:19.239Z]  at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.next(GpuCoalesceBatches.scala:322)
[2021-11-02T14:01:19.239Z]  at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.next(GpuCoalesceBatches.scala:202)
[2021-11-02T14:01:19.239Z]  at com.nvidia.spark.rapids.GpuHashAggregateIterator.aggregateInputBatches(aggregate.scala:282)
[2021-11-02T14:01:19.239Z]  at com.nvidia.spark.rapids.GpuHashAggregateIterator.$anonfun$next$2(aggregate.scala:237)
[2021-11-02T14:01:19.239Z]  at scala.Option.getOrElse(Option.scala:189)
[2021-11-02T14:01:19.239Z]  at com.nvidia.spark.rapids.GpuHashAggregateIterator.next(aggregate.scala:234)
[2021-11-02T14:01:19.239Z]  at com.nvidia.spark.rapids.GpuHashAggregateIterator.next(aggregate.scala:180)
[2021-11-02T14:01:19.239Z]  at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:291)
[2021-11-02T14:01:19.239Z]  at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:307)
[2021-11-02T14:01:19.239Z]  at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
[2021-11-02T14:01:19.239Z]  at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
[2021-11-02T14:01:19.239Z]  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
[2021-11-02T14:01:19.239Z]  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
[2021-11-02T14:01:19.239Z]  at org.apache.spark.scheduler.Task.run(Task.scala:131)
[2021-11-02T14:01:19.239Z]  at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
[2021-11-02T14:01:19.239Z]  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
[2021-11-02T14:01:19.239Z]  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
[2021-11-02T14:01:19.239Z]  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[2021-11-02T14:01:19.239Z]  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
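
For context on the failure mode: an ArrayIndexOutOfBoundsException: 0 thrown inside a per-column foreach suggests the coalesce step indexed column 0 of an array that turned out to be empty. The following is a simplified, hypothetical Scala sketch of that pattern only; it is not the actual HostColumnarToGpu code:

    // Hypothetical sketch of the failing pattern (not the plugin's actual code).
    // A coalescing iterator keeps one builder per output column and appends the
    // matching column of each incoming batch to it.
    object CoalescePatternSketch {
      def main(args: Array[String]): Unit = {
        // Builders sized from one view of the schema... here (wrongly) empty.
        val builders: Array[StringBuilder] = Array.empty
        // ...while the incoming batch delivers a column under another view.
        val batchColumns = Array("col0-data")
        // builders(0) on an empty array throws
        // java.lang.ArrayIndexOutOfBoundsException: 0, as in the trace above.
        (0 until batchColumns.length).foreach { i =>
          builders(i).append(batchColumns(i))
        }
      }
    }

Any disagreement between the schema used to size the per-column state and the batches actually delivered would surface at exactly such a foreach.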
tgravescs added the labels bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) on Nov 2, 2021
abellina self-assigned this on Nov 2, 2021
@abellina (Collaborator) commented Nov 2, 2021:

I am taking a look.

@abellina (Collaborator) commented Nov 2, 2021:

Given the log, this seems to be isolated to Spark 3.2.0.

@abellina (Collaborator) commented Nov 2, 2021:

OK, this looks like an issue with com.nvidia.spark.ParquetCachedBatchSerializer on Spark 3.2.0: the reported failures were specific to that version, and I verified it locally as well. To reproduce, build the plugin with -Dbuildver=320 and run the integration suite restricted to the cache tests, e.g.:

./run_pyspark_from_build.sh -k test_cache_join\ and\ Left-Boolean

This runs two tests, one with spark.sql.inMemoryColumnarStorage.enableVectorizedReader=true and one with it set to false. Both pass if I don't set:

--conf spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer

But if I provide the custom serializer, the test with spark.sql.inMemoryColumnarStorage.enableVectorizedReader=true fails, while the one with the vectorized reader disabled still passes.
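
For reference, the failing configuration combination can also be expressed as a standalone spark-shell session on Spark 3.2.0 with the plugin jar on the classpath. This is an untested sketch that only loosely mirrors test_cache_join; the DataFrame names and data below are illustrative, not taken from the test:

    import org.apache.spark.sql.SparkSession

    // Untested sketch: the failing combination pairs the custom cache
    // serializer with the vectorized in-memory reader enabled. Flipping
    // enableVectorizedReader to "false", or dropping the serializer config
    // entirely, corresponds to the passing variants described above.
    val spark = SparkSession.builder()
      .appName("cache-join-repro-sketch")
      .config("spark.sql.cache.serializer",
        "com.nvidia.spark.ParquetCachedBatchSerializer")
      .config("spark.sql.inMemoryColumnarStorage.enableVectorizedReader", "true")
      .getOrCreate()

    import spark.implicits._
    // A cached DataFrame feeding a left join, roughly the shape of
    // test_cache_join:
    val left = Seq((1, "a"), (2, "b")).toDF("id", "v").cache()
    left.count() // materialize the cache through the serializer
    val right = Seq((1, "x"), (3, "y")).toDF("id", "w")
    left.join(right, Seq("id"), "left").show()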

@abellina (Collaborator) commented Nov 2, 2021:

It started happening with commit c6b2479; at the commit immediately before it, 83706a5, the tests pass.

@gerashegalov, could you take a look at this failure? Something in your PR introduced a change that breaks the test.

abellina removed their assignment on Nov 2, 2021
abellina added the P0 (Must have for release) label on Nov 2, 2021
Salonijain27 removed the ? - Needs Triage (Need team to review and classify) label on Nov 2, 2021
gerashegalov added this to the Nov 1 - Nov 12 milestone on Nov 3, 2021