Describe the bug
In Spark 3.0.x, or in Spark 3.1+ with spark.sql.legacy.castComplexTypesToString.enabled=true, queries reading from an RDD source may crash with:
java.lang.AssertionError: value at 15 is null
at ai.rapids.cudf.HostColumnVectorCore.assertsForGet(HostColumnVectorCore.java:228)
at ai.rapids.cudf.HostColumnVectorCore.getUTF8(HostColumnVectorCore.java:355)
at com.nvidia.spark.rapids.RapidsHostColumnVectorCore.getUTF8String(RapidsHostColumnVectorCore.java:177)
at org.apache.spark.sql.vectorized.ColumnarBatchRow.getUTF8String(ColumnarBatch.java:220)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:346)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
GPU plan exhibiting the crash:
21/04/30 17:37:16 WARN GpuOverrides:
*Exec <ProjectExec> will run on GPU
*Expression <Alias> cast(a#0 as string) AS a#34 will run on GPU
*Expression <Cast> cast(a#0 as string) will run on GPU
*Exec <FilterExec> will run on GPU
*Expression <And> (isnotnull(b#1) AND (b#1 > 1)) will run on GPU
*Expression <IsNotNull> isnotnull(b#1) will run on GPU
*Expression <GreaterThan> (b#1 > 1) will run on GPU
!NOT_FOUND <RDDScanExec> cannot run on GPU because no GPU enabled version of operator class org.apache.spark.sql.execution.RDDScanExec could be found
@Expression <AttributeReference> a#0 could run on GPU
@Expression <AttributeReference> b#1 could run on GPU
Expected behavior
Cast should work the same as on CPU
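For reference, the CPU behavior that the GPU cast has to match can be sketched in Scala. This is a simplified model based on a reading of Spark's struct-to-string Cast, not Spark's actual code: struct fields are assumed to be already cast to strings (None marks a null field), nested types are not modeled, and StructCastSketch and structToString are hypothetical names. In legacy mode (Spark 3.0.x, or 3.1+ with the flag set) the struct is wrapped in [] and null fields become empty strings; otherwise it is wrapped in {} and null fields render as null. A null struct casts to SQL NULL in both modes.

```scala
// Simplified model of Spark's CPU struct-to-string cast (hypothetical names).
// Fields are assumed to be already cast to strings; None marks a null field.
object StructCastSketch {
  def structToString(struct: Option[Seq[Option[String]]], legacy: Boolean): Option[String] =
    // A null struct casts to SQL NULL, not to the text "null".
    struct.map { fields =>
      // Legacy mode (Spark 3.0 behavior) uses [], the default since 3.1 uses {}.
      val (open, close) = if (legacy) ("[", "]") else ("{", "}")
      val body = fields.zipWithIndex.map { case (field, i) =>
        // Legacy mode renders a null field as an empty string; otherwise "null".
        val text = if (legacy) field else field.orElse(Some("null"))
        val sep = if (i == 0) "" else ","
        text match {
          // A space follows the comma only when a value is actually written.
          case Some(v) => if (i == 0) v else sep + " " + v
          case None    => sep
        }
      }.mkString
      open + body + close
    }
}
```

For example, a struct (1, null, 3) renders as [1,, 3] in legacy mode and as {1, null, 3} otherwise. Keeping a single function with a legacy flag, instead of a separate legacy method, is one way to avoid diverging code paths between the two modes.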
Environment details (please complete the following information)
local REPL is sufficient to reproduce
Additional context
Bug found while working on #2274.
Interestingly, saving the synthetic DataFrame to Parquet and reading it back yields the correct result, without a crash, using the FileSourceScanExec plan:
*Exec <ProjectExec> will run on GPU
*Expression <Alias> cast(a#38 as string) AS a#42 will run on GPU
*Expression <Cast> cast(a#38 as string) will run on GPU
*Exec <FilterExec> will run on GPU
*Expression <And> (isnotnull(b#39) AND (b#39 > 1)) will run on GPU
*Expression <IsNotNull> isnotnull(b#39) will run on GPU
*Expression <GreaterThan> (b#39 > 1) will run on GPU
*Exec <FileSourceScanExec> will run on GPU
Refactors struct cast to string so that there is no need for a dedicated method handling the legacy-mode cast. Fixes #2309 and #2315.
Signed-off-by: Gera Shegalov gera@apache.org
nartal1 pushed a commit to nartal1/spark-rapids that referenced this issue on Jun 9, 2021.