Seq("""{"a":[]}""").toDF("json").repartition(1).selectExpr("from_json(json, 'a array<string>')").show()
results in an error like:
Caused by: java.lang.AssertionError: Type conversion is not allowed from STRUCT(LIST(INT8)) to StructType(StructField(a,ArrayType(StringType,true),true)) expected STRUCT(LIST(STRING))
at com.nvidia.spark.rapids.GpuColumnVector.from(GpuColumnVector.java:711)
at com.nvidia.spark.rapids.GpuUnaryExpression.$anonfun$doItColumnar$1(GpuExpressions.scala:254)
at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
at com.nvidia.spark.rapids.GpuUnaryExpression.doItColumnar(GpuExpressions.scala:250)
at com.nvidia.spark.rapids.GpuUnaryExpression.$anonfun$columnarEval$1(GpuExpressions.scala:261)
at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
at com.nvidia.spark.rapids.GpuUnaryExpression.columnarEval(GpuExpressions.scala:260)
at com.nvidia.spark.rapids.RapidsPluginImplicits$ReallyAGpuExpression.columnarEval(implicits.scala:35)
This happens when assertions are enabled.
Similarly,
Seq("""{"a":1,"b":"","c":[]}""").toDF("json").repartition(1).selectExpr("from_json(json, 'a int, b string, c array<string>')").show()
throws:
Caused by: java.lang.AssertionError: Type conversion is not allowed from STRUCT(INT32,STRING,LIST(INT8)) to StructType(StructField(a,IntegerType,true),StructField(b,StringType,true),StructField(c,ArrayType(StringType,true),true)) expected STRUCT(INT32,STRING,LIST(STRING))
at com.nvidia.spark.rapids.GpuColumnVector.from(GpuColumnVector.java:711)
at com.nvidia.spark.rapids.GpuUnaryExpression.$anonfun$doItColumnar$1(GpuExpressions.scala:254)
at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
at com.nvidia.spark.rapids.GpuUnaryExpression.doItColumnar(GpuExpressions.scala:250)
at com.nvidia.spark.rapids.GpuUnaryExpression.$anonfun$columnarEval$1(GpuExpressions.scala:261)
It looks like CUDF ignores our request that the returned value be a LIST(STRING) and returns a LIST(INT8) instead. This feels like a bug in CUDF. We can probably work around it if we need to, but the workaround is not going to be simple.
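To make the mismatch in the two assertion messages easier to see, here is a minimal sketch of the kind of recursive type comparison that fails. It is written against a hypothetical, simplified type model (`ColType`, `ListOf`, `StructOf`, and `TypeCheck.firstMismatch` are illustrative names, not part of the CUDF or spark-rapids APIs), so it only shows where the returned type diverges from the requested one, not how the plugin actually performs the check.

```scala
// Hypothetical, simplified model of nested column types.
// None of these names come from CUDF or spark-rapids.
sealed trait ColType
case object Int8 extends ColType
case object Int32 extends ColType
case object Str extends ColType
final case class ListOf(child: ColType) extends ColType
final case class StructOf(children: List[ColType]) extends ColType

object TypeCheck {
  // Walks both type trees together and describes the first place
  // where `actual` differs from `expected`. Returns None on a match.
  def firstMismatch(
      expected: ColType,
      actual: ColType,
      path: String = "root"): Option[String] =
    (expected, actual) match {
      case (ListOf(e), ListOf(a)) =>
        firstMismatch(e, a, s"$path.element")
      case (StructOf(exps), StructOf(acts)) if exps.length == acts.length =>
        exps.zip(acts).zipWithIndex.flatMap { case ((e, a), i) =>
          firstMismatch(e, a, s"$path[$i]")
        }.headOption
      case (e, a) if e == a => None
      case (e, a) => Some(s"$path: expected $e but got $a")
    }
}

object TypeCheckDemo {
  def main(args: Array[String]): Unit = {
    // Models the first report: STRUCT(LIST(STRING)) requested, STRUCT(LIST(INT8)) returned.
    val requested = StructOf(List(ListOf(Str)))
    val returned = StructOf(List(ListOf(Int8)))
    println(TypeCheck.firstMismatch(requested, returned))
  }
}
```

In both reports the only divergence is the element type of the empty list, which is consistent with the guess that the requested child type is dropped when the array has no elements.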
I should add that an empty struct results in a different error.
Seq("""{"a":1,"b":"","c":{}}""").toDF("json").repartition(1).selectExpr("from_json(json, 'a int, b string, c struct<a string>')").show()
Caused by: java.lang.NullPointerException
at ai.rapids.cudf.Table.gatherJSONColumns(Table.java:1105)
at ai.rapids.cudf.Table.gatherJSONColumns(Table.java:1225)
at ai.rapids.cudf.Table.readJSON(Table.java:1391)
at org.apache.spark.sql.rapids.GpuJsonToStructs.$anonfun$doColumnar$2(GpuJsonToStructs.scala:180)
at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
at org.apache.spark.sql.rapids.GpuJsonToStructs.$anonfun$doColumnar$1(GpuJsonToStructs.scala:178)
at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
at org.apache.spark.sql.rapids.GpuJsonToStructs.doColumnar(GpuJsonToStructs.scala:176)
This looks almost identical to reading a list with only empty top-level structs.
Describe the bug
This is with #10575