[BUG] Exception happened when getting a null row #3996

Closed · wbo4958 opened this issue Nov 2, 2021 · 0 comments

Labels: bug Something isn't working

wbo4958 (Collaborator) commented Nov 2, 2021

This issue is for #3942

Exception

  Cause: java.lang.AssertionError: value at 6 is null
  at ai.rapids.cudf.HostColumnVectorCore.assertsForGet(HostColumnVectorCore.java:228)
  at ai.rapids.cudf.HostColumnVectorCore.getInt(HostColumnVectorCore.java:254)
  at com.nvidia.spark.rapids.RapidsHostColumnVectorCore.getInt(RapidsHostColumnVectorCore.java:126)
  at org.apache.spark.sql.vectorized.ColumnarArray.getInt(ColumnarArray.java:128)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.MapObjects_0$(generated.java:57)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(generated.java:28)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:181)
  at com.nvidia.spark.rapids.shims.v2.GpuRowBasedScalaUDF.$anonfun$scalaConverter$2(GpuRowBasedScalaUDF.scala:70)
  at org.apache.spark.sql.rapids.GpuRowBasedScalaUDFBase.$anonfun$childAccessors$2(GpuScalaUDF.scala:141)

Root Cause

Spark may have a bug in the CodeGen for MapObjects; see https://github.com/apache/spark/blob/branch-3.2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L1095-L1097

The generated code looks like the following:

  private scala.collection.mutable.WrappedArray MapObjects_0(InternalRow i) {
    boolean isNull_1 = i.isNullAt(0);
    ArrayData value_1 = isNull_1 ?
        null : (i.getArray(0));
    scala.collection.mutable.WrappedArray value_0 = null;

    if (!isNull_1) {

      int dataLength_0 = value_1.numElements();

      scala.collection.mutable.Builder collectionBuilder_0 =
          scala.collection.mutable.WrappedArray$.MODULE$.newBuilder();
      collectionBuilder_0.sizeHint(dataLength_0);


      int loopIndex_0 = 0;
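      // NOTE: value_MapObject_lambda_variable_1 and isNull_MapObject_lambda_variable_1
      // are mutable-state fields declared elsewhere in the generated class (elided here).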

      while (loopIndex_0 < dataLength_0) {
        value_MapObject_lambda_variable_1 = (int) (value_1.getInt(loopIndex_0));
        isNull_MapObject_lambda_variable_1 = value_1.isNullAt(loopIndex_0);

        boolean isNull_2 = true;
        int value_2 = -1;

        if (!isNull_MapObject_lambda_variable_1) {


          isNull_2 = false; // resultCode could change nullability.

          value_2 = value_MapObject_lambda_variable_1 + 1;


        }
        if (isNull_2) {
          collectionBuilder_0.$plus$eq(null);
        } else {
          collectionBuilder_0.$plus$eq(value_2);
        }

        loopIndex_0 += 1;
      }

      value_0 = (scala.collection.mutable.WrappedArray) scala.collection.mutable.WrappedArray$.MODULE$.make(((scala.collection.IndexedSeq)collectionBuilder_0.result()).toArray(scala.reflect.ClassTag$.MODULE$.Object()));
    }
    globalIsNull_0 = isNull_1;
    return value_0;
  }

We can see that it calls getInt first and isNullAt second, instead of checking isNullAt before calling getInt:

        value_MapObject_lambda_variable_1 = (int) (value_1.getInt(loopIndex_0));
        isNull_MapObject_lambda_variable_1 = value_1.isNullAt(loopIndex_0);
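
For comparison, a null-safe ordering would check isNullAt before reading the element. This is only an illustration of the expected shape, not actual Spark output:

        // Check for null first; only read the element when it is non-null.
        isNull_MapObject_lambda_variable_1 = value_1.isNullAt(loopIndex_0);
        value_MapObject_lambda_variable_1 = isNull_MapObject_lambda_variable_1 ?
            -1 : (int) (value_1.getInt(loopIndex_0));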

I checked RapidsHostColumnVectorCore and found that it calls cudf's getInt without checking isNullAt first, and cudf throws an exception (the assertion above) when the row is null.
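
A minimal sketch of a defensive accessor, assuming cudfCv is the wrapped ai.rapids.cudf.HostColumnVectorCore (this illustrates the idea, it is not the plugin's actual fix):

  @Override
  public int getInt(int rowId) {
    // cudf's HostColumnVectorCore.getInt asserts the value is non-null, so
    // guard the access and return a default for null rows, matching the
    // behavior of Spark's OnHeapColumnVector/OffHeapColumnVector below.
    if (cudfCv.isNull(rowId)) {
      return 0; // arbitrary default; callers are expected to check isNullAt first
    }
    return cudfCv.getInt(rowId);
  }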

I also checked getInt in OnHeapColumnVector:

  private int[] intData;
  public int getInt(int rowId) {
    if (dictionary == null) {
      return intData[rowId];
    } else {
      return dictionary.decodeToInt(dictionaryIds.getDictId(rowId));
    }
  }

For a null row it simply returns whatever default value sits in the backing array instead of throwing. OffHeapColumnVector behaves the same way.

Obviously, the root cause is in Spark's generated code, but the plugin shouldn't throw an exception that fails the whole Spark job.

wbo4958 added the "bug - Something isn't working" and "? - Needs Triage" labels on Nov 2, 2021
Salonijain27 removed the "? - Needs Triage" label on Nov 2, 2021
wbo4958 closed this as completed on Nov 3, 2021