-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support creating list ColumnVector for Literal(ArrayType(NullType)) #2448
Conversation
Signed-off-by: Bobby Wang <wbo4958@gmail.com>
build |
@@ -176,6 +176,8 @@ object GpuScalar extends Arm with Logging { | |||
val colType = resolveElementType(elementType) | |||
val rows = seq.map(convertElementTo(_, elementType)) | |||
ColumnVector.fromStructs(colType, rows.asInstanceOf[Seq[HostColumnVector.StructData]]: _*) | |||
case NullType => // Byte is used for NullType | |||
ColumnVector.fromBoxedBytes(seq.asInstanceOf[Seq[JByte]]: _*) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we are using Byte for NullType, pls refer to https://github.com/NVIDIA/spark-rapids/blob/branch-21.06/sql-plugin/src/main/java/com/nvidia/spark/rapids/GpuColumnVector.java#L456
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here a ColumnVector filled with nulls is expected ? If yes, you can try
GpuColumnVector.columnVectorFromNull(seq.size, NullType)
, then you do not need the comment here. And suppose the later one is faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, cudf does not support a null type so we just use a byte because it is the smallest available. and @firestarman is correct please use GpuColumnVector.columnVectorFromNull(seq.length, NullType)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@revans2 Seems seq.size equals seq.length
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is super minor. Scala prefers length
over size
. Java has length
for arrays and size
for everything else. In Scala both work but idiomatic Scala prefers length, so it is consistent everywhere.
@@ -176,6 +176,8 @@ object GpuScalar extends Arm with Logging { | |||
val colType = resolveElementType(elementType) | |||
val rows = seq.map(convertElementTo(_, elementType)) | |||
ColumnVector.fromStructs(colType, rows.asInstanceOf[Seq[HostColumnVector.StructData]]: _*) | |||
case NullType => // Byte is used for NullType | |||
ColumnVector.fromBoxedBytes(seq.asInstanceOf[Seq[JByte]]: _*) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here a ColumnVector filled with nulls is expected ? If yes, you can try
GpuColumnVector.columnVectorFromNull(seq.size, NullType)
, then you do not need the comment here. And suppose the later one is faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me but there are some nits that would be good to clean up.
@@ -176,6 +176,8 @@ object GpuScalar extends Arm with Logging { | |||
val colType = resolveElementType(elementType) | |||
val rows = seq.map(convertElementTo(_, elementType)) | |||
ColumnVector.fromStructs(colType, rows.asInstanceOf[Seq[HostColumnVector.StructData]]: _*) | |||
case NullType => // Byte is used for NullType | |||
ColumnVector.fromBoxedBytes(seq.asInstanceOf[Seq[JByte]]: _*) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, cudf does not support a null type so we just use a byte because it is the smallest available. and @firestarman is correct please use GpuColumnVector.columnVectorFromNull(seq.length, NullType)
build |
…VIDIA#2448) Signed-off-by: Bobby Wang <wbo4958@gmail.com>
…VIDIA#2448) Signed-off-by: Bobby Wang <wbo4958@gmail.com>
This PR is bringing back the feature to support creating list ColumnVector for
ArrayType(NullType).
The previous implementation was a workaround to generate GpuCreateArray
from LiteralExprMeta for ArrayType(NullType) and then create the corresponding
list ColumnVector in GpuCreateArray.
This PR just implement the logic based on the feature GpuLiteral for array
instead of GpuCreateArray.
Signed-off-by: Bobby Wang wbo4958@gmail.com