support creating list ColumnVector for Literal(ArrayType(NullType)) #2448

Merged
merged 2 commits into from
May 19, 2021

wbo4958 (Collaborator) commented on May 19, 2021:

This PR restores support for creating a list ColumnVector for
ArrayType(NullType).

The previous implementation was a workaround: LiteralExprMeta generated a
GpuCreateArray for ArrayType(NullType), and GpuCreateArray then created the
corresponding list ColumnVector.

This PR instead implements the logic on top of GpuLiteral's array support
rather than going through GpuCreateArray.

Signed-off-by: Bobby Wang <wbo4958@gmail.com>
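For readers new to the layout: a Spark `Literal` evaluates to the same value on every row, so materializing a `Literal(ArrayType(NullType))` as a column means building a list column whose child holds only nulls. The plain-Scala sketch below models that shape as an offsets vector plus a child of nullable bytes (the byte child mirrors the NullType workaround discussed in the review). `NullListColumn` and `fromLiteral` are hypothetical names for illustration only; this is not the cudf `ColumnVector` API or the code in this PR.

```scala
// Hypothetical host-side model of a LIST column; not the cudf or spark-rapids API.
final case class NullListColumn(offsets: Vector[Int], child: Vector[java.lang.Byte]) {
  // A list column with n rows has n + 1 offsets.
  def numRows: Int = offsets.length - 1

  // Row i spans child(offsets(i)) until (exclusive) child(offsets(i + 1)).
  def row(i: Int): Vector[java.lang.Byte] = child.slice(offsets(i), offsets(i + 1))
}

object NullListColumn {
  // Repeat one ArrayType(NullType) literal with `arrayLength` elements on `numRows` rows.
  // Every child entry is null because NullType has no non-null values; bytes are
  // assumed as the child type since cudf has no null type.
  def fromLiteral(arrayLength: Int, numRows: Int): NullListColumn = {
    require(arrayLength >= 0 && numRows >= 0, "sizes must be non-negative")
    val offsets = Vector.tabulate(numRows + 1)(i => i * arrayLength)
    val child = Vector.fill[java.lang.Byte](arrayLength * numRows)(null)
    NullListColumn(offsets, child)
  }
}
```

For a two-element literal over three rows, the offsets are `0, 2, 4, 6` and the child has six null entries.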
wbo4958 requested a review from firestarman on May 19, 2021 09:40
wbo4958 (Collaborator, Author) commented on May 19, 2021:

build

wbo4958 added the bug label (Something isn't working) on May 19, 2021
@@ -176,6 +176,8 @@ object GpuScalar extends Arm with Logging {
val colType = resolveElementType(elementType)
val rows = seq.map(convertElementTo(_, elementType))
ColumnVector.fromStructs(colType, rows.asInstanceOf[Seq[HostColumnVector.StructData]]: _*)
case NullType => // Byte is used for NullType
ColumnVector.fromBoxedBytes(seq.asInstanceOf[Seq[JByte]]: _*)
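The `NullType` branch relies on `seq.asInstanceOf[Seq[JByte]]`. Because of JVM type erasure this is an unchecked cast on the `Seq` itself; it is harmless here only because every element of an `ArrayType(NullType)` value is `null`. A minimal plain-Scala demonstration (the `ErasureCastDemo` object is hypothetical, and `JByte` is assumed to alias `java.lang.Byte` as in the snippet):

```scala
object ErasureCastDemo {
  // Assumed alias, matching the JByte used in the diff.
  type JByte = java.lang.Byte

  // Elements of an ArrayType(NullType) value can only ever be null.
  val nullElements: Seq[Any] = Seq(null, null, null)

  // Unchecked cast: after erasure the JVM only checks that this is a Seq.
  val asBytes: Seq[JByte] = nullElements.asInstanceOf[Seq[JByte]]

  // Reading an element back as JByte inserts a real cast on that element:
  // null passes, but a non-Byte element fails here with ClassCastException.
  def firstAsByte(xs: Seq[Any]): JByte = xs.asInstanceOf[Seq[JByte]].head
}
```

A non-null element of the wrong type is not caught by the `Seq` cast itself; the `ClassCastException` only appears when the element is read as a `JByte`.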
firestarman (Collaborator) commented on May 19, 2021:

So a ColumnVector filled with nulls is expected here? If so, you can try
GpuColumnVector.columnVectorFromNull(seq.size, NullType); then you do not need the comment here, and I suppose the latter is faster.
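The suggestion rests on the fact that an all-null column is fully described by its length: building it from boxed values walks the `Seq` and checks each element, while a dedicated all-null constructor can skip that work. The sketch below models both paths with a hypothetical `ByteColumn` (data plus a validity mask) in plain Scala; it is not the cudf or `GpuColumnVector` implementation, and the speed difference is the reviewer's expectation, not a measurement.

```scala
// Hypothetical host-side model of a nullable byte column; not the cudf API.
final case class ByteColumn(data: Array[Byte], valid: Array[Boolean]) {
  def length: Int = data.length
  def nullCount: Int = valid.count(v => !v)
}

object ByteColumn {
  // Path 1: build from boxed values, inspecting every element for null.
  def fromBoxed(values: Seq[java.lang.Byte]): ByteColumn = {
    val data = values.map(b => if (b == null) 0.toByte else b.byteValue).toArray
    val valid = values.map(_ != null).toArray
    ByteColumn(data, valid)
  }

  // Path 2: build an all-null column from its length alone.
  // Arrays start zeroed, so every validity flag is already false (null).
  def allNull(length: Int): ByteColumn =
    ByteColumn(new Array[Byte](length), new Array[Boolean](length))
}
```

Both paths yield the same column for an all-null input; only the second avoids touching the elements.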

revans2 (Collaborator):

Yes, cudf does not support a null type, so we just use a byte because it is the smallest type available. And @firestarman is correct: please use GpuColumnVector.columnVectorFromNull(seq.length, NullType).

wbo4958 (Collaborator, Author):

Done

wbo4958 (Collaborator, Author):

@revans2 It seems seq.size is equivalent to seq.length.

Collaborator:

Yes, it is super minor. Scala prefers length over size. Java uses length for arrays and size for everything else. In Scala both work, but idiomatic Scala prefers length, so it is consistent everywhere.
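A quick plain-Scala check of the point above: for a `Seq`, `size` and `length` return the same count, and on an `Array` both are also available (`length` is the array's own length, while `size` comes from Scala's `ArrayOps` enrichment). The `SizeVsLength` object is a hypothetical name for illustration.

```scala
object SizeVsLength {
  // Returns (size, length); for any Seq the two values are equal.
  def seqCounts[A](seq: Seq[A]): (Int, Int) = (seq.size, seq.length)

  // Same comparison for an Array; size is supplied via ArrayOps.
  def arrayCounts[A](arr: Array[A]): (Int, Int) = (arr.size, arr.length)
}
```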

wbo4958 changed the title from "support creating list for NullType" to "support creating list ColumnVector for ArrayType(NullType)" on May 19, 2021
wbo4958 changed the title from "support creating list ColumnVector for ArrayType(NullType)" to "support creating list ColumnVector for Literal(ArrayType(NullType))" on May 19, 2021
wbo4958 added and then removed the bug label (Something isn't working) on May 19, 2021
revans2 (Collaborator) previously approved these changes on May 19, 2021, with a comment:

This looks good to me, but there are some nits that would be good to clean up.

wbo4958 (Collaborator, Author) commented on May 19, 2021:

build

wbo4958 merged commit 51c0dc2 into NVIDIA:branch-21.06 on May 19, 2021
wbo4958 deleted the list_null branch on May 19, 2021 13:43
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Labels: bug (Something isn't working)
3 participants