-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix GpuCreateNamedStruct not serializable issue #2442
Conversation
build |
build |
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/complexTypeCreator.scala
Outdated
Show resolved
Hide resolved
build |
@revans2 Could you help to review this PR |
case g: GpuExpression => g.columnarEval(null) | ||
case gl: GpuLiteral => gl.value | ||
case ge: GpuExpression => | ||
throw new IllegalStateException(s"Unexpected GPU expression $ge") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried to make it so the Literal is wrapped in an Alias, which should totally be valid, but I have not been able to make it happen with the public spark APIs. If we are going to throw an exception on something we don't expect, then we should be absolutely positive that we got what we expected, and all of the default literal checks being use allow for an alias. So either still do the eval and then pull the name out of the GpuScalar
that is returned. Or add in a second rule for an Alias of a literal, just to be on the safe side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still do the eval and then pull the name out of the GpuScalar that is returned.
We can't do this because this is happening on the driver side, correct? That's one of the drawbacks of the recent GpuScalar
refactor -- it's practically impossible to call columnarEval
without requiring a GPU to do it, so it's effectively off-limits to the driver. That would be a simpler way to handle any Alias or similar shenanigans, but since we can't call columnarEval
or eval
to solve it, we'll have to pull it apart manually instead IIUC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Jason, and the new commit has added GpuAlias case. BTW, I see the name check has been limited to literal, Do we really add the GpuAlias case?
object CreateNamedStructCheck extends ExprChecks {
val nameSig: TypeSig = TypeSig.lit(TypeEnum.STRING)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's practically impossible to call columnarEval without requiring a GPU to do it, so it's effectively off-limits to the driver.
If a GpuScalar
is created from a Scala value (GpuLiteral
will create such a GpuScalar
in its columnarEval
) and only the getValue
will be called to get the internal Scala value, then it is safe to use on the driver side without GPU. Becasue the GpuScalar for this case is totally a wrapper of a Scala value, nothing to do with GPU.
But I am a little confused. Could we call the columnarEval
on driver side with an invalid input for a generic GPU expression before the GpuScalar refactor? I think it is still not always safe because the processing inside the columnarEval
may require a GPU. It may be safe if we know the instance of the GPU expression. And if we know the instance of a GPU expression, I think we can use its direct methods or members instead.
@@ -82,7 +82,9 @@ case class GpuCreateNamedStruct(children: Seq[Expression]) extends GpuExpression | |||
}.toList.unzip | |||
|
|||
private lazy val names = nameExprs.map { | |||
case g: GpuExpression => g.columnarEval(null) | |||
case gl: GpuLiteral => gl.value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for this change is not clear. Could we get a comment to explain why this is happening so no one accidentally changes it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment before, but marking as needs changes because ti was already approved.
build |
The return from columnarEval of GpuLiteral has been changed to GpuScalar which will be as the type of names(a private field in GpuCreateNamedStruct). And the names will be serialized before scheduling, but the GpuScalar is not serializable, it will cause "org.apache.spark.SparkException: Task not serializable" Signed-off-by: Bobby Wang <wbo4958@gmail.com>
2e37d0d
to
bfe48e4
Compare
since there are some conflicts, I rebased the PR |
build |
private lazy val names = nameExprs.map { | ||
case gl: GpuLiteral => gl.value | ||
case GpuAlias(gl: GpuLiteral, _) => gl.value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically Alias can nest arbitrarily, so I think we'd want to match on the result of calling something like what GpuOverrides.extractLit
does but for GpuLiteral
or what Spark's AliasHelper.trimAliases
does, probably handling GpuAlias
since it's already been translated at that point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, Good suggestion, from the safe side, I added the extractGpulit for this case.
Thx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
build |
LGTM |
private lazy val names = nameExprs.map { | ||
case g: GpuExpression => g.columnarEval(null) | ||
case ge: GpuExpression => | ||
GpuExpressionsUtils.extractGpuLit(ge) match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
GpuExpressionsUtils.extractGpuLit(ge).map(_.value)
.getOrElse(throw new IllegalStateException(s"Unexpected GPU expression $ge"))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx Gera
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, but I would take it a step further because extractGpuLit
takes an Expression
as input not a GpuExpression
and we should always get GpuExpression
s in nameExprs
or we have other problems.
private lazy val names = nameExprs.map { exp =>
GpuExpressionsUtils.extractGpuLit(exp)
.getOrElse(throw new IllegalStateException(s"Unexpected Expression $exp"))
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, But if extractGpuLit takes a GpuExpression, then it's hard for GpuAlias whose child is Expression to do the recursive calling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And Seems GpuAlias can wrap like AttributeReference which can't be cast to GpuExpression and it will make the extractGpuLit
really complicated if it takes GpuExpressiion @revans2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you understood. I don't want extractGpuLit
to take a GpuExpression
. It is good like it is. I just don't think we need to check for the non-GpuExpression case when computing names
. It should never happen so it is OK to throw an exception if we see it., Which is why I said we could simplify the processing to just be.
private lazy val names = nameExprs.map { exp =>
GpuExpressionsUtils.extractGpuLit(exp)
.getOrElse(throw new IllegalStateException(s"Unexpected Expression $exp"))
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, Yeah, You're right. Sorry for the misunderstanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks fine, and I would be OK with checking it in as is, but I would like to see it cleaned up a bit.
private lazy val names = nameExprs.map { | ||
case g: GpuExpression => g.columnarEval(null) | ||
case ge: GpuExpression => | ||
GpuExpressionsUtils.extractGpuLit(ge) match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, but I would take it a step further because extractGpuLit
takes an Expression
as input not a GpuExpression
and we should always get GpuExpression
s in nameExprs
or we have other problems.
private lazy val names = nameExprs.map { exp =>
GpuExpressionsUtils.extractGpuLit(exp)
.getOrElse(throw new IllegalStateException(s"Unexpected Expression $exp"))
}
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not exactly what I wanted, but good enough.
* fix GpuCreateNamedStruct not serializable issue The return from columnarEval of GpuLiteral has been changed to GpuScalar which will be as the type of names(a private field in GpuCreateNamedStruct). And the names will be serialized before scheduling, but the GpuScalar is not serializable, it will cause "org.apache.spark.SparkException: Task not serializable" Signed-off-by: Bobby Wang <wbo4958@gmail.com> * resolve comments * resolve comments * resolve comments * remove unused import * resolve comments
* fix GpuCreateNamedStruct not serializable issue The return from columnarEval of GpuLiteral has been changed to GpuScalar which will be as the type of names(a private field in GpuCreateNamedStruct). And the names will be serialized before scheduling, but the GpuScalar is not serializable, it will cause "org.apache.spark.SparkException: Task not serializable" Signed-off-by: Bobby Wang <wbo4958@gmail.com> * resolve comments * resolve comments * resolve comments * remove unused import * resolve comments
* fix GpuCreateNamedStruct not serializable issue The return from columnarEval of GpuLiteral has been changed to GpuScalar which will be as the type of names(a private field in GpuCreateNamedStruct). And the names will be serialized before scheduling, but the GpuScalar is not serializable, it will cause "org.apache.spark.SparkException: Task not serializable" Signed-off-by: Bobby Wang <wbo4958@gmail.com> * resolve comments * resolve comments * resolve comments * remove unused import * resolve comments
The return from columnarEval of GpuLiteral has been changed to GpuScalar which
will be as the type of names(a private field in GpuCreateNamedStruct). And the
names will be serialized before spark scheduling, but the GpuScalar is not serializable,
it will cause "org.apache.spark.SparkException: Task not serializable".
On the other hand, the calling
columnarEval(null)
in the driver side is really dangerous,This PR also pull it apart manually
Signed-off-by: Bobby Wang wbo4958@gmail.com
Closes #2427
Closes #2443