-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Rlike support #3796
Add Rlike support #3796
Conversation
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
Outdated
Show resolved
Hide resolved
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
val pattern = if (rhs.isValid) { | ||
rhs.getValue.asInstanceOf[UTF8String].toString | ||
} else { | ||
null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not look like we have tested the null case. Does CUDF do the correct thing in these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the null path can't be hit here. If I test with expr RLIKE null
then the rlike
expression itself is optimized out of the plan and replaced with a null literal. Also, we only attempt to run on GPU if the rhs
is a Literal. Are there any cases here where isValid
could be false? If not, perhaps I should remove this conditional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep the conditional and throw an exception instead with an explanation about how this is never reached. Just to be defensive.
Signed-off-by: Andy Grove <andygrove@nvidia.com>
build |
Closes #2.
This PR implements the
RLike
expression which works by calling cuDF'scontains_re
function.There are a number of known issues where results are not consistent with Apache Spark and these are documented in the compatibility guide.
The feature is disabled by default.
I have filed a follow-on issue #3797 for improving the support.