Patterns such as (3?)+ should now fall back to CPU #4715
…rted by cuDF and should fall back to CPU
Signed-off-by: Navin Kumar <navink@nvidia.com>
sql-plugin/src/main/scala/com/nvidia/spark/rapids/RegexParser.scala
tests/src/test/scala/com/nvidia/spark/rapids/RegularExpressionTranspilerSuite.scala
tests/src/test/scala/com/nvidia/spark/rapids/RegularExpressionTranspilerSuite.scala
… in order to avoid hanging on the GPU
sql-plugin/src/main/scala/com/nvidia/spark/rapids/RegexParser.scala
sql-plugin/src/main/scala/com/nvidia/spark/rapids/RegexParser.scala
Minor style nits that aren't must-fix; LGTM.
private def isSupportedRepetitionBase(e: RegexAST): Boolean = {
  e match {
    case RegexEscaped(ch) if ch != 'd' && ch != 'w' => // example: "\B?"
Nit: comments should be consistent with the other cases, with the example on its own line.
Suggested change:
- case RegexEscaped(ch) if ch != 'd' && ch != 'w' => // example: "\B?"
+ case RegexEscaped(ch) if ch != 'd' && ch != 'w' =>
+   // example: "\B?"
case (RegexChar(a), _) if "$^".contains(a) =>
  // example: "$*"
  throw new RegexUnsupportedException(nothingToRepeat)
case (_, QuantifierFixedLength(0)) | (_, QuantifierVariableLength(0,Some(0))) |
Suggested change:
- case (_, QuantifierFixedLength(0)) | (_, QuantifierVariableLength(0,Some(0)))
+ case (_, QuantifierFixedLength(0)) | (_, QuantifierVariableLength(0, Some(0)))
  throw new RegexUnsupportedException(nothingToRepeat)
case _ =>
case (RegexGroup(_, term), QuantifierVariableLength(_,None)) |
Suggested change:
- case (RegexGroup(_, term), QuantifierVariableLength(_,None))
+ case (RegexGroup(_, term), QuantifierVariableLength(_, None))
Partial fix for #4487.
Previously, patterns like `(3?)+` or `(\A)+` would hang in `libcudf`. In some cases this has since been corrected upstream, and `libcudf` now throws an exception for such patterns as unsupported. This fix intercepts these patterns in the transpiler and throws the appropriate `RegexUnsupportedException`, so the expression falls back to the CPU.
Fuzz testing still turns up further edge cases that break in the plugin, so this is only a partial fix for now. More testing is needed to handle the remaining edge cases.
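To illustrate the general idea, here is a minimal sketch, assuming a simplified, hypothetical regex AST (`Node`, `Lit`, `Anchor`, `Group`, `Concat`, `Alt`, `Repeat`) and a hypothetical `UnsupportedPattern` exception in place of the plugin's actual `RegexAST` types and `RegexUnsupportedException`. It rejects unbounded repetition of any term that can match the empty string (covering `(3?)+`, `(\A)+` and `$*`) as well as `{0}` quantifiers. It is deliberately broader than the plugin's rules (for example, it ignores the `\d`/`\w` special cases in `isSupportedRepetitionBase`) and is not the transpiler's implementation.

```scala
// Hypothetical, simplified regex AST for illustration only; this is not the
// plugin's RegexAST hierarchy.
sealed trait Node
final case class Lit(ch: Char) extends Node             // a literal character such as '3'
final case class Anchor(name: String) extends Node      // zero-width assertion: "^", "$", "\\A", "\\B", ...
final case class Group(term: Node) extends Node         // (term)
final case class Concat(parts: List[Node]) extends Node // concatenation: ab
final case class Alt(options: List[Node]) extends Node  // alternation: a|b
// base{min,max}; max = None means unbounded, so "*" is (0, None), "+" is (1, None),
// "?" is (0, Some(1)) and "{n}" is (n, Some(n)).
final case class Repeat(base: Node, min: Int, max: Option[Int]) extends Node

// Hypothetical stand-in for RegexUnsupportedException.
final class UnsupportedPattern(msg: String) extends Exception(msg)

object RepetitionCheck {
  // True if `n` can succeed without consuming any input characters.
  def canMatchEmpty(n: Node): Boolean = n match {
    case Lit(_)            => false
    case Anchor(_)         => true
    case Group(t)          => canMatchEmpty(t)
    case Concat(ps)        => ps.forall(canMatchEmpty)
    case Alt(os)           => os.exists(canMatchEmpty)
    case Repeat(b, min, _) => min == 0 || canMatchEmpty(b)
  }

  // Walks the tree and throws for the repetitions this sketch treats as unsupported.
  def check(n: Node): Unit = n match {
    case Repeat(b, _, None) if canMatchEmpty(b) =>
      // e.g. "(3?)+", "(\A)+", "$*": unbounded repetition of an empty-matching term
      throw new UnsupportedPattern("nothing to repeat")
    case Repeat(_, 0, Some(0)) =>
      // e.g. "a{0}" or "a{0,0}"
      throw new UnsupportedPattern("nothing to repeat")
    case Repeat(b, _, _)    => check(b)
    case Group(t)           => check(t)
    case Concat(ps)         => ps.foreach(check)
    case Alt(os)            => os.foreach(check)
    case Lit(_) | Anchor(_) => ()
  }

  // Convenience wrapper: false means the pattern would be rejected (CPU fallback).
  def isSupported(n: Node): Boolean =
    try { check(n); true }
    catch { case _: UnsupportedPattern => false }
}
```

Under these assumptions, `RepetitionCheck.isSupported` returns false for `(3?)+`, encoded as `Repeat(Group(Repeat(Lit('3'), 0, Some(1))), 1, None)`, and true for `3+`, encoded as `Repeat(Lit('3'), 1, None)`. Checking "can this term match empty?" once, rather than listing individual bad patterns, is one way to catch the whole family of inputs that previously hung.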