Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix edge case where one side of regexp choice ends in duplicate string anchors #5664

Conversation

anthony-chang
Copy link
Contributor

Fixes #5654

cuDF does not support choice where either side ends with a string anchor. However, when we had a sequence ending in something like $$, it would get compiled as RegexChar('$'), RegexEmpty() so checking only the last part of the sequence is not enough.

Signed-off-by: Anthony Chang antchang@nvidia.com

Signed-off-by: Anthony Chang <antchang@nvidia.com>
…x-5654-double-anchor-around-choice

Signed-off-by: Anthony Chang <antchang@nvidia.com>
@anthony-chang anthony-chang changed the title Fix 5654 double anchor around choice Fix edge case where one side of regexp choice ends in duplicate string anchors May 26, 2022
@anthony-chang anthony-chang requested a review from andygrove May 26, 2022 22:42
@anthony-chang anthony-chang self-assigned this May 26, 2022
@anthony-chang
Copy link
Contributor Author

build

@@ -1169,13 +1169,18 @@ class CudfRegexTranspiler(mode: RegexMode) {
def endsWithLineAnchor(e: RegexAST): Boolean = {
e match {
case RegexSequence(parts) if parts.nonEmpty =>
endsWithLineAnchor(parts.last)
val j = parts.lastIndexWhere {
case RegexEmpty() => false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine for this specific instance but do you think it is worth introducing a new step where we rewrite the AST tree and discard any RegexEmpty instances?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where would we have this step though? rewrite will be where all the RegexEmptys are introduced, and discarding them after rewrite won't be useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Ok, let's see if there are other cases that come up where we need to add explicit handling of RegexEmpty and if so then we may want to at least introduce some utility methods.

@sameerz sameerz added the bug Something isn't working label May 27, 2022
@sameerz sameerz added this to the May 23 - Jun 3 milestone May 27, 2022
@anthony-chang anthony-chang merged commit 5a80885 into NVIDIA:branch-22.06 May 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Transpiler produces regex pattern that cuDF cannot compile
3 participants