Regexp: Transpile \D, \W to Java's definitions #5575

Merged
andygrove merged 16 commits into NVIDIA:branch-22.06 from regexp-digit-word-upper
May 27, 2022

Conversation

andygrove
Contributor

@andygrove andygrove commented May 20, 2022

Closes #5547

Adds support for \D and \W by transpiling them to [^0-9] and [^a-zA-Z_0-9], respectively, to match Java's definitions.
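
To make the rewrite described above concrete, here is a minimal sketch of replacing `\D` and `\W` with the character classes named in the description. It is an illustration only, not the plugin's transpiler: the object name `NegatedClassSketch` and the helpers `rewrite` and `findAll` are hypothetical, and the sketch assumes the escapes appear outside of a `[...]` character class (it does no special handling inside one).

```scala
import java.util.regex.Pattern

// Hypothetical sketch; names are illustrative and not taken from the PR.
object NegatedClassSketch {

  // Rewrites \D and \W to explicit negated classes.
  // Escape sequences are consumed two characters at a time so that an
  // escaped backslash followed by a literal D or W (e.g. \\D) is left alone.
  // Assumption: \D and \W do not occur inside a [...] character class.
  def rewrite(pattern: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < pattern.length) {
      val c = pattern.charAt(i)
      if (c == '\\' && i + 1 < pattern.length) {
        pattern.charAt(i + 1) match {
          case 'D' => sb.append("[^0-9]")
          case 'W' => sb.append("[^a-zA-Z_0-9]")
          case other => sb.append(c).append(other)
        }
        i += 2
      } else {
        sb.append(c)
        i += 1
      }
    }
    sb.toString
  }

  // Collects every match of `regex` in `input` using java.util.regex.
  def findAll(regex: String, input: String): Seq[String] = {
    val m = Pattern.compile(regex).matcher(input)
    val out = Seq.newBuilder[String]
    while (m.find()) out += m.group()
    out.result()
  }
}
```

With Java's default flags, `findAll("\\D+", s)` and `findAll(rewrite("\\D+"), s)` should return the same matches, which is the equivalence the transpilation relies on.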

@andygrove andygrove force-pushed the regexp-digit-word-upper branch from 628e81c to 2a28d89 Compare May 20, 2022 18:38
@andygrove andygrove changed the title WIP: Regexp: Add support for \D and \W Regexp: Transpile \D, \W to Java's definitions May 20, 2022
@andygrove andygrove changed the title Regexp: Transpile \D, \W to Java's definitions WIP: Regexp: Transpile \D, \W to Java's definitions May 20, 2022
@andygrove
Contributor Author

build


// this check is quite broad and could potentially be refined to look for \W or \D
// immediately next to a line anchor
if (negatedWordOrDigit && endOfLineAnchor) {
Contributor Author

There is a follow-on PR #5610 to refine the approach taken here
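
The difference between the broad check in this thread and the narrower one its code comment suggests can be sketched as follows. This is a hedged illustration, not the code from this PR or from #5610: the object `AnchorCheckSketch` and its methods are hypothetical, and it assumes that "end of line anchor" means `$` or `\Z`, which are the anchors that appear in the test patterns.

```scala
// Hypothetical sketch; names and the anchor set ($ and \Z) are assumptions.
object AnchorCheckSketch {

  // Splits a pattern into single characters, keeping each escape
  // sequence (backslash plus the next character) as one token.
  def tokens(pattern: String): Vector[String] = {
    val out = Vector.newBuilder[String]
    var i = 0
    while (i < pattern.length) {
      if (pattern.charAt(i) == '\\' && i + 1 < pattern.length) {
        out += pattern.substring(i, i + 2)
        i += 2
      } else {
        out += pattern.charAt(i).toString
        i += 1
      }
    }
    out.result()
  }

  private def isNegated(t: String): Boolean = t == "\\W" || t == "\\D"
  private def isEndAnchor(t: String): Boolean = t == "$" || t == "\\Z"

  // Broad check: reject when both kinds of token occur anywhere.
  def broadReject(pattern: String): Boolean = {
    val ts = tokens(pattern)
    ts.exists(isNegated) && ts.exists(isEndAnchor)
  }

  // Narrower check: reject only when the two are directly adjacent.
  def adjacentReject(pattern: String): Boolean =
    tokens(pattern).sliding(2).exists {
      case Vector(a, b) =>
        (isNegated(a) && isEndAnchor(b)) || (isEndAnchor(a) && isNegated(b))
      case _ => false
    }
}
```

Under these assumptions, a pattern such as `\D+abc$` is rejected by the broad check but accepted by the adjacent-only check, while `\W$` and `$\D` are rejected by both.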

@@ -74,6 +74,15 @@ class RegularExpressionTranspilerSuite extends FunSuite with Arm {
}
}

test("Detect unsupported combinations of line anchors and \\W and \\D") {
val patterns = Seq("\\W\\Z\\D", "\\W$", "$\\D")
Contributor Author

These patterns were found during fuzz testing

@andygrove andygrove changed the title WIP: Regexp: Transpile \D, \W to Java's definitions Regexp: Transpile \D, \W to Java's definitions May 24, 2022
@andygrove andygrove marked this pull request as ready for review May 24, 2022 15:51
@andygrove andygrove requested a review from anthony-chang May 24, 2022 15:57
anthony-chang
anthony-chang previously approved these changes May 24, 2022
@andygrove
Contributor Author

build

@andygrove
Contributor Author

build

@andygrove andygrove merged commit af865fd into NVIDIA:branch-22.06 May 27, 2022
@andygrove andygrove deleted the regexp-digit-word-upper branch May 27, 2022 04:40
@sameerz sameerz added this to the May 23 - Jun 3 milestone May 28, 2022
@sameerz sameerz added the feature request New feature or request label May 28, 2022
Labels
feature request New feature or request
Development

Successfully merging this pull request may close these issues.

[FEA] Regexp: Can we transpile \W and \D to Java's definition so we can support on GPU?
3 participants