Add support for regexp_extract on the GPU #4285

Merged
merged 9 commits into from
Dec 13, 2021

Conversation

andygrove
Contributor

@andygrove andygrove commented Dec 3, 2021

Signed-off-by: Andy Grove <andygrove@nvidia.com>

Closes #4002 and #4284

This PR adds support for regexp_extract.

@andygrove andygrove added the feature request New feature or request label Dec 3, 2021
@andygrove andygrove added this to the Nov 30 - Dec 10 milestone Dec 3, 2021
@andygrove andygrove self-assigned this Dec 3, 2021
@andygrove
Contributor Author

build

Comment on lines -511 to -479
- `$` does not match the end of a string if the string ends with a line-terminator
([cuDF issue #9620](https://github.com/rapidsai/cudf/issues/9620))
Contributor Author

This issue was resolved in #4239 but the docs did not get updated. I can raise a separate PR for this change if necessary.
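For readers unfamiliar with the `$` behavior in the removed doc line, here is a small sketch of what "matching before a trailing line terminator" means. It uses Python's `re` module purely as a stand-in, which is an assumption of this example: Spark on the CPU uses Java regular expressions, which recognize more line terminators than just `\n`, and Python's `\Z` is stricter than Java's. The helper name `matches_at_end` is hypothetical.

```python
import re


def matches_at_end(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


# `$` matches at the very end of the string ...
print(matches_at_end(r"a$", "a"))      # True
# ... and also just before a single trailing newline.
print(matches_at_end(r"a$", "a\n"))    # True
# In Python, `\Z` matches only at the absolute end, so the newline blocks it.
print(matches_at_end(r"a\Z", "a\n"))   # False
```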

@andygrove
Contributor Author

build

@andygrove andygrove marked this pull request as draft December 3, 2021 19:12
@andygrove andygrove changed the title Add support for regexp_extract on the GPU WIP: Add support for regexp_extract on the GPU Dec 3, 2021
Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove andygrove changed the title WIP: Add support for regexp_extract on the GPU Add support for regexp_extract on the GPU Dec 6, 2021
@andygrove andygrove marked this pull request as ready for review December 6, 2021 19:25
withResource(str.getBase.extractRe(cudfRegexPattern)) { extract =>
  withResource(str.getBase.matchesRe(cudfRegexPattern)) { matches =>
    withResource(str.getBase.isNull) { isNull =>
      withResource(extract.getColumn(i - 1)) { extractedGroup =>
Contributor Author

It would be nice if we could ask cuDF to extract just a single column; I have filed rapidsai/cudf#9855 to request that capability.
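As a rough illustration of why the snippet above needs a match mask and a null mask alongside the extracted group, here is a plain-Python model of Spark's `regexp_extract` semantics over a nullable column. This is only a sketch and not the plugin's code: the function name `regexp_extract_column` is hypothetical, Python's `re` stands in for the Java and cuDF regex engines (their dialects differ), and the per-row loop replaces what the GPU does column-wise.

```python
import re
from typing import List, Optional


def regexp_extract_column(
    values: List[Optional[str]], pattern: str, idx: int
) -> List[Optional[str]]:
    """Model regexp_extract(values, pattern, idx) for a nullable string column."""
    regex = re.compile(pattern)
    # Intermediate "columns" analogous to isNull and the match result.
    is_null = [v is None for v in values]
    matches = [None if v is None else regex.search(v) for v in values]

    result: List[Optional[str]] = []
    for null, m in zip(is_null, matches):
        if null:
            # Null input rows stay null.
            result.append(None)
        elif m is None:
            # Non-matching rows produce an empty string, not null.
            result.append("")
        else:
            # idx = 0 is the whole match; a group that did not
            # participate in the match also yields an empty string.
            result.append(m.group(idx) or "")
    return result
```

For example, `regexp_extract_column(["100-200", "foo", None], r"(\d+)-(\d+)", 1)` returns `["100", "", None]`, and passing `idx=0` returns the whole match `"100-200"` for the first row.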

Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove andygrove marked this pull request as draft December 10, 2021 16:12
@andygrove andygrove changed the title Add support for regexp_extract on the GPU WIP: Add support for regexp_extract on the GPU Dec 10, 2021
@andygrove andygrove changed the title WIP: Add support for regexp_extract on the GPU Add support for regexp_extract on the GPU Dec 10, 2021
@andygrove andygrove marked this pull request as ready for review December 10, 2021 18:36
jlowe
jlowe previously approved these changes Dec 10, 2021
Member

@jlowe jlowe left a comment

Looks good to me, just a small comment on a test.

integration_tests/src/main/python/string_test.py Outdated Show resolved Hide resolved
@sameerz sameerz removed this from the Nov 30 - Dec 10 milestone Dec 10, 2021
@sameerz sameerz added this to the Dec 13 - Jan 7 milestone Dec 10, 2021
@andygrove
Contributor Author

build

@andygrove andygrove merged commit 49c36ea into NVIDIA:branch-22.02 Dec 13, 2021
@andygrove andygrove deleted the regexp-extract branch December 13, 2021 15:45
@andygrove andygrove linked an issue Dec 13, 2021 that may be closed by this pull request
Labels
feature request New feature or request
Development

Successfully merging this pull request may close these issues.

[FEA] Support idx = 0 in GpuRegExpExtract
[FEA] Implement regexp_extract on GPU
3 participants