Allow *, ?, and {0,...} variants in StringSplit in non-empty match situations #6959

Merged
merged 2 commits into from
Nov 3, 2022

NVnavkumar
Collaborator

Fixes #4884.

This enables *, ?, {0,} and {0,n} in regular expressions that are used in StringSplit to run on the GPU in most circumstances. The exception is zero-width match cases, which are covered by #6958.

Zero-width matches are a fairly rare edge case that needs to be handled on the output side. In favor of usability, this PR isolates them as a separate edge case that can fall back on its own, so that the quantifiers above remain usable everywhere else.
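
To illustrate the distinction between non-empty and zero-width match situations, here is a minimal sketch in Python. It is not the plugin's implementation. The helper name `can_match_empty` is hypothetical, and testing the pattern against the empty string is only an approximation of "can produce a zero-width match" (it would miss context-dependent cases such as lookarounds). Python's `re` module is used purely for illustration; its split semantics are not guaranteed to match Spark's.

```python
import re


def can_match_empty(pattern: str) -> bool:
    """Hypothetical helper: True if the pattern matches the empty string.

    This approximates the zero-width match case; it is an illustrative
    check, not the analysis performed by the plugin.
    """
    return re.fullmatch(pattern, "") is not None


if __name__ == "__main__":
    # Patterns using *, ?, {0,} and {0,n}: some always consume at least
    # one character, others can match nothing at all.
    for pattern in ["ab*", "a?b", "x{0,3}y", "xy{0,}", "b*", "a?", "x{0,3}"]:
        kind = "zero-width possible" if can_match_empty(pattern) else "always non-empty"
        print(f"{pattern!r}: {kind}")
```

In this sketch, a pattern such as `ab*` always consumes at least the `a`, so every match is non-empty, whereas `b*` alone can match nothing and therefore falls into the zero-width category that the PR leaves as a separate case.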

…t * and ? can still be used in other circumstances

Signed-off-by: Navin Kumar <navink@nvidia.com>
…match

Signed-off-by: Navin Kumar <navink@nvidia.com>
@NVnavkumar NVnavkumar requested a review from andygrove October 31, 2022 22:19
@NVnavkumar NVnavkumar self-assigned this Oct 31, 2022
@NVnavkumar
Collaborator Author

build

Contributor

@andygrove andygrove left a comment

LGTM

@NVnavkumar NVnavkumar merged commit 206a9e4 into NVIDIA:branch-22.12 Nov 3, 2022
@sameerz sameerz added the task Work required that improves the product but is not user facing label Nov 4, 2022
Development

Successfully merging this pull request may close these issues.

[BUG] Split by regular expressions with ? and * repetition are not consistent with Spark