[BUG] Split by regular expressions with ? and * repetition are not consistent with Spark
#4884
Labels: bug, cudf_dependency, P2
Describe the bug
We currently fall back to CPU for repetition quantifiers ? and * with split because the behavior is not consistent with Spark.
Steps/Code to reproduce bug
Example:
For the input string 31313 and the pattern 4?, split will produce ['3','1','3','1','3'] on CPU and ['','3','1','3','1','3'] on the GPU.
Expected behavior
The behavior should be consistent with Spark so we can enable this on GPU.
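The mismatch in the example appears to come down to how a zero-width match at position 0 is treated: Java's regex split (Java 8 and later) never emits an empty leading substring for such a match. The sketch below illustrates that rule in Python; it is not the Spark or cuDF implementation. The function name regex_split and its flag are hypothetical. Dropping trailing empty strings is an assumption made only so that the output lines up with the lists quoted in the example.

```python
import re


def regex_split(pattern, text, skip_leading_zero_width):
    """Split `text` at every match of `pattern` (illustrative sketch only).

    When `skip_leading_zero_width` is True, a zero-width match at index 0
    is ignored, mirroring the rule Java's String.split applies since Java 8.
    """
    parts = []
    index = 0  # start of the piece currently being collected
    for m in re.finditer(pattern, text):
        leading_empty = index == 0 and m.start() == 0 and m.end() == 0
        if skip_leading_zero_width and leading_empty:
            continue
        parts.append(text[index:m.start()])
        index = m.end()
    parts.append(text[index:])
    # Assumption of this sketch: drop trailing empty strings so the result
    # is comparable with the lists quoted in the issue.
    while parts and parts[-1] == "":
        parts.pop()
    return parts


java_like = regex_split(r"4?", "31313", skip_leading_zero_width=True)
naive = regex_split(r"4?", "31313", skip_leading_zero_width=False)
print(java_like)
print(naive)
```

With the flag enabled the result has no leading empty string, as in the CPU output above; with it disabled the leading empty string appears, as in the GPU output.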
Also, see #4468 for a related issue regarding regexp_replace.