[BUG] string_split does not respect spark.rapids.sql.regexp.enabled config #5130

Closed
andygrove opened this issue Apr 1, 2022 · 0 comments · Fixed by #5297

andygrove commented Apr 1, 2022

Describe the bug
string_split does not respect the spark.rapids.sql.regexp.enabled configuration and will execute regular expressions on the GPU when this config is set to false.

Note that we should continue to ignore the config flag when the delimiter can be transpiled to a simple string, since no regular expression is executed in that case.

Steps/Code to reproduce bug

scala> spark.conf.set("spark.rapids.sql.regexp.enabled", "false")

scala> val df = Seq("hello", "goodbyte").toDF("a").repartition(2)

scala> df.createTempView("t")

scala> spark.sql("SELECT split(a, '[eh]') FROM t").show
22/04/01 23:03:54 WARN GpuOverrides: 
!Exec <CollectLimitExec> cannot run on GPU because the Exec CollectLimitExec has been disabled, and is disabled by default because Collect Limit replacement can be slower on the GPU, if huge number of rows in a batch it could help by limiting the number of rows transferred from GPU to CPU. Set spark.rapids.sql.exec.CollectLimitExec to true if you wish to enable it
  @Partitioning <SinglePartition$> could run on GPU
  *Exec <ProjectExec> will run on GPU
    *Expression <Alias> cast(split(a#4, [eh], -1) as string) AS split(a, [eh], -1)#44 will run on GPU
      *Expression <Cast> cast(split(a#4, [eh], -1) as string) will run on GPU
        *Expression <StringSplit> split(a#4, [eh], -1) will run on GPU
    *Exec <ShuffleExchangeExec> will run on GPU
      *Partitioning <RoundRobinPartitioning> will run on GPU
      ! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec
        @Expression <AttributeReference> a#4 could run on GPU

+------------------+
|split(a, [eh], -1)|
+------------------+
|         [, , llo]|
|       [goodbyt, ]|
+------------------+

Expected behavior
The split should fall back to the CPU when the delimiter is a regular expression and spark.rapids.sql.regexp.enabled is set to false.
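
To make the expected rule concrete, here is a minimal sketch of the decision, not the plugin's actual code. The names `SplitFallbackSketch`, `asSimpleString`, and `canRunOnGpu` are hypothetical, and the check for a "simple string" delimiter is a simplified assumption (a pattern with no unescaped regex metacharacters); the real transpiler may accept more or fewer patterns.

```scala
// Hypothetical sketch; none of these names come from the spark-rapids code base.
object SplitFallbackSketch {
  // Characters that carry special meaning in a Java regular expression.
  private val regexMetaChars: Set[Char] = "\\^$.|?*+()[]{}".toSet

  /** Returns the literal delimiter if the pattern reduces to a plain string,
    * i.e. it contains only ordinary characters and backslash-escaped
    * metacharacters. Returns None for anything that needs a regex engine. */
  def asSimpleString(pattern: String): Option[String] = {
    val sb = new StringBuilder
    var i = 0
    while (i < pattern.length) {
      val c = pattern.charAt(i)
      if (c == '\\') {
        // Only an escaped metacharacter (for example \. ) counts as a literal.
        if (i + 1 < pattern.length && regexMetaChars.contains(pattern.charAt(i + 1))) {
          sb.append(pattern.charAt(i + 1))
          i += 2
        } else {
          return None
        }
      } else if (regexMetaChars.contains(c)) {
        return None
      } else {
        sb.append(c)
        i += 1
      }
    }
    Some(sb.toString)
  }

  /** The rule described in this issue: a simple-string delimiter ignores the
    * flag; a true regular expression runs on the GPU only when the flag
    * (spark.rapids.sql.regexp.enabled) is true. */
  def canRunOnGpu(pattern: String, regexpEnabled: Boolean): Boolean =
    asSimpleString(pattern).isDefined || regexpEnabled
}
```

Under this sketch, the delimiter `[eh]` from the reproduction would fall back to the CPU when the flag is false, while a delimiter such as `,` would still run on the GPU.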

Environment details (please complete the following information)
N/A

Additional context
None

andygrove added the "bug" and "? - Needs Triage" labels on Apr 1, 2022
mattahrens removed the "? - Needs Triage" label on Apr 5, 2022