Describe the bug
string_split does not respect the spark.rapids.sql.regexp.enabled configuration and will execute regular expressions on the GPU when this config is set to false.
Note that we should continue to ignore the config flag in the case where the delimiter can be transpiled to a simple string.
Steps/Code to reproduce bug
scala> spark.conf.set("spark.rapids.sql.regexp.enabled", "false")
scala> val df = Seq("hello", "goodbyte").toDF("a").repartition(2)
scala> df.createTempView("t")
scala> spark.sql("SELECT split(a, '[eh]') FROM t").show
22/04/01 23:03:54 WARN GpuOverrides:
!Exec <CollectLimitExec> cannot run on GPU because the Exec CollectLimitExec has been disabled, and is disabled by default because CollectLimit replacement can be slower on the GPU, if huge number of rows in a batch it could help by limiting the number of rows transferred from GPU to CPU. Set spark.rapids.sql.exec.CollectLimitExec to true if you wish to enable it
@Partitioning <SinglePartition$> could run on GPU
*Exec <ProjectExec> will run on GPU
*Expression <Alias> cast(split(a#4, [eh], -1) as string) AS split(a, [eh], -1)#44 will run on GPU
*Expression <Cast> cast(split(a#4, [eh], -1) as string) will run on GPU
*Expression <StringSplit> split(a#4, [eh], -1) will run on GPU
*Exec <ShuffleExchangeExec> will run on GPU
*Partitioning <RoundRobinPartitioning> will run on GPU
! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec
@Expression <AttributeReference> a#4 could run on GPU

+------------------+
|split(a, [eh], -1)|
+------------------+
|         [, , llo]|
|       [goodbyt, ]|
+------------------+
Expected behavior
string_split should fall back to the CPU when the delimiter is a regular expression and spark.rapids.sql.regexp.enabled is set to false.
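The expected decision can be sketched as below. This is only an illustration of the rule described in this issue, not the plugin's code: `SplitFallbackSketch`, `isSimpleString`, and `placement` are hypothetical names, and the metacharacter check is a deliberately conservative stand-in for the real transpiler, which may accept more patterns as simple strings.

```scala
// Hypothetical sketch; these names are illustrative and not part of the plugin's API.
object SplitFallbackSketch {
  sealed trait Placement
  case object Gpu extends Placement
  case object Cpu extends Placement

  // Characters that give a pattern regex meaning. A delimiter containing none
  // of them is treated as a plain string (assumption: conservative check).
  private val regexMeta: Set[Char] = "\\^$.|?*+()[]{}".toSet

  def isSimpleString(delimiter: String): Boolean =
    !delimiter.exists(regexMeta.contains)

  // Simple-string delimiters ignore the flag; true regular expressions
  // stay on the GPU only when regexp support is enabled.
  def placement(delimiter: String, regexpEnabled: Boolean): Placement =
    if (isSimpleString(delimiter) || regexpEnabled) Gpu else Cpu
}
```

Under these assumptions, `SplitFallbackSketch.placement("[eh]", regexpEnabled = false)` yields `Cpu`, which is the behavior requested here, while `SplitFallbackSketch.placement(",", regexpEnabled = false)` still yields `Gpu`.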
Environment details (please complete the following information)
N/A
Additional context
None