CreateMap support for multiple key-value pairs #3251

Merged: 12 commits, Aug 25, 2021
2 changes: 1 addition & 1 deletion docs/configs.md
@@ -65,7 +65,7 @@ Name | Description | Default Value
<a name="sql.castStringToFloat.enabled"></a>spark.rapids.sql.castStringToFloat.enabled|When set to true, enables casting from strings to float types (float, double) on the GPU. Currently hex values aren't supported on the GPU. Also note that casting from string to float types on the GPU returns incorrect results when the string represents any number "1.7976931348623158E308" <= x < "1.7976931348623159E308" and "-1.7976931348623158E308" >= x > "-1.7976931348623159E308" in both these cases the GPU returns Double.MaxValue while CPU returns "+Infinity" and "-Infinity" respectively|false
<a name="sql.castStringToTimestamp.enabled"></a>spark.rapids.sql.castStringToTimestamp.enabled|When set to true, casting from string to timestamp is supported on the GPU. The GPU only supports a subset of formats when casting strings to timestamps. Refer to the CAST documentation for more details.|false
<a name="sql.concurrentGpuTasks"></a>spark.rapids.sql.concurrentGpuTasks|Set the number of tasks that can execute concurrently per GPU. Tasks may temporarily block when the number of concurrent tasks in the executor exceeds this amount. Allowing too many concurrent tasks on the same GPU may lead to GPU out of memory errors.|1
-<a name="sql.createMap.enabled"></a>spark.rapids.sql.createMap.enabled|When set to true, support the CreateMap expression on the GPU with multiple key-value pairs where the keys are not literal values. The GPU version does not detect duplicate keys or make any guarantees about which key wins if there are duplicates in this case. CreateMap is always supported on the GPU when there is a single key-value pair or when there are multiple key-value pairs with literal keys.|false
+<a name="sql.createMap.enabled"></a>spark.rapids.sql.createMap.enabled|The GPU-enabled version of the `CreateMap` expression (`map` SQL function) does not detect duplicate keys in all cases and does not guarantee which key wins if there are duplicates. When this config is set to true, `CreateMap` will be enabled to run on the GPU even when there might be duplicate keys.|false
<a name="sql.csv.read.bool.enabled"></a>spark.rapids.sql.csv.read.bool.enabled|Parsing an invalid CSV boolean value produces true instead of null|false
<a name="sql.csv.read.byte.enabled"></a>spark.rapids.sql.csv.read.byte.enabled|Parsing CSV bytes is much more lenient and will return 0 for some malformed values instead of null|false
<a name="sql.csv.read.date.enabled"></a>spark.rapids.sql.csv.read.date.enabled|Parsing invalid CSV dates produces different results from Spark|false
@@ -638,11 +638,10 @@ object RapidsConf {
.createWithDefault(false)

val ENABLE_CREATE_MAP = conf("spark.rapids.sql.createMap.enabled")
-.doc("When set to true, support the CreateMap expression on the GPU with multiple " +
-"key-value pairs where the keys are not literal values. The GPU version does not detect " +
-"duplicate keys or make any guarantees about which key wins if there are duplicates in " +
-"this case. CreateMap is always supported on the GPU when there is a single key-value " +
-"pair or when there are multiple key-value pairs with literal keys.")
+.doc("The GPU-enabled version of the `CreateMap` expression (`map` SQL function) does not " +
+"detect duplicate keys in all cases and does not guarantee which key wins if there are " +
+"duplicates. When this config is set to true, `CreateMap` will be enabled to run on the " +
+"GPU even when there might be duplicate keys.")
.booleanConf
.createWithDefault(false)
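
For illustration, a minimal sketch of how a job might opt in to this setting. It assumes the RAPIDS Accelerator jar is on the classpath and a supported GPU is present. The application name, column names, and sample rows are hypothetical, and the result for the row with duplicate keys is not guaranteed on the GPU, as the config description states.

```scala
import org.apache.spark.sql.SparkSession

object CreateMapConfigExample {
  def main(args: Array[String]): Unit = {
    // Assumption: the RAPIDS Accelerator plugin jar is available to Spark.
    val spark = SparkSession.builder()
      .appName("create-map-config-example") // hypothetical name
      .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
      // Opt in to running CreateMap on the GPU even when duplicate keys are possible.
      .config("spark.rapids.sql.createMap.enabled", "true")
      .getOrCreate()

    import spark.implicits._

    // Keys come from columns, so they are not literals and cannot be
    // checked for duplicates at planning time.
    val df = Seq(("a", 1, "b", 2), ("x", 3, "x", 4)).toDF("k1", "v1", "k2", "v2")
    df.selectExpr("map(k1, v1, k2, v2) AS m").show(false)

    spark.stop()
  }
}
```

Leaving the setting at its default of `false` keeps the conservative behavior described in the config text.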

@@ -1112,7 +1112,8 @@ object CreateMapCheck extends ExprChecks {
if (meta.childExprs.length > 2) {
// check for duplicate keys if the keys are literal values
val keyExprs = meta.childExprs.indices.filter(_ % 2 == 0).map(meta.childExprs)
-if (keyExprs.forall(_.wrapped.isInstanceOf[Literal])) {
+if (keyExprs.forall(e => GpuOverrides.extractLit(
+e.wrapped.asInstanceOf[Expression]).isDefined)) {
val keys = keyExprs.map(_.wrapped.asInstanceOf[Literal].value)
val uniqueKeys = new mutable.HashSet[Any]()
for (key <- keys) {
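
The hunk above is cut off before the loop body, so the following is only a self-contained sketch of the idea it appears to implement: take the even-indexed children as keys, and when every key resolves to a literal, look for duplicates with a hash set. The `Expr`, `Lit`, `Alias`, and `ColumnRef` types are hypothetical stand-ins, not Spark classes. The alias-unwrapping in `extractLit` and both failure messages are assumptions, not the plugin's actual code.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for Catalyst expressions; these are not Spark classes.
sealed trait Expr
final case class Lit(value: Any) extends Expr
final case class Alias(child: Expr, name: String) extends Expr
final case class ColumnRef(name: String) extends Expr

object CreateMapKeyCheck {
  // Assumed behavior of an extractLit-style helper: unwrap aliases to find a literal.
  @scala.annotation.tailrec
  def extractLit(e: Expr): Option[Lit] = e match {
    case l: Lit          => Some(l)
    case Alias(child, _) => extractLit(child)
    case _               => None
  }

  // `children` is the flat argument list of map(k1, v1, k2, v2, ...).
  // Returns Right(()) when the expression may run on the GPU, Left(reason) otherwise.
  def check(children: IndexedSeq[Expr], createMapEnabled: Boolean): Either[String, Unit] = {
    if (children.length <= 2) {
      Right(()) // a single key-value pair cannot contain duplicate keys
    } else {
      val keyExprs = children.indices.filter(_ % 2 == 0).map(children)
      val lits = keyExprs.map(extractLit)
      if (lits.forall(_.isDefined)) {
        // All keys are literals, so duplicates can be detected up front.
        val uniqueKeys = mutable.HashSet[Any]()
        lits.flatten.map(_.value).find(k => !uniqueKeys.add(k)) match {
          case Some(dup) => Left(s"duplicate literal key: $dup")
          case None      => Right(())
        }
      } else if (createMapEnabled) {
        Right(()) // the user opted in despite possible duplicates
      } else {
        Left("non-literal keys need spark.rapids.sql.createMap.enabled=true")
      }
    }
  }
}
```

Going through `extractLit` rather than an `isInstanceOf` test means a literal wrapped in an alias still counts as a literal key. Reading the key value from the unwrapped result, as this sketch does, also avoids casting the wrapper itself to a literal.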