Initial support for CreateMap on GPU #3230

Merged: 4 commits, Aug 19, 2021

Conversation

andygrove
Contributor

@andygrove andygrove commented Aug 13, 2021

Signed-off-by: Andy Grove <andygrove@nvidia.com>

Closes #3014

There is a follow-on issue #3229 for supporting multiple key-value pairs.

Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove andygrove added this to the Aug 2 - Aug 13 milestone Aug 13, 2021
@andygrove andygrove self-assigned this Aug 13, 2021
@andygrove andygrove added the feature request New feature or request label Aug 13, 2021
@andygrove andygrove changed the title Initial support for CreateMap on GPU WIP: Initial support for CreateMap on GPU Aug 13, 2021
@andygrove andygrove marked this pull request as draft August 13, 2021 18:06
integration_tests/src/main/python/map_test.py (outdated)
    data_gen = [('a', StringGen(nullable=False)), ('b', StringGen(nullable=False))]
    assert_gpu_fallback_collect(
        lambda spark : gen_df(spark, data_gen).selectExpr(
            "map(a, b, b, a) as m1"), 'ProjectExec')
Collaborator

What happens if "a == b" for a given row?

Spark has a config for what should happen.

https://github.com/apache/spark/blob/f620996142ba312f7e52f75476b1b18be667ffdf/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2969-L2982

Sadly, we are not going to be able to support either of those out of the box. I think we need to file a follow-on issue so that we can figure out how to dedupe maps the proper way, with cudf's help. In the short term, we might need a config indicating that it is okay to enable this even though we are not doing duplicate checks. If the keys are all scalar values, though, we can determine duplicates up front and avoid the entire issue.
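To make the duplicate-key concern concrete: the config linked above is, to my knowledge, `spark.sql.mapKeyDedupPolicy`, which accepts `EXCEPTION` and `LAST_WIN`. Below is a minimal pure-Python sketch of what those two policies mean for a single row. `build_map` is a hypothetical helper written for illustration only; it is not Spark or plugin code.

```python
def build_map(pairs, policy="EXCEPTION"):
    """Build one row's map from (key, value) pairs under a dedup policy.

    Hypothetical illustration of the two policies; not Spark's implementation.
    """
    if policy not in ("EXCEPTION", "LAST_WIN"):
        raise ValueError(f"Unknown policy: {policy!r}")
    result = {}
    for key, value in pairs:
        if key in result and policy == "EXCEPTION":
            # EXCEPTION: a repeated key is an error.
            raise ValueError(f"Duplicate map key {key!r} was found")
        # LAST_WIN: a later value overwrites the earlier one for the same key.
        result[key] = value
    return result
```

For `map(a, b, b, a)`, a row where `a == b` produces the pairs `(a, b)` and `(b, a)` with the same key twice, so a GPU implementation that skips this check would silently return a map that neither policy allows.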

Contributor Author

#3229 is the follow-on issue for supporting multiple key-value pairs and where we would need to tackle the duplicate issue.

Collaborator

It can be a separate issue after that, or a part of it. I think most of the time people will use literal values for the keys. If that is the case, we can still enable this for those cases without much difficulty, but we would need to fall back to the CPU for other cases, or have a config that says "I know what I am doing."
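A rough sketch of that planning-time decision, in plain Python rather than the plugin's Scala. Every name here (`Literal`, `ColumnRef`, `plan_create_map`, `allow_unchecked_duplicates`) is hypothetical, and falling back to the CPU when literal keys repeat is my assumption, made so that Spark can apply its own dedup policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for a literal key expression."""
    value: object


@dataclass(frozen=True)
class ColumnRef:
    """Hypothetical stand-in for a key that is read from a column."""
    name: str


def plan_create_map(keys, allow_unchecked_duplicates=False):
    """Return 'GPU' or 'CPU' for a CreateMap with the given key expressions."""
    if all(isinstance(k, Literal) for k in keys):
        values = [k.value for k in keys]
        # Literal keys are known at planning time, so duplicates can be
        # detected up front (assumes the literal values are hashable).
        if len(set(values)) == len(values):
            return "GPU"
        # Assumption: let the CPU handle duplicates per Spark's policy.
        return "CPU"
    # Non-literal keys may collide per row; only run on the GPU if the
    # user has explicitly opted in to skipping the duplicate check.
    return "GPU" if allow_unchecked_duplicates else "CPU"
```

With this shape, `plan_create_map([Literal("a"), Literal("b")])` stays on the GPU, while `plan_create_map([ColumnRef("a"), ColumnRef("b")])` falls back unless the opt-in flag is set.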

integration_tests/src/main/python/map_test.py (resolved)
@andygrove andygrove changed the title WIP: Initial support for CreateMap on GPU Initial support for CreateMap on GPU Aug 18, 2021
@andygrove andygrove marked this pull request as ready for review August 18, 2021 15:23
revans2
revans2 previously approved these changes Aug 18, 2021
@andygrove
Contributor Author

build

@revans2
Collaborator

revans2 commented Aug 18, 2021

build

@revans2 revans2 merged commit f0f31d5 into NVIDIA:branch-21.10 Aug 19, 2021
razajafri pushed a commit to razajafri/spark-rapids that referenced this pull request Aug 23, 2021
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@andygrove andygrove deleted the gpu-create-map-2expr branch November 30, 2021 20:59
Successfully merging this pull request may close these issues.

[FEA] Add initial support for CreateMap
3 participants