Initial support for CreateMap on GPU #3230
Signed-off-by: Andy Grove <andygrove@nvidia.com>
    data_gen = [('a', StringGen(nullable=False)), ('b', StringGen(nullable=False))]
    assert_gpu_fallback_collect(
        lambda spark : gen_df(spark, data_gen).selectExpr(
            "map(a, b, b, a) as m1"), 'ProjectExec')
What happens if "a == b" for a given row?
Spark has a config for what should happen (`spark.sql.mapKeyDedupPolicy`, which can be `EXCEPTION` or `LAST_WIN`).
Sadly, we are not going to be able to support either of those out of the box. I think we need to file a follow-on issue so that we can figure out how to dedupe maps the proper way, with cudf's help. In the short term we might need a config indicating that it is okay to enable this even though we are not doing duplicate checks. If the keys are all scalar values, though, we can determine duplicates up front and avoid the entire issue.
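To make the duplicate-key question concrete, here is a minimal plain-Python sketch of the two behaviors Spark offers through `spark.sql.mapKeyDedupPolicy` (`EXCEPTION` and `LAST_WIN`), applied to a single row. The function name `create_map_row` is hypothetical and only models the semantics; it is not Spark or plugin code.

```python
def create_map_row(pairs, policy="EXCEPTION"):
    """Build one row's map from (key, value) pairs under a dedup policy.

    Hypothetical helper that models the per-row semantics only.
    """
    if policy not in ("EXCEPTION", "LAST_WIN"):
        raise ValueError(f"unknown policy: {policy!r}")
    result = {}
    for key, value in pairs:
        if key in result and policy == "EXCEPTION":
            # EXCEPTION: a repeated key is an error for the whole query
            raise ValueError(f"duplicate map key: {key!r}")
        # LAST_WIN: the later value silently replaces the earlier one
        result[key] = value
    return result


# map(a, b, b, a) for a row where a != b: keys are distinct
print(create_map_row([("x", "y"), ("y", "x")]))

# Same expression for a row where a == b: both keys collide
print(create_map_row([("x", "x"), ("x", "x")], policy="LAST_WIN"))
```

With the test's `map(a, b, b, a)` expression, any row where `a == b` produces two identical keys, so a GPU implementation that skips this check would return a map that the CPU would either reject or collapse.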
#3229 is the follow-on issue for supporting multiple key-value pairs and where we would need to tackle the duplicate issue.
It can be a separate issue after that, or handled as part of it. I think most of the time people will use literal values for the keys. If that is the case, then we can still enable this for those cases without much difficulty, but we would need to fall back to the CPU for other cases, or have a config that says "I know what I am doing."
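The decision described above could be sketched as a plan-time check. Everything here is an assumption for illustration: the `Literal` and `Column` types, the `plan_create_map` function, and the `allow_unchecked_duplicates` flag are hypothetical stand-ins, not the plugin's actual classes or config names.

```python
from typing import NamedTuple, Sequence, Union


class Literal(NamedTuple):
    """Hypothetical stand-in for a key known at plan time."""
    value: object  # assumed hashable (e.g. str or int)


class Column(NamedTuple):
    """Hypothetical stand-in for a key read from a column per row."""
    name: str


def plan_create_map(keys: Sequence[Union[Literal, Column]],
                    allow_unchecked_duplicates: bool = False) -> str:
    """Return "GPU" or "CPU" for a CreateMap with the given key expressions."""
    if all(isinstance(k, Literal) for k in keys):
        values = [k.value for k in keys]
        if len(set(values)) == len(values):
            # All keys are literals and distinct: no row can have duplicates
            return "GPU"
        # Duplicate literal keys: let the CPU apply the dedup policy
        return "CPU"
    # Some key varies per row, so duplicates cannot be ruled out up front;
    # run on the GPU only if the user explicitly opted in
    return "GPU" if allow_unchecked_duplicates else "CPU"


print(plan_create_map([Literal("k1"), Literal("k2")]))
print(plan_create_map([Column("a"), Column("b")]))
print(plan_create_map([Column("a"), Column("b")], allow_unchecked_duplicates=True))
```

The sketch falls back to the CPU for duplicate literal keys rather than raising, which keeps the configured dedup policy in Spark's hands; that choice is also an assumption.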
Signed-off-by: Andy Grove <andygrove@nvidia.com> Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Closes #3014
There is a follow-on issue #3229 for supporting multiple key-value pairs.