Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accommodate altered semantics of cudf::lists::contains() #4361

Merged

Conversation

mythrocks
Copy link
Collaborator

Depends on rapidsai/cudf/pull/9510.
Depends on rapidsai/cudf/pull/9901.

spark-rapids relies on libcudf's cudf::lists::contains() for its implementation of array_contains().

rapidsai/cudf/pull/9510 changes the semantics of lists::contains(), with regard to rows containing nulls. Specifically, if a list row contains at least one null element, and is found not to contain the search key, libcudf will now return false instead of null.
SparkSQL expects to return null in those cases.

This commit accommodates the change in libcudf's semantics, to keep its own existing behaviour.

@mythrocks mythrocks force-pushed the changed-lists-contains-semantics branch from cf03bf9 to aa97aa9 Compare December 14, 2021 23:14
@mythrocks mythrocks self-assigned this Dec 14, 2021
@mythrocks mythrocks requested a review from kuhushukla December 14, 2021 23:15
@mythrocks mythrocks marked this pull request as draft December 14, 2021 23:16
@mythrocks
Copy link
Collaborator Author

build

@mythrocks
Copy link
Collaborator Author

This PR is ready for review, although it's still in draft. It can't be merged until the dependencies in CUDF have been merged.

Signed-off-by: MithunR <mythrocks@gmail.com>
@mythrocks mythrocks force-pushed the changed-lists-contains-semantics branch from aa97aa9 to 2b18f80 Compare December 15, 2021 20:47
@mythrocks
Copy link
Collaborator Author

(Modified for changed CUDF function name. copyWithValidity -> copyWithBooleanColumnAsValidity.)

@mythrocks mythrocks added the bug Something isn't working label Dec 17, 2021
@mythrocks
Copy link
Collaborator Author

Re-tested after changes to dependencies in libcudf. Will rebuild after rapidsai/cudf/pull/9510 is merged.

Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor nits

@mythrocks mythrocks marked this pull request as ready for review December 20, 2021 16:38
@mythrocks
Copy link
Collaborator Author

rapidsai/cudf/pull/9510 was just committed. I've pulled this PR out of draft.

@revans2
Copy link
Collaborator

revans2 commented Dec 20, 2021

build

@revans2
Copy link
Collaborator

revans2 commented Dec 20, 2021

Even though rapidsai/cudf/pull/9510 has not gone all the way through CI yet. I think we can still test/commit this. Because the extra processing should be a noop if listContains is behaving the old way too.

@revans2
Copy link
Collaborator

revans2 commented Dec 20, 2021

oops I was wrong. It needs a new API that was missing before.

@tgravescs
Copy link
Collaborator

build

@mythrocks mythrocks removed the request for review from kuhushukla December 20, 2021 19:52
@mythrocks
Copy link
Collaborator Author

Merging this to head off impending CI breakage.

@mythrocks mythrocks merged commit 33de97c into NVIDIA:branch-22.02 Dec 20, 2021
@mythrocks
Copy link
Collaborator Author

Thank you for the review, @revans2. @tgravescs, thank you for confirming that the CI has passed with the new libcudf artifact.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants