Fix min/max inclusive cudf::scan for strings column #8705

Merged: 4 commits, Jul 12, 2021

Conversation

davidwendt

Closes #8684

A bug in thrust::inclusive_scan (reported here) passes invalid data to the AssociativeOperator parameter provided by cudf::detail::inclusive_scan. The invalid data is likely uninitialized memory used by CUDA blocks/threads whose results are ignored. For regular fixed-width and primitive types this is harmless, since the operations just produce invalid results that are never used. Unfortunately, for string_view objects, this invalid data causes a crash, since operating on them normally requires dereferencing a device-memory pointer.

This PR works around the issue by creating a custom scan-strings-operator wrapper for inclusive-scan. The operator accepts index values that are checked and used to access the individual rows. The underlying operator is then called to determine which index is returned. The end result is a vector of indices which is passed to a gather call to build the output column.
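The index-based approach can be illustrated with a host-only sketch. This is not the libcudf code: it uses std::inclusive_scan and std::string in place of thrust and string_view, and the names index_scan_op and scan_min_strings are hypothetical. It also ignores null rows. The operator treats any out-of-range index as invalid input and never uses it to read a row, which mirrors the bounds check described above.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical wrapper (not the libcudf operator): it scans row indices
// instead of string values. An index outside [0, rows->size()) stands in
// for the invalid data the scan may hand to the operator.
template <typename KeepLhs>
struct index_scan_op {
  const std::vector<std::string>* rows;
  KeepLhs keep_lhs;  // underlying operator: true if the lhs row should win

  std::size_t operator()(std::size_t lhs, std::size_t rhs) const {
    const std::size_t n = rows->size();
    if (lhs >= n) return rhs;  // never dereference an invalid index
    if (rhs >= n) return lhs;
    return keep_lhs((*rows)[lhs], (*rows)[rhs]) ? lhs : rhs;
  }
};

// Inclusive min-scan over strings: scan the indices, then gather the rows.
std::vector<std::string> scan_min_strings(const std::vector<std::string>& rows) {
  std::vector<std::size_t> indices(rows.size());
  std::iota(indices.begin(), indices.end(), std::size_t{0});

  auto keep_lhs = [](const std::string& a, const std::string& b) { return a <= b; };
  index_scan_op<decltype(keep_lhs)> op{&rows, keep_lhs};

  std::vector<std::size_t> winners(rows.size());
  std::inclusive_scan(indices.begin(), indices.end(), winners.begin(), op);

  // Gather step: build the output from the winning index at each position.
  std::vector<std::string> out;
  out.reserve(rows.size());
  for (std::size_t i : winners) out.push_back(rows[i]);
  assert(out.size() == rows.size());
  return out;
}
```

Swapping the comparison to `a >= b` gives the max scan; the scan and gather steps stay the same.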

Also added scan gtests with 512 strings, as described in issue #8684.
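A test of this kind needs expected values to compare against. The following is a minimal host-side reference, assuming only that the input has 512 rows; the generator make_rows and the function reference_scan are hypothetical and are not taken from the PR's gtests.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical input generator: `count` short strings in a scrambled order,
// so the running min/max changes several times along the column.
std::vector<std::string> make_rows(std::size_t count = 512) {
  std::vector<std::string> rows;
  rows.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    rows.push_back("s" + std::to_string((i * 37 + 5) % count));
  }
  return rows;
}

// Host reference for an inclusive min or max scan over strings.
// Comparison is lexicographic, as std::string's operator< defines it.
std::vector<std::string> reference_scan(const std::vector<std::string>& rows,
                                        bool want_min) {
  std::vector<std::string> out(rows.size());
  std::inclusive_scan(rows.begin(), rows.end(), out.begin(),
                      [want_min](const std::string& a, const std::string& b) {
                        return want_min ? std::min(a, b) : std::max(a, b);
                      });
  return out;
}
```

The device result, copied back to the host, could then be compared row by row against reference_scan(rows, true) for min and reference_scan(rows, false) for max.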

@davidwendt davidwendt added bug Something isn't working 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change labels Jul 9, 2021
@davidwendt davidwendt self-assigned this Jul 9, 2021
@davidwendt davidwendt requested a review from a team as a code owner July 9, 2021 12:46
@davidwendt davidwendt requested review from karthikeyann, ttnghia and harrism and removed request for ttnghia July 9, 2021 12:46
@codecov

codecov bot commented Jul 9, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.08@214d74a).
The diff coverage is n/a.

❗ Current head 2d6b9cc differs from pull request most recent head d18bca9. Consider uploading reports for the commit d18bca9 to get more accurate results.

@@               Coverage Diff               @@
##             branch-21.08    #8705   +/-   ##
===============================================
  Coverage                ?   10.61%           
===============================================
  Files                   ?      109           
  Lines                   ?    18302           
  Branches                ?        0           
===============================================
  Hits                    ?     1943           
  Misses                  ?    16359           
  Partials                ?        0           



@revans2 revans2 left a comment

This fixed the issues I was seeing in the Spark code I was working on.

@harrism harrism added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Jul 12, 2021
@harrism

harrism commented Jul 12, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 05fd176 into rapidsai:branch-21.08 Jul 12, 2021
@davidwendt davidwendt deleted the bug-min-max-scan-strings branch July 19, 2021 13:42
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Development

Successfully merging this pull request may close these issues.

[BUG] cudaErrorIllegalAddress when doing a min or max scan on large string columns
4 participants