-
Notifications
You must be signed in to change notification settings - Fork 912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix min/max inclusive cudf::scan for strings column #8705
Fix min/max inclusive cudf::scan for strings column #8705
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-21.08 #8705 +/- ##
===============================================
Coverage ? 10.61%
===============================================
Files ? 109
Lines ? 18302
Branches ? 0
===============================================
Hits ? 1943
Misses ? 16359
Partials ? 0 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixed the issues I was seeing in the Spark code I was working on.
@gpucibot merge |
Closes #8684
A bug in
thrust::inclusive_scan
reported here is passing invalid data to the AssociateOperator parameter provided bycudf::detail::inclusive_scan
. The invalid data is likely initialized memory used by CUDA blocks/threads where the result is ignored. For regular fixed-width and primitive types this is harmless since operations just produce invalid results which are not used. Unfortunately, forstring_view
objects, this invalid data will cause a crash since it normally requires de-referencing a device-memory pointer.This PR works around the issue by creating a custom scan-strings-operator wrapper for inclusive-scan. The operator accepts index values that are checked and used to access the individual rows. The underlying operator is then called to determine which index is returned. The end result is a vector of indices which is passed to a gather call to build the output column.
Also, added some additional scan gtests with 512 strings as described in issue #8684