[FEA] Replace numpy mod op in Series.hash_encode
#935
I will work on a fix for this.
This will be easier than expected. We'll change the line to
That will still run it on the CPU and trigger a device-to-host copy on the
I don't recall if we have a modulo binary op in libcudf, which is probably what you'd want here. @jrhemstad?
Good point, but the speed-up is probably good enough for now. wdyt?
Implementing a binary modulo op on the GPU should be trivial. If the arrays are larger, or we call this on a bunch of columns, the allocations and memory copies would become expensive.
So you want a binary op that takes a column modulo a scalar? This will be added in #892.
@jrhemstad yes. Will this be added by the 0.6 release? If not, we might want to do an intermediate fix.
@cmgreen210 as a stopgap it should be pretty straightforward to write a numba kernel that does the modulo operation until we move it into libcudf.
@kkraus14 Sounds good. I'll have that up asap.
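A minimal sketch of the kind of stopgap kernel discussed above (function name, inputs, and outputs are illustrative, not cudf's actual code). The per-element work is just `out[i] = in[i] % stop`; in the real stopgap that loop body would sit inside a numba `@cuda.jit` kernel launched over the column's device array, while NumPy stands in here so the semantics are easy to check on the host.

```python
import numpy as np

def mod_column_by_scalar(column, stop):
    # Reduce each hashed value into the bucket range [0, stop).
    # In the actual stopgap this elementwise modulo would run on the
    # GPU via a numba kernel instead of NumPy on the host.
    return np.mod(column, stop)

hashed = np.array([123456789, 987654321, 42], dtype=np.int64)
print(mod_column_by_scalar(hashed, 10).tolist())  # [9, 1, 2]
```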
The modulo implementation is now here. It's not merged, but it's tested to work. |
The current implementation of hash_encode is bottlenecked at the numpy call https://github.com/rapidsai/cudf/blob/branch-0.6/python/cudf/dataframe/series.py#L1078. This computation should be moved to the GPU.