[FEA] Allow to run groupby/reduction with externally derived aggregations #16633
Labels
feature request
New feature or request
Comments
We actually have a precedent for custom UDF aggregations: cudf/cpp/include/cudf/aggregation.hpp, lines 588 to 600 in bf2ee32.
Example usage in rolling is here: cudf/cpp/tests/rolling/grouped_rolling_test.cpp, lines 207 to 208 in bf2ee32.
rapids-bot bot pushed a commit that referenced this issue on Dec 20, 2024
This implements `HOST_UDF` aggregation, allowing to execute a host-side user-defined function (UDF) through libcudf aggregation framework.

* A host-side function can be an arbitrarily independent function running on the host machine. It may or may not call other device kernels depending on its implementation.
* Such user-defined function must follow the libcudf provided interface (`cudf::host_udf_base`). The interface provides the ability to fully interact with libcudf aggregation framework.
* Since it is implemented on the user application side, it has a very high degree of freedom to perform arbitrary operations to satisfy the user's need.

Partially contributes to #16633.

---

Usage

1. Define a functor deriving from `cudf::host_udf_base` and implement the required virtual functions declared in that base struct. For example:

   ```
   struct my_aggregation : cudf::host_udf_base {
     ...
   };
   ```

2. Create an instance of libcudf `HOST_UDF` aggregation which is constructed from an instance of the functor defined above. For example:

   ```
   auto agg = cudf::make_host_udf_aggregation<cudf::groupby_aggregation>(
     std::make_unique<my_aggregation>());
   ```

3. Perform aggregation operation on the created instance.

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Yunsong Wang (https://github.com/PointKernel)
- Chong Gao (https://github.com/res-life)
- Vyas Ramasubramani (https://github.com/vyasr)
- David Wendt (https://github.com/davidwendt)

URL: #17592
This idea arose after many attempts to add new aggregations to the libcudf framework to accommodate specific use cases outside of cudf. Most of the time, however, the application (Spark plugin) wants very specific behaviors that libcudf cannot accommodate. For example, for `M2`/`MERGE_M2` aggregations, we want to output more than one column (the main `M2` values as well as their intermediate values) for reuse elsewhere.

I would like to refactor the groupby/reduction framework so that it can run aggregations defined outside of libcudf. Downstream applications could then implement any new, customized aggregations they want and call libcudf code on them. The external aggregations just need to be implemented as classes derived from cudf base classes (`cudf::groupby_aggregation`, for example).

Allowing extension like this would be very beneficial in the long term, letting any downstream application accommodate its own needs and maximize performance gains. It would also help reduce maintenance effort in the libcudf repository.
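
To make the request concrete, here is a minimal, CPU-only sketch of the extension pattern. It is not libcudf code: `aggregation_base`, `m2_with_intermediates`, and `run_groupby` are hypothetical names invented for illustration, and plain `std::vector` stands in for columns. The point is only that the library-side entry point depends on a base interface, so an aggregation defined in the application can return several output columns (count, mean, and `M2` here, computed with Welford's update).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a library-owned base class (NOT the libcudf API).
struct aggregation_base {
  virtual ~aggregation_base() = default;
  // Returns any number of output columns; each has one row per group.
  virtual std::vector<std::vector<double>> compute(
    std::vector<double> const& values,
    std::vector<std::size_t> const& group_ids,
    std::size_t num_groups) const = 0;
};

// Application-side aggregation: outputs count, mean, and M2 for every group.
struct m2_with_intermediates : aggregation_base {
  std::vector<std::vector<double>> compute(
    std::vector<double> const& values,
    std::vector<std::size_t> const& group_ids,
    std::size_t num_groups) const override
  {
    std::vector<double> count(num_groups, 0.0);
    std::vector<double> mean(num_groups, 0.0);
    std::vector<double> m2(num_groups, 0.0);
    for (std::size_t i = 0; i < values.size(); ++i) {
      auto const g = group_ids[i];
      // Welford's online update of the running mean and sum of squared deviations.
      count[g] += 1.0;
      double const delta = values[i] - mean[g];
      mean[g] += delta / count[g];
      m2[g] += delta * (values[i] - mean[g]);
    }
    return {count, mean, m2};
  }
};

// Hypothetical library-side entry point: it only knows the base interface,
// so it can run aggregations it has never seen.
std::vector<std::vector<double>> run_groupby(
  std::vector<double> const& values,
  std::vector<std::size_t> const& group_ids,
  std::size_t num_groups,
  aggregation_base const& agg)
{
  assert(values.size() == group_ids.size());
  return agg.compute(values, group_ids, num_groups);
}
```

Calling `run_groupby(values, group_ids, num_groups, m2_with_intermediates{})` returns three columns in one pass, which is the kind of multi-output result described above for `M2`/`MERGE_M2`. A real implementation would operate on device columns and streams rather than host vectors.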