Optimize DECIMAL128 sum aggregations [databricks] #4688
Conversation
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
build
The main thing I would like to see is follow-on work to do the same kind of thing for average. We should also look at SUM for window operations. I don't think it will be needed there because the data comes in sorted, but some of the cleanup that has been done to split the decimal handling from the other types would be good there too.
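To sketch what that follow-on for average could look like (purely hypothetical, not plugin code, reusing the illustrative `ChunkedDecimal128Sum` helper sketched at the bottom of this thread): compute the chunked sum, then divide by the group count.

```scala
// Hypothetical follow-on: average as a chunked sum divided by the count.
// ChunkedDecimal128Sum is the illustrative CPU-side helper sketched below,
// not plugin code; a real GPU version would also need to track decimal scale.
def chunkedAverage(values: Seq[BigInt]): BigInt = {
  require(values.nonEmpty, "average of an empty group is undefined")
  ChunkedDecimal128Sum.sum(values) / values.size
}
```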
build
Tracked down the CI failure to a libcudf issue: sort-based sum aggregations do not perform the aggregation in the result type the way hash-based aggregations do. Filed rapidsai/cudf#10246. In the meantime, I'll update this to pre-cast the inputs to avoid the issue.
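As a rough illustration of that workaround (a hedged sketch with a hypothetical helper name, not the PR's actual code): widen the chunk column to the 64-bit accumulation type before the aggregation runs, so both the hash- and sort-based cudf paths sum in the same type.

```scala
import ai.rapids.cudf.{ColumnVector, DType}

// Hedged sketch: cast a 32-bit chunk column up to INT64 ahead of the sum so
// the accumulation type no longer depends on which aggregation path cudf
// picks. Column lifetime management is omitted for brevity.
def preCastToAccumulationType(chunk: ColumnVector): ColumnVector =
  chunk.castTo(DType.INT64)
```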
@abellina your comments should now be addressed. Note that this also includes a change to lower the batch size used in hash_aggregate_tests to exercise out-of-core hash aggregate processing, since the 312db failure in the previous CI run was triggered by that processing. We were not seeing it in other CI runs because we were not regularly exercising this code path, but we do with the new, lower batch size. This adds approximately 5 minutes of test time on my desktop (with 4-way parallelism), but it seems worth it for the extra coverage.
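For context, here is a sketch of how a lower batch size forces that code path, assuming the plugin's `spark.rapids.sql.batchSizeBytes` setting; the value below is illustrative, not the value used in this PR.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

// Illustrative only: a tiny target batch size makes the GPU aggregation
// consume many small batches, exercising the out-of-core merge path.
val spark = SparkSession.builder().getOrCreate()
spark.conf.set("spark.rapids.sql.batchSizeBytes", "250")

spark.range(0, 1000000L)
  .selectExpr("CAST(id % 97 AS DECIMAL(38,0)) AS key", "id AS value")
  .groupBy("key")
  .agg(sum("value"))
  .show()
```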
build
One more nit on an override
build
Depends on rapidsai/cudf#10201.
This accelerates sum aggregations on DECIMAL128 by splitting the 128-bit values into 32-bit chunks, summing the chunks separately into 64-bit accumulators, and then reassembling the 128-bit value from the accumulated chunks (with overflow checking). This turns what would normally force a cudf sort-based aggregation into one that can be hash-based, which can significantly improve performance. It also allows us to remove some of the DECIMAL128 overflow-handling code.