Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize DECIMAL128 sum aggregations [databricks] #4688

Merged
merged 7 commits into from
Feb 8, 2022

Conversation

jlowe
Copy link
Member

@jlowe jlowe commented Feb 3, 2022

Depends on rapidsai/cudf#10201.

This accelerates sum aggregations on DECIMAL128 by splitting up the 128-bit values into 32-bit chunks, summing the chunks separately into 64-bit accumulated values, and then reassembling the 128-bit value from the accumulated chunks (with overflow checking). This changes what would normally force a cudf sort-based aggregation into one that can be hash-based which can significantly improve performance. This also allows us to remove some of the DECIMAL128 overflow code.

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
@jlowe jlowe added performance A performance related task/issue cudf_dependency An issue or PR with this label depends on a new feature in cudf labels Feb 3, 2022
@jlowe jlowe added this to the Jan 31 - Feb 11 milestone Feb 3, 2022
@jlowe jlowe self-assigned this Feb 3, 2022
@jlowe
Copy link
Member Author

jlowe commented Feb 4, 2022

build

revans2
revans2 previously approved these changes Feb 4, 2022
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing I would like to see is follow on work to do the same kind of thing for average. We also should look at SUM for window operations. I don't think it will be needed there because the data comes in sorted, but some of the cleanup that has been done to split decimal from others would be good there too.

@jlowe jlowe marked this pull request as ready for review February 4, 2022 21:14
@jlowe
Copy link
Member Author

jlowe commented Feb 4, 2022

build

revans2
revans2 previously approved these changes Feb 4, 2022
@jlowe
Copy link
Member Author

jlowe commented Feb 8, 2022

Tracked down the CI failure to a libcudf issue with sort-based sum aggregations not performing aggregations in the result type as hash-based aggregations do. Filed rapidsai/cudf#10246. In the meantime, I'll update this to pre-cast the inputs to avoid the issue.

@jlowe
Copy link
Member Author

jlowe commented Feb 8, 2022

@abellina your comments should now be addressed.

Note that this also includes a change to lower the batch size being used in hash_aggregate_tests for exercising out-of-core hash aggregate processing, as the 312db failure from the previous CI run was triggered by such processing. We were not seeing it in other CI runs because we weren't regularly exercising this code path, but we do with the new lower batch size value. This adds approx 5 minutes of test time on my desktop (with 4-way parallelism), but it seems worth it for the extra coverage.

@jlowe
Copy link
Member Author

jlowe commented Feb 8, 2022

build

revans2
revans2 previously approved these changes Feb 8, 2022
@abellina
Copy link
Collaborator

abellina commented Feb 8, 2022

One more nit on an override

@jlowe
Copy link
Member Author

jlowe commented Feb 8, 2022

build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cudf_dependency An issue or PR with this label depends on a new feature in cudf performance A performance related task/issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants