Optimize DECIMAL128 sum aggregations [databricks] #4688

jlowe · 2022-02-03T23:04:22Z

This accelerates sum aggregations on DECIMAL128 by splitting up the 128-bit values into 32-bit chunks, summing the chunks separately into 64-bit accumulated values, and then reassembling the 128-bit value from the accumulated chunks (with overflow checking). This changes what would normally force a cudf sort-based aggregation into one that can be hash-based which can significantly improve performance. This also allows us to remove some of the DECIMAL128 overflow code.

Signed-off-by: Jason Lowe <jlowe@nvidia.com>

jlowe · 2022-02-04T16:20:22Z

build

revans2

The main thing I would like to see is follow on work to do the same kind of thing for average. We also should look at SUM for window operations. I don't think it will be needed there because the data comes in sorted, but some of the cleanup that has been done to split decimal from others would be good there too.

jlowe · 2022-02-04T21:15:01Z

build

sql-plugin/src/main/scala/org/apache/spark/sql/rapids/AggregateFunctions.scala

jlowe · 2022-02-08T16:11:52Z

Tracked down the CI failure to a libcudf issue with sort-based sum aggregations not performing aggregations in the result type as hash-based aggregations do. Filed rapidsai/cudf#10246. In the meantime, I'll update this to pre-cast the inputs to avoid the issue.

…ions

jlowe · 2022-02-08T17:55:29Z

@abellina your comments should now be addressed.

Note that this also includes a change to lower the batch size being used in hash_aggregate_tests for exercising out-of-core hash aggregate processing, as the 312db failure from the previous CI run was triggered by such processing. We were not seeing it in other CI runs because we weren't regularly exercising this code path, but we do with the new lower batch size value. This adds approx 5 minutes of test time on my desktop (with 4-way parallelism), but it seems worth it for the extra coverage.

jlowe · 2022-02-08T17:55:36Z

build

sql-plugin/src/main/scala/org/apache/spark/sql/rapids/AggregateFunctions.scala

abellina · 2022-02-08T18:04:36Z

One more nit on an override

jlowe · 2022-02-08T18:20:40Z

build

Optimize DECIMAL128 sum aggregations

29d2b6f

Signed-off-by: Jason Lowe <jlowe@nvidia.com>

jlowe added performance A performance related task/issue cudf_dependency An issue or PR with this label depends on a new feature in cudf labels Feb 3, 2022

jlowe added this to the Jan 31 - Feb 11 milestone Feb 3, 2022

jlowe self-assigned this Feb 3, 2022

revans2 previously approved these changes Feb 4, 2022

View reviewed changes

Fix regression in window sum

423da9b

jlowe dismissed revans2’s stale review via 423da9b February 4, 2022 21:14

jlowe marked this pull request as ready for review February 4, 2022 21:14

revans2 previously approved these changes Feb 4, 2022

View reviewed changes

abellina requested changes Feb 5, 2022

View reviewed changes

Update for review comments

498373e

This was referenced Feb 8, 2022

Optimize DECIMAL128 average aggregations #4722

Closed

Look into applying DECIMAL128 sum aggregation overflow checking for window aggregations #4723

Open

jlowe added 3 commits February 8, 2022 11:36

Explicitly upcast input to avoid libcudf sort-based aggregation issue

93245c8

Lower batch limit in agg tests to better exercise sort-based aggregat…

60593fb

…ions

Merge branch 'branch-22.04' into dec128-agg-perf

364fd58

jlowe dismissed revans2’s stale review via 364fd58 February 8, 2022 17:52

revans2 previously approved these changes Feb 8, 2022

View reviewed changes

abellina reviewed Feb 8, 2022

View reviewed changes

sql-plugin/src/main/scala/org/apache/spark/sql/rapids/AggregateFunctions.scala Outdated Show resolved Hide resolved

Remove redundant method override

bf22a3c

jlowe dismissed revans2’s stale review via bf22a3c February 8, 2022 18:20

abellina approved these changes Feb 8, 2022

View reviewed changes

jlowe merged commit f3a5cd3 into NVIDIA:branch-22.04 Feb 8, 2022

jlowe deleted the dec128-agg-perf branch February 8, 2022 22:17

This was referenced Feb 8, 2022

[BUG] Investigate q32 and q67 for decimals potential regression #4290

Closed

Improve aggregation performance of average on DECIMAL128 columns [databricks] #4776

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize DECIMAL128 sum aggregations [databricks] #4688

Optimize DECIMAL128 sum aggregations [databricks] #4688

jlowe commented Feb 3, 2022

jlowe commented Feb 4, 2022

revans2 left a comment

jlowe commented Feb 4, 2022

jlowe commented Feb 8, 2022

jlowe commented Feb 8, 2022

jlowe commented Feb 8, 2022

abellina commented Feb 8, 2022

jlowe commented Feb 8, 2022

Optimize DECIMAL128 sum aggregations [databricks] #4688

Optimize DECIMAL128 sum aggregations [databricks] #4688

Conversation

jlowe commented Feb 3, 2022

jlowe commented Feb 4, 2022

revans2 left a comment

Choose a reason for hiding this comment

jlowe commented Feb 4, 2022

jlowe commented Feb 8, 2022

jlowe commented Feb 8, 2022

jlowe commented Feb 8, 2022

abellina commented Feb 8, 2022

jlowe commented Feb 8, 2022