Insert buffer converters for TypedImperativeAggregate #3299

sperlingxx · 2021-08-25T11:52:40Z

This PR addresses the last task of #2916: supporting aggregation buffer conversion between the CPU and GPU formats for TypedImperativeAggregate functions. With the ability to insert buffer converters, we can handle TypedImperativeAggregate functions that run across CPU and GPU at runtime. This means we no longer need to fall back the entire Aggregate stack to CPU when one stage needs to fall back and the Aggregate contains TypedImperativeAggregate functions.

The general idea is to create buffer converters and bind them to certain physical plans during preColumnarTransitions, then integrate these buffer converters into the RowToColumnar/ColumnarToRow transitions as pre-processing/post-processing projections during postColumnarTransitions. This approach works even when AQE is on. To adapt to AQE, we leverage TreeNodeTag to cache temporary information, including the buffer converters and some metadata.
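To illustrate the tagging idea, here is a minimal, self-contained sketch of how information attached to a plan node in an early pass can be read back in a later pass. It does not use Spark: `Tag`, `PlanNode`, `TagSketch`, and the tag name are hypothetical stand-ins, loosely modeled on Spark's `TreeNodeTag` with `setTagValue`/`getTagValue`, and the converters are represented as plain strings rather than real expressions.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's TreeNodeTag: a typed key identified by name.
final case class Tag[T](name: String)

// Hypothetical plan node that can carry tagged values, loosely modeled on TreeNode.
class PlanNode(val name: String, val children: Seq[PlanNode] = Nil) {
  private val tags = mutable.Map.empty[Tag[_], Any]

  def setTag[T](tag: Tag[T], value: T): Unit = tags.update(tag, value)

  def getTag[T](tag: Tag[T]): Option[T] = tags.get(tag).map(_.asInstanceOf[T])
}

object TagSketch {
  // Illustrative tag; each converter is described by a plain string here.
  val bufferConverters: Tag[Seq[String]] = Tag("bufferConverters")

  // Early pass (think preColumnarTransitions): attach converters to a plan node.
  def bind(node: PlanNode, converters: Seq[String]): Unit =
    node.setTag(bufferConverters, converters)

  // Later pass (think postColumnarTransitions): read them back, if any were bound.
  def lookup(node: PlanNode): Seq[String] =
    node.getTag(bufferConverters).getOrElse(Nil)
}
```

Because the value lives on the node itself, a later pass that only sees a fragment of the plan can still find it, which is the property the description relies on for AQE.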

For better understanding, let's walk through the entire procedure of inserting buffer converters:

1. Binding buffer converters to certain plans (preColumnarTransitions)
1.1 Collecting all stages of the Aggregation that contains TypedImperativeAggregate functions
The binding procedure is triggered in GpuTypedImperativeSupportedAggregateExecMeta.tagPlanForGpu if the wrapped plan is the final stage. First, we collect all stages, the same way we do for associated fallback.
1.2 Filtering the stages that need buffer converters to bridge the data-format gap with their child
1.3 Creating buffer converters for the filtered stages
We add two new interfaces to GpuTypedImperativeSupportedAggregateExecMeta: createCpuToGpuBufferConverter and createGpuToCpuBufferConverter.
1.4 Binding buffer converters to certain plans
The plans that carry the buffer converters are the CPU plans (those that cannot be replaced) located right before/after the potential R2C/C2R transitions.
2. Materializing RowToColumnar/ColumnarToRow transitions with buffer converters (postColumnarTransitions)
We add the extra fields preTransitions/postTransitions to GpuRowToColumnar/GpuColumnarToRow in order to insert projections such as the buffer converters for TypedImperativeAggregate.
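The two phases above can be sketched with a toy model. This is only an illustration under simplifying assumptions, not the plugin's code: `AggStage`, `Binding`, `ConverterPlanning`, and the string output are hypothetical, the stages are given bottom-up as a flat list, and a converter is reduced to its direction.

```scala
sealed trait Side
case object Cpu extends Side
case object Gpu extends Side

// One aggregation stage (e.g. partial, merge, final) and where it will run.
final case class AggStage(name: String, side: Side)

sealed trait Converter
case object CpuToGpu extends Converter
case object GpuToCpu extends Converter

// A converter bound to the CPU stage that sits next to the future transition.
final case class Binding(carrier: String, converter: Converter)

object ConverterPlanning {
  // Phase 1: look at each (child, parent) pair of stages, keep the pairs whose
  // sides differ, create a converter for each, and bind it to the CPU stage.
  def bind(stagesBottomUp: Seq[AggStage]): Seq[Binding] =
    stagesBottomUp.zip(stagesBottomUp.drop(1)).collect {
      case (child, parent) if child.side == Cpu && parent.side == Gpu =>
        Binding(child.name, CpuToGpu) // CPU child feeds a GPU parent
      case (child, parent) if child.side == Gpu && parent.side == Cpu =>
        Binding(parent.name, GpuToCpu) // GPU child feeds a CPU parent
    }

  // Phase 2: turn each binding into a transition that carries the converter
  // as a pre-processing (R2C) or post-processing (C2R) projection.
  def materialize(bindings: Seq[Binding]): Seq[String] = bindings.map {
    case Binding(carrier, CpuToGpu) => s"RowToColumnar(pre = CpuToGpu) after $carrier"
    case Binding(carrier, GpuToCpu) => s"ColumnarToRow(post = GpuToCpu) before $carrier"
  }
}
```

For example, with a GPU partial stage, a CPU merge stage, and a GPU final stage, the CPU merge stage ends up carrying both converters, one for each side of it.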

Signed-off-by: sperlingxx <lovedreamf@gmail.com>


sperlingxx · 2021-08-25T11:54:01Z

build

revans2

I am a little concerned that this code appears to assume AQE is on all the time. It also looks like we cannot have a TypedImperativeAggregate that does not use transitions. Approximate percentile is going to use a very different algorithm, and supporting only transitions is going to be very hard. I would prefer to keep the old code if we have to pick just one way to do it.

shims/spark304/src/main/scala/com/nvidia/spark/rapids/shims/spark304/SparkBaseShims.scala

shims/spark311/src/main/scala/com/nvidia/spark/rapids/shims/spark311/Spark311Shims.scala

shims/spark311cdh/src/main/scala/com/nvidia/spark/rapids/shims/spark311cdh/SparkBaseShims.scala

shims/spark313/src/main/scala/com/nvidia/spark/rapids/shims/spark313/SparkBaseShims.scala

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuColumnarToRowExec.scala

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala

sql-plugin/src/main/scala/org/apache/spark/sql/rapids/AggregateFunctions.scala


sperlingxx · 2021-08-26T10:54:57Z

build

sperlingxx · 2021-08-26T10:59:34Z

> I am a little concerned that this code appears to assume AQE is on all the time. It also looks like we cannot have a TypedImperativeAggregate that does not use transitions. Approximate percentile is going to use a very different algorithm, and supporting only transitions is going to be very hard. I would prefer to keep the old code if we have to pick just one way to do it.

I brought back the old code for "associated fallback". After that, we check whether every TypedImperativeAggregate buffer that crosses between CPU and GPU can be converted at runtime. If so, we insert buffer converters. Otherwise, we fall back the entire Aggregation stack.
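A minimal sketch of that decision, under assumptions: `TypedAgg`, `Direction`, `Decision`, and `FallbackDecision.decide` are hypothetical names, and each function simply declares which conversion directions it supports. The rule shown is the one described above: insert converters only if every function supports every direction that is needed, and otherwise fall back the whole stack.

```scala
sealed trait Direction
case object FromCpuToGpu extends Direction
case object FromGpuToCpu extends Direction

// A TypedImperativeAggregate function and the buffer conversions it offers.
final case class TypedAgg(name: String, converters: Set[Direction])

sealed trait Decision
case object InsertConverters extends Decision
case object FallBackEntireStack extends Decision

object FallbackDecision {
  // `needed` holds the conversion directions required by the CPU/GPU boundaries
  // found in the aggregation stack.
  def decide(aggs: Seq[TypedAgg], needed: Set[Direction]): Decision =
    if (aggs.forall(agg => needed.subsetOf(agg.converters))) InsertConverters
    else FallBackEntireStack
}
```

A function that provides no converters at all therefore forces the associated fallback whenever any boundary exists, which keeps the old behavior available for functions that cannot support transitions.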

revans2

It looks good. I just have a few nits, but I am happy to let it in without any changes.

.../spark311cdh/src/main/scala/com/nvidia/spark/rapids/shims/spark311cdh/Spark311CDHShims.scala

sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsMeta.scala


sperlingxx · 2021-08-27T05:53:49Z

build

tgravescs · 2021-08-30T13:26:59Z

It looks like this is missing the Databricks shim updates for GpuColumnarToRowTransitionExec. If you are touching all the shims, please think about the Databricks ones as well.

tgravescs · 2021-08-30T13:27:14Z

I'll put up a PR to fix it.

tgravescs · 2021-08-30T13:38:55Z

Sorry, it looks like this may have just been in my branch with various build changes. My apologies.

sperlingxx added 7 commits August 20, 2021 10:31

initialize

a0362a0

Signed-off-by: sperlingxx <lovedreamf@gmail.com>

update

fdf1bfc

Signed-off-by: sperlingxx <lovedreamf@gmail.com>

cache

878b019

Signed-off-by: sperlingxx <lovedreamf@gmail.com>

update

55b306f

Signed-off-by: sperlingxx <lovedreamf@gmail.com>

Merge remote-tracking branch 'origin/branch-21.10' into collect_ops_t…

a7c24fe

…ask4

fix shims

6728f97

Signed-off-by: sperlingxx <lovedreamf@gmail.com>

refine

8cc2997

Signed-off-by: sperlingxx <lovedreamf@gmail.com>

sperlingxx requested review from andygrove and revans2 August 25, 2021 11:52

revans2 requested changes Aug 25, 2021


sameerz added the "task" label (Work required that improves the product but is not user facing) Aug 25, 2021

update

ba93e7d

Signed-off-by: sperlingxx <lovedreamf@gmail.com>

sperlingxx requested a review from revans2 August 26, 2021 11:00

revans2 previously approved these changes Aug 26, 2021


sperlingxx added 2 commits August 27, 2021 13:07

Merge remote-tracking branch 'origin/branch-21.10' into collect_ops_t…

07ba094

…ask4

refine

3766c0e

Signed-off-by: sperlingxx <lovedreamf@gmail.com>

sperlingxx dismissed revans2’s stale review via 3766c0e August 27, 2021 05:53

revans2 approved these changes Aug 27, 2021


sperlingxx merged commit 5290586 into NVIDIA:branch-21.10 Aug 30, 2021

sperlingxx deleted the collect_ops_task4 branch August 30, 2021 02:02

sperlingxx mentioned this pull request Aug 30, 2021

[FEA] Support GpuCollectList and GpuCollectSet as TypedImperativeAggregate #2916

Closed

tgravescs mentioned this pull request Aug 30, 2021

[BUG] Databricks test fails test_hash_groupby_collect_partial_replace_fallback #3339

Closed
