Fix canonicalization of GpuScalarSubquery #3471

revans2 · 2021-09-13T21:22:13Z

This fixes #3400

I will file follow-up issues for a few other things I found while working on this.

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>

revans2 · 2021-09-13T21:32:53Z

build

sameerz · 2021-09-14T00:00:54Z

build

pxLi · 2021-09-14T00:56:02Z

build

pxLi · 2021-09-14T01:20:21Z

UT failed

[2021-09-14T01:16:57.968Z] ScalarSubquerySuite:

[2021-09-14T01:16:58.895Z] - WITH DECIMALS: Uncorrelated Scalar Subquery *** FAILED ***

[2021-09-14T01:16:58.895Z]   canonicalizationMatchesCpu=false != canonicalizationMatchesGpu=true

[2021-09-14T01:16:58.895Z]   CPU plan: *(1) Project [none#0L, Subquery scalar-subquery#42254, [id=#115307] AS #0L]

[2021-09-14T01:16:58.895Z]   :  +- Subquery scalar-subquery#42254, [id=#115307]

[2021-09-14T01:16:58.895Z]   :     +- *(2) HashAggregate(keys=[], functions=[max(more_longs#42249L)], output=[max(more_longs)#42256L])

[2021-09-14T01:16:58.895Z]   :        +- Exchange SinglePartition, true, [id=#115303]

[2021-09-14T01:16:58.895Z]   :           +- *(1) HashAggregate(keys=[], functions=[partial_max(more_longs#42249L)], output=[max#42261L])

[2021-09-14T01:16:58.895Z]   :              +- FileScan csv [more_longs#42249L] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[file:/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-2635/tests/ta..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<more_longs:bigint>

[2021-09-14T01:16:58.895Z]   +- FileScan csv [none#0L] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[file:/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-2635/tests/ta..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<longs:bigint>

[2021-09-14T01:16:58.895Z]   

[2021-09-14T01:16:58.895Z]   GPU plan: GpuColumnarToRow false

[2021-09-14T01:16:58.895Z]   +- GpuProject [none#0L, Subquery Subquery, [id=#115398] AS #0L]

[2021-09-14T01:16:58.895Z]      :  +- Subquery Subquery, [id=#115398]

[2021-09-14T01:16:58.895Z]      :     +- GpuColumnarToRow false

[2021-09-14T01:16:58.895Z]      :        +- GpuHashAggregate(keys=[], functions=[gpumax(none#0L)], output=[#0L])

[2021-09-14T01:16:58.895Z]      :           +- GpuShuffleCoalesce 2147483647

[2021-09-14T01:16:58.895Z]      :              +- GpuColumnarExchange gpusinglepartitioning$(), false, [id=#115384]

[2021-09-14T01:16:58.895Z]      :                 +- GpuHashAggregate(keys=[], functions=[partial_gpumax(none#0L)], output=[none#0L])

[2021-09-14T01:16:58.895Z]      :                    +- GpuFileGpuScan csv [none#0L] Batched: true, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[file:/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-2635/tests/ta..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<more_longs:bigint>

[2021-09-14T01:16:58.895Z]      +- GpuFileGpuScan csv [none#0L] Batched: true, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[file:/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-2635/tests/ta..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<longs:bigint> (SparkQueryCompareTestSuite.scala:367)

revans2 · 2021-09-14T13:10:25Z

build

revans2 · 2021-09-14T13:12:49Z

To me this looks like a test issue. Spark fixed canonicalization of ScalarSubquery in 3.2, so now we can canonicalize it on the GPU, but the CPU cannot on older versions of Spark. I plan to keep the fix as it is and update the test, unless someone disagrees.
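For readers unfamiliar with the term, here is a minimal sketch of what canonicalizing a scalar subquery means, assuming a toy expression model. `Expr`, `Attr`, `Max`, `ScalarSubquery`, and `CanonSketch` are hypothetical stand-ins written for this illustration; they are not Spark's or the plugin's real classes. The idea is that per-query details such as attribute names and expression IDs are replaced with fixed placeholders, so two plans that compute the same thing compare equal.

```scala
import scala.collection.mutable

// Hypothetical toy model, NOT the real Spark or spark-rapids classes.
sealed trait Expr
final case class Attr(name: String, exprId: Long) extends Expr
final case class Max(child: Expr) extends Expr
final case class ScalarSubquery(plan: Expr, exprId: Long) extends Expr

object CanonSketch {
  // Strip cosmetic details: attribute names become "none", attribute IDs are
  // renumbered in first-seen order, and the subquery's own ID is fixed to 0.
  def canonicalize(e: Expr): Expr = {
    val ordinals = mutable.LinkedHashMap.empty[Long, Long]
    def norm(id: Long): Long =
      ordinals.getOrElseUpdate(id, ordinals.size.toLong)

    def go(x: Expr): Expr = x match {
      case Attr(_, id)          => Attr("none", norm(id))
      case Max(c)               => Max(go(c))
      case ScalarSubquery(p, _) => ScalarSubquery(go(p), 0L)
    }
    go(e)
  }

  // Two expressions are treated as equivalent when their canonical forms match.
  def sameResult(a: Expr, b: Expr): Boolean =
    canonicalize(a) == canonicalize(b)
}
```

In this sketch, `ScalarSubquery(Max(Attr("more_longs", 42249L)), 42254L)` and the same subquery built with different IDs canonicalize to the same value. If the subquery's own ID were left untouched, the two would never compare equal, which is the kind of CPU/GPU mismatch the failing test reported.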

revans2 · 2021-09-14T13:56:15Z

build

revans2 · 2021-09-14T17:20:55Z

build

Fix canonicalization of GpuScalarSubquery

b599f5f

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>

revans2 added the bug Something isn't working label Sep 13, 2021

revans2 added this to the Sep 13 - Sep 24 milestone Sep 13, 2021

revans2 self-assigned this Sep 13, 2021

jlowe previously approved these changes Sep 13, 2021

revans2 added 2 commits September 14, 2021 08:14

Merge branch 'branch-21.10' into 3_2_0_scalar_subquery_canon

8b0b49f

Updated test to work on other versions of Spark

fcf1c7a

revans2 dismissed jlowe’s stale review via fcf1c7a September 14, 2021 13:56

jlowe approved these changes Sep 14, 2021

revans2 merged commit 541b9a9 into NVIDIA:branch-21.10 Sep 14, 2021

revans2 deleted the 3_2_0_scalar_subquery_canon branch September 14, 2021 19:03
