Fix canonicalization of GpuScalarSubquery #3471

Merged
revans2 merged 3 commits into NVIDIA:branch-21.10 from 3_2_0_scalar_subquery_canon
Sep 14, 2021

Conversation

revans2
Collaborator

@revans2 revans2 commented Sep 13, 2021

This fixes #3400

I will file follow-on issues for a few other things I found while doing this.

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
@revans2 revans2 added the bug Something isn't working label Sep 13, 2021
@revans2 revans2 added this to the Sep 13 - Sep 24 milestone Sep 13, 2021
@revans2 revans2 self-assigned this Sep 13, 2021
@revans2
Collaborator Author

revans2 commented Sep 13, 2021

build

jlowe
jlowe previously approved these changes Sep 13, 2021
@sameerz
Collaborator

sameerz commented Sep 14, 2021

build

@pxLi
Collaborator

pxLi commented Sep 14, 2021

build

@pxLi
Collaborator

pxLi commented Sep 14, 2021

UT failed

[2021-09-14T01:16:57.968Z] ScalarSubquerySuite:
[2021-09-14T01:16:58.895Z] - WITH DECIMALS: Uncorrelated Scalar Subquery *** FAILED ***
[2021-09-14T01:16:58.895Z]   canonicalizationMatchesCpu=false != canonicalizationMatchesGpu=true
[2021-09-14T01:16:58.895Z]   CPU plan: *(1) Project [none#0L, Subquery scalar-subquery#42254, [id=#115307] AS #0L]
[2021-09-14T01:16:58.895Z]   :  +- Subquery scalar-subquery#42254, [id=#115307]
[2021-09-14T01:16:58.895Z]   :     +- *(2) HashAggregate(keys=[], functions=[max(more_longs#42249L)], output=[max(more_longs)#42256L])
[2021-09-14T01:16:58.895Z]   :        +- Exchange SinglePartition, true, [id=#115303]
[2021-09-14T01:16:58.895Z]   :           +- *(1) HashAggregate(keys=[], functions=[partial_max(more_longs#42249L)], output=[max#42261L])
[2021-09-14T01:16:58.895Z]   :              +- FileScan csv [more_longs#42249L] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[file:/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-2635/tests/ta..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<more_longs:bigint>
[2021-09-14T01:16:58.895Z]   +- FileScan csv [none#0L] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[file:/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-2635/tests/ta..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<longs:bigint>
[2021-09-14T01:16:58.895Z]
[2021-09-14T01:16:58.895Z]   GPU plan: GpuColumnarToRow false
[2021-09-14T01:16:58.895Z]   +- GpuProject [none#0L, Subquery Subquery, [id=#115398] AS #0L]
[2021-09-14T01:16:58.895Z]      :  +- Subquery Subquery, [id=#115398]
[2021-09-14T01:16:58.895Z]      :     +- GpuColumnarToRow false
[2021-09-14T01:16:58.895Z]      :        +- GpuHashAggregate(keys=[], functions=[gpumax(none#0L)], output=[#0L])
[2021-09-14T01:16:58.895Z]      :           +- GpuShuffleCoalesce 2147483647
[2021-09-14T01:16:58.895Z]      :              +- GpuColumnarExchange gpusinglepartitioning$(), false, [id=#115384]
[2021-09-14T01:16:58.895Z]      :                 +- GpuHashAggregate(keys=[], functions=[partial_gpumax(none#0L)], output=[none#0L])
[2021-09-14T01:16:58.895Z]      :                    +- GpuFileGpuScan csv [none#0L] Batched: true, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[file:/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-2635/tests/ta..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<more_longs:bigint>
[2021-09-14T01:16:58.895Z]      +- GpuFileGpuScan csv [none#0L] Batched: true, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[file:/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-2635/tests/ta..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<longs:bigint> (SparkQueryCompareTestSuite.scala:367)
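
The failure message in this log can be read as a comparison of two booleans. Below is a minimal Python sketch of that logic. It assumes the suite plans the query more than once per backend and compares the canonicalized forms. The function names are hypothetical and this is not the code in SparkQueryCompareTestSuite.scala.

```python
from typing import Callable


def canonicalization_matches(plan_a: str, plan_b: str,
                             canonicalize: Callable[[str], str]) -> bool:
    """True when two independently planned runs canonicalize to the same form."""
    return canonicalize(plan_a) == canonicalize(plan_b)


def check_consistency(cpu_matches: bool, gpu_matches: bool) -> None:
    """Assumed shape of the assertion: CPU and GPU must agree on the outcome.

    The check does not require that canonicalization succeeds, only that
    both backends behave the same way.
    """
    if cpu_matches != gpu_matches:
        raise AssertionError(
            f"canonicalizationMatchesCpu={str(cpu_matches).lower()} != "
            f"canonicalizationMatchesGpu={str(gpu_matches).lower()}"
        )
```

Under this reading, the test fails here because the GPU plans canonicalize to the same form while the CPU plans do not, not because the GPU result is wrong.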

@revans2
Collaborator Author

revans2 commented Sep 14, 2021

build

@revans2
Collaborator Author

revans2 commented Sep 14, 2021

To me this feels like a test issue. Spark fixed canonicalization of ScalarSubquery in 3.2, so on older versions of Spark we can canonicalize it on the GPU but the CPU cannot. I think I will keep the fix as it is and update the test, unless someone disagrees.
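
To illustrate the mismatch described in this comment, here is a toy Python model (not Spark or plugin code) of two canonicalizers operating on plan strings. It assumes, as the CPU plan in the log suggests, that the older CPU path normalizes attribute IDs but leaves the `scalar-subquery#NNN` ID in place. The function names and the sample plan strings are made up for illustration.

```python
import re


def canonicalize_old_cpu(plan: str) -> str:
    """Toy canonicalizer that normalizes attribute IDs only.

    Attribute references such as `longs#17L` become `none#0L`, but the
    subquery's own `scalar-subquery#NNN` ID is left untouched.
    """
    return re.sub(r"\b\w+#\d+L", "none#0L", plan)


def canonicalize_fixed(plan: str) -> str:
    """Toy canonicalizer that also normalizes the subquery ID."""
    plan = canonicalize_old_cpu(plan)
    return re.sub(r"scalar-subquery#\d+", "scalar-subquery#0", plan)


# Two plannings of the same query: only the generated IDs differ.
run1 = "Project [longs#17L, Subquery scalar-subquery#42254]"
run2 = "Project [longs#31L, Subquery scalar-subquery#42301]"

# The leftover subquery ID keeps the two runs from comparing equal.
old_cpu_matches = canonicalize_old_cpu(run1) == canonicalize_old_cpu(run2)
# Once the subquery ID is normalized as well, the runs compare equal.
fixed_matches = canonicalize_fixed(run1) == canonicalize_fixed(run2)
```

In this model `old_cpu_matches` is false and `fixed_matches` is true, which mirrors the `canonicalizationMatchesCpu=false` and `canonicalizationMatchesGpu=true` pair reported by the failing test.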

@revans2
Collaborator Author

revans2 commented Sep 14, 2021

build

@revans2
Collaborator Author

revans2 commented Sep 14, 2021

build

@revans2 revans2 merged commit 541b9a9 into NVIDIA:branch-21.10 Sep 14, 2021
@revans2 revans2 deleted the 3_2_0_scalar_subquery_canon branch September 14, 2021 19:03
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Canonicalized GPU plans sometimes not consistent when using Spark 3.2
4 participants