
[BUG] Crossjoin performance degraded a lot on RAPIDS 21.10 snapshot #3736

Closed
viadea opened this issue Oct 1, 2021 · 7 comments · Fixed by #3746
Assignees: abellina
Labels: bug (Something isn't working), performance (A performance related task/issue)

Comments

viadea (Collaborator) commented Oct 1, 2021

Describe the bug
Crossjoin performance degraded a lot on Spark 3.2rc3 + RAPIDS 21.10 snapshot.
@nvliyuan found the following performance difference on the Spark2a cluster:

Spark 3.1: 13s (event log: app-20210908040027-3651)
Spark 3.2rc3: 324s (event log: app-20211001082934-0655)

Per the query plan, there are some non-GPU plan nodes in the Spark 3.2rc3 event log.

Steps/Code to reproduce bug

Run the microbenchmark cross-join query, which is a 1-million-row self-join:

spark.read.parquet("/data/tmp/customer1m").repartition(200).createOrReplaceTempView("costomer_df_1_million")
query = '''
select count(*) from costomer_df_1_million c1 inner join costomer_df_1_million c2 on c1.c_customer_sk>c2.c_customer_sk
'''
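
The snippet above registers the view and defines the query but does not execute it. A minimal sketch of inspecting the plan and timing the run (the explain() call and the timing wrapper are illustrative additions, not part of the original benchmark harness):

# Check whether the whole plan runs on the GPU: with the RAPIDS plugin enabled,
# accelerated operators show up as Gpu* nodes; any plain Spark operator indicates
# a CPU fallback for that step.
spark.sql(query).explain()

# Time the join itself; collect() forces the full 1M x 1M non-equi self-join to run.
import time
start = time.time()
rows = spark.sql(query).collect()
print(f"count = {rows[0][0]}, elapsed = {time.time() - start:.1f}s")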

Expected behavior
It should have performance similar to previous releases.

Environment details (please complete the following information)

  • Environment location: Standalone (Spark2a, 8-node cluster)
  • Spark configuration settings related to the issue:
    • rapids-4-spark_2.12-21.10.0-20210929.183153-116.jar
    • cudf-21.10.0-20210930.121728-64.jar


viadea added the bug, ? - Needs Triage, and performance labels on Oct 1, 2021
abellina self-assigned this on Oct 1, 2021
abellina added this to the Oct 4 - Oct 15 milestone on Oct 1, 2021
jlowe (Member) commented Oct 1, 2021

Is this really specific to Spark 3.2? Looking at the event logs, I see the 3.1 run has a "filter time" metric that is missing from the 3.2 event log. This leads me to think that the 3.1 run was run a while back before #3105, and it too would see a similar performance regression if re-run on the same snapshot.

I suspect the regression is linked to #3242 since that changed a broadcast nested loop inner join from being implemented as a cross join followed by a separate filter into a nested loop join with an AST condition evaluated during the join.
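
For reference, the two strategies correspond to two logically equivalent ways of writing the join. A minimal PySpark sketch using the table from the repro above (this only illustrates the two plan shapes at the Spark API level; it is not how the plugin implements them internally):

from pyspark.sql import functions as F

c1 = spark.table("costomer_df_1_million").alias("c1")
c2 = spark.table("costomer_df_1_million").alias("c2")

# (a) Cross join producing every pair, followed by a separate filter.
cross_then_filter = (
    c1.crossJoin(c2)
      .filter(F.col("c1.c_customer_sk") > F.col("c2.c_customer_sk"))
)

# (b) Single inner join with the non-equi condition attached to the join itself.
conditional_join = c1.join(
    c2,
    on=F.col("c1.c_customer_sk") > F.col("c2.c_customer_sk"),
    how="inner",
)

Per the comment above, #3242 moved the plugin's broadcast nested loop inner join from executing shape (a) to evaluating the condition as an AST inside the join, as in (b).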

revans2 (Collaborator) commented Oct 1, 2021

The metrics work I did made the filter time a debug metric.
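
(If the filter-time comparison is still wanted, the debug-level metrics can presumably be re-enabled by raising the plugin's metrics level; the config name below is assumed from the plugin's documentation of this era and should be verified against the version in use:)

# Surface debug-level metrics such as filter time in the SQL plan.
spark.conf.set("spark.rapids.sql.metrics.level", "DEBUG")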

jlowe (Member) commented Oct 1, 2021

> The metrics work I did made the filter time a debug metric.

Ah, good to know, I didn't realize that. That's even more evidence that the 3.1 run is from a different snapshot build, since filter time total is appearing in the plan.

#3105 removed the filter metric from GpuBroadcastNestedLoopJoin. I'm pretty sure the 3.1 eventlog is from a fairly old build since the filter metric is appearing in that node.

jlowe (Member) commented Oct 4, 2021

Looking at the configs from the two event logs and examining the spark.repl.local.jars setting, I can see the plugin versions used were quite different:

For the Spark 3.1.1 run, it was using rapids-4-spark_2.12-21.08.0.jar

For the Spark 3.2 run, it was using rapids-4-spark_2.12-21.10.0-20210928.181339-115.jar

It would be good to check the performance of the Spark 3.1.1 run on the same plugin version used for the Spark 3.2 run.
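
A rough sketch of pulling that setting out of an event log (the path is a placeholder; assumes an uncompressed, single-file event log where each line is one JSON-encoded listener event):

import json

EVENT_LOG = "/path/to/eventlog/app-20211001082934-0655"  # placeholder path

with open(EVENT_LOG) as f:
    for line in f:
        event = json.loads(line)
        if event.get("Event") == "SparkListenerEnvironmentUpdate":
            # "Spark Properties" holds the application's Spark configs.
            print(event.get("Spark Properties", {}).get("spark.repl.local.jars"))
            break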

abellina (Collaborator) commented Oct 4, 2021

I ran with the latest jar for 3.1.1 and 3.2.0 on Spark2a and confirmed there is a regression. Per the traces, compute_conditional_join_output_size and conditional_join take nearly 70% of the GPU time.

jlowe changed the title from "[BUG] Crossjoin performance degraded a lot on Spark 3.2rc3 + RAPIDS 21.10 snapshot" to "[BUG] Crossjoin performance degraded a lot on RAPIDS 21.10 snapshot" on Oct 4, 2021
jlowe (Member) commented Oct 4, 2021

Updated the headline since @abellina confirmed this is not specific to Spark 3.2.

abellina (Collaborator) commented Oct 4, 2021

There are some patches that @jlowe is working on to address this for 21.10, related to the AST changes for the join and to the extra contiguous_split calls we now make because of a new GpuFilter node (the filter no longer happens inside the join).
