
[BUG] Crossjoin performance degraded a lot on RAPIDS 21.10 snapshot #3736

Closed
viadea opened this issue Oct 1, 2021 · 7 comments · Fixed by #3746
Assignees: abellina
Labels: bug (Something isn't working), performance (A performance related task/issue)

Comments

viadea (Collaborator) commented Oct 1, 2021

Describe the bug
Crossjoin performance degraded a lot on Spark 3.2rc3 + RAPIDS 21.10 snapshot.
@nvliyuan found the following performance difference on the Spark2a cluster:

Spark 3.1: 13s (event log: app-20210908040027-3651)
Spark 3.2rc3: 324s (event log: app-20211001082934-0655)

Per the query plan, there are some non-GPU plan nodes in the Spark 3.2rc3 event log.

Steps/Code to reproduce bug

Run the microbenchmark cross-join query, which is a 1-million-row self-join:

spark.read.parquet("/data/tmp/customer1m").repartition(200).createOrReplaceTempView("costomer_df_1_million")
query = '''
select count(*) from costomer_df_1_million c1 inner join costomer_df_1_million c2 on c1.c_customer_sk>c2.c_customer_sk
'''
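
The snippet above registers the view and defines the query but does not execute it. A minimal sketch of inspecting the plan and timing the run (the explain() call and the timing wrapper are illustrative additions, not part of the original benchmark harness):

# Check whether the whole plan runs on the GPU: with the RAPIDS plugin enabled,
# accelerated operators show up as Gpu* nodes; any plain Spark operator indicates
# a CPU fallback for that step.
spark.sql(query).explain()

# Time the join itself; collect() forces the full 1M x 1M non-equi self-join to run.
import time
start = time.time()
rows = spark.sql(query).collect()
print(f"count = {rows[0][0]}, elapsed = {time.time() - start:.1f}s")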

Expected behavior
It should have performance similar to previous releases.

Environment details (please complete the following information)

  • Environment location: Standalone (Spark2a, 8-node cluster)
  • Spark configuration settings related to the issue:
    • rapids-4-spark_2.12-21.10.0-20210929.183153-116.jar
    • cudf-21.10.0-20210930.121728-64.jar


viadea added the bug, ? - Needs Triage, and performance labels on Oct 1, 2021
abellina self-assigned this on Oct 1, 2021
abellina added this to the Oct 4 - Oct 15 milestone on Oct 1, 2021
jlowe (Member) commented Oct 1, 2021

Is this really specific to Spark 3.2? Looking at the event logs, I see the 3.1 run has a "filter time" metric that is missing from the 3.2 event log. This leads me to think that the 3.1 run was run a while back before #3105, and it too would see a similar performance regression if re-run on the same snapshot.

I suspect the regression is linked to #3242 since that changed a broadcast nested loop inner join from being implemented as a cross join followed by a separate filter into a nested loop join with an AST condition evaluated during the join.
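
For reference, the two strategies correspond to two logically equivalent ways of writing the join. A minimal PySpark sketch using the table from the repro above (this only illustrates the two plan shapes at the Spark API level; it is not how the plugin implements them internally):

from pyspark.sql import functions as F

c1 = spark.table("costomer_df_1_million").alias("c1")
c2 = spark.table("costomer_df_1_million").alias("c2")

# (a) Cross join producing every pair, followed by a separate filter.
cross_then_filter = (
    c1.crossJoin(c2)
      .filter(F.col("c1.c_customer_sk") > F.col("c2.c_customer_sk"))
)

# (b) Single inner join with the non-equi condition attached to the join itself.
conditional_join = c1.join(
    c2,
    on=F.col("c1.c_customer_sk") > F.col("c2.c_customer_sk"),
    how="inner",
)

Per the comment above, #3242 moved the plugin's broadcast nested loop inner join from executing shape (a) to evaluating the condition as an AST inside the join, as in (b).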

revans2 (Collaborator) commented Oct 1, 2021

The metrics work I did made the filter time a debug metric.
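
(If the filter-time comparison is still wanted, the debug-level metrics can presumably be re-enabled by raising the plugin's metrics level; the config name below is assumed from the plugin's documentation of this era and should be verified against the version in use:)

# Surface debug-level metrics such as filter time in the SQL plan.
spark.conf.set("spark.rapids.sql.metrics.level", "DEBUG")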

jlowe (Member) commented Oct 1, 2021

> The metrics work I did made the filter time a debug metric.

Ah, good to know, I didn't realize that. That's even more evidence that the 3.1 run is from a different snapshot build, since filter time total is appearing in the plan.

#3105 removed the filter metric from GpuBroadcastNestedLoopJoin. I'm pretty sure the 3.1 eventlog is from a fairly old build since the filter metric is appearing in that node.

jlowe (Member) commented Oct 4, 2021

Looking at the configs from the two event logs and examining the spark.repl.local.jars setting, I can see the plugin versions used were quite different:

For the Spark 3.1.1 run, it was using rapids-4-spark_2.12-21.08.0.jar

For the Spark 3.2 run, it was using rapids-4-spark_2.12-21.10.0-20210928.181339-115.jar

It would be good to check the performance of the Spark 3.1.1 run on the same plugin version used for the Spark 3.2 run.
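
A rough sketch of pulling that setting out of an event log (the path is a placeholder; assumes an uncompressed, single-file event log where each line is one JSON-encoded listener event):

import json

EVENT_LOG = "/path/to/eventlog/app-20211001082934-0655"  # placeholder path

with open(EVENT_LOG) as f:
    for line in f:
        event = json.loads(line)
        if event.get("Event") == "SparkListenerEnvironmentUpdate":
            # "Spark Properties" holds the application's Spark configs.
            print(event.get("Spark Properties", {}).get("spark.repl.local.jars"))
            break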

abellina (Collaborator) commented Oct 4, 2021

I ran with the latest jar for 3.1.1 and 3.2.0 on Spark2a and confirmed there is a regression. Per the traces, compute_conditional_join_output_size and conditional_join take nearly 70% of the GPU time.

jlowe changed the title from "[BUG] Crossjoin performance degraded a lot on Spark 3.2rc3 + RAPIDS 21.10 snapshot" to "[BUG] Crossjoin performance degraded a lot on RAPIDS 21.10 snapshot" on Oct 4, 2021
jlowe (Member) commented Oct 4, 2021

Updated the headline since @abellina confirmed this is not specific to Spark 3.2.

abellina (Collaborator) commented Oct 4, 2021

There are some patches that @jlowe is working on to address this for 21.10, related to the AST changes for the join and to the extra contiguous_split calls we now make because of a new GpuFilter node (the filter no longer happens inside the join).
