
Fix issues with AQE and DPP enabled on Spark 3.2 [databricks] #3691

Merged
merged 4 commits into NVIDIA:branch-21.10 on Sep 30, 2021

Conversation

@jlowe jlowe (Member) commented Sep 28, 2021

Fixes #3653.

This fixes a number of issues with GpuBroadcastToCpuExec. In Spark 3.2+, there's an assert to verify that the child of an AdaptiveSparkPlanExec is a BroadcastQueryStageExec if it tries to compute the broadcast. The transition plans would place GpuBroadcastToCpuExec immediately under AdaptiveSparkPlanExec which triggered the assert.

Unfortunately creating a new BroadcastQueryStageExec instance requires shimming, since the signature of BroadcastQueryStageExec has changed across Spark versions. Newer versions require a canonicalized plan parameter, so I needed to add a shim method to create a new instance given an old one and a new child to place underneath.
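A rough sketch of such a shim factory method for the pre-3.2 shims follows; the method and parameter names here are assumptions rather than the exact ones in this PR:

```scala
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.adaptive.BroadcastQueryStageExec

// Illustrative shim method (names assumed): given an existing stage and a new
// child plan, build a fresh BroadcastQueryStageExec. On Spark 3.0.x/3.1.x the
// case class takes only an id and a plan, so this form compiles there; the
// 3.2+ constructor adds a canonicalized-plan parameter (see the build error
// and follow-up fix later in this conversation).
def newBroadcastQueryStageExec(
    old: BroadcastQueryStageExec,
    newPlan: SparkPlan): BroadcastQueryStageExec =
  BroadcastQueryStageExec(old.id, newPlan)
```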

Besides fixing that assert, there was also a mishandling of the broadcast data in GpuBroadcastToCpuExec. The code assumed that ColumnarBatch.rowIterator would return a unique row per iteration, but it actually returns the same row object instance on each iteration and mutates that object's internal state to produce the proper values when the row's data is requested. The code was producing an array of InternalRow without manifesting the data from each row, which caused every entry to be the same ColumnarBatchRow instance, pointing at the last row of data. In addition, the UnsafeRow instances were being reused in a similar manner. The fix is to not force the internal rows into an array until they are converted to unsafe rows, and to make a copy of each unsafe row at conversion time.
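The shape of the fix, using the `toUnsafe` converter and `gpuBatches` collection that appear in the reviewed diff further down; the "before" lines are paraphrased from this description rather than quoted from the old code:

```scala
import scala.collection.JavaConverters._

// Before (broken, paraphrased): rowIterator() hands back the same mutable row
// object on every next() call, so materializing the iterator into an array
// yields N references to the last row, and the reused UnsafeRow instances
// compound the problem:
//   val rows = batch.rowIterator().asScala.toArray
//   val unsafeRows = rows.iterator.map(toUnsafe)

// After (fixed): stay lazy until conversion and copy each UnsafeRow so every
// stored row owns its own backing data.
val unsafeRows = gpuBatches.flatMap {
  _.rowIterator().asScala.map(r => toUnsafe(r).copy())
}
```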

One last issue, which ended up being the majority of the change in this PR, is that BroadcastQueryStageExec expects a BroadcastExchangeLike child, so to fix the original assert, GpuBroadcastToCpuExec needs to be a BroadcastExchangeLike. Unfortunately BroadcastExchangeLike was forcing shims because Databricks changed the signature of that trait. Rather than shim yet another class that needs BroadcastExchangeLike, I ended up creating a shim v2 version of BroadcastExchangeLike, called ShimBroadcastExchangeLike, that encapsulates the differences between Apache Spark and Databricks for this trait. This not only lets us avoid shimming GpuBroadcastToCpuExec but also lets us unshim GpuBroadcastExchangeExec, which was only being shimmed due to this trait signature difference.
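A sketch of the shim-v2 trait pattern described here. The PR text does not spell out which members differ between Apache Spark and Databricks, so the promise/completionFuture body below is an assumed example of the kind of divergence the trait hides:

```scala
package com.nvidia.spark.rapids.shims.v2

import scala.concurrent.{Future, Promise}

import org.apache.spark.broadcast.Broadcast
import org.apache.spark.sql.execution.exchange.BroadcastExchangeLike

// Each Spark build (Apache Spark vs. Databricks) gets its own copy of this
// trait, so GpuBroadcastExchangeExec and GpuBroadcastToCpuExec can extend
// ShimBroadcastExchangeLike once instead of being shimmed themselves.
trait ShimBroadcastExchangeLike extends BroadcastExchangeLike {
  // Assumed example member: satisfy whatever signature this particular build's
  // BroadcastExchangeLike declares for broadcast completion.
  @transient protected lazy val promise = Promise[Broadcast[Any]]()

  @transient lazy val completionFuture: Future[Broadcast[Any]] = promise.future
}
```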

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
@jlowe jlowe added this to the Sep 27 - Oct 1 milestone Sep 28, 2021
@jlowe jlowe self-assigned this Sep 28, 2021
@jlowe jlowe changed the title Fix issues with AQE and DPP enabled on Spark 3.2 Fix issues with AQE and DPP enabled on Spark 3.2 [databricks] Sep 28, 2021
@jlowe jlowe (Member Author) commented Sep 28, 2021

build

@abellina abellina (Collaborator) commented:

Looks like a databricks301 build issue:


[2021-09-28T20:35:06.698Z] [ERROR] /home/ubuntu/spark-rapids/sql-plugin/src/main/301db/scala/com/nvidia/spark/rapids/shims/v2/Spark30XShims.scala:69: not enough arguments for method apply: (id: Int, plan: org.apache.spark.sql.execution.SparkPlan, _canonicalized: org.apache.spark.sql.execution.SparkPlan)org.apache.spark.sql.execution.adaptive.BroadcastQueryStageExec in object BroadcastQueryStageExec.
[2021-09-28T20:35:06.698Z] Unspecified value parameter _canonicalized.
[2021-09-28T20:35:06.698Z] [ERROR]       newPlan: SparkPlan): BroadcastQueryStageExec = BroadcastQueryStageExec(old.id, newPlan)
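The later commit "Add canonicalized parameter for 301db shim" presumably addresses this by supplying the third constructor argument; a hedged sketch, where the choice of which plan to pass as the canonicalized one is an assumption:

```scala
// Hedged sketch of the 301db/3.2-style shim: the constructor also requires a
// canonicalized plan. Reusing the old stage's _canonicalized field is an
// assumption made for illustration only.
def newBroadcastQueryStageExec(
    old: BroadcastQueryStageExec,
    newPlan: SparkPlan): BroadcastQueryStageExec =
  BroadcastQueryStageExec(old.id, newPlan, old._canonicalized)
```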

@jlowe jlowe (Member Author) commented Sep 28, 2021

build

1 similar comment
@jlowe jlowe (Member Author) commented Sep 29, 2021

build

@sameerz sameerz added the bug (Something isn't working) and Spark 3.2+ labels Sep 29, 2021
tgravescs previously approved these changes Sep 29, 2021

@tgravescs tgravescs (Collaborator) left a comment

It might be nice to add an integration test with this combo. I think we may have issues filed for both DPP and AQE testing, though I'm not sure about testing them together.
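For reference, such a test would need both features switched on together; a minimal, hypothetical sketch (the object name, harness, and query are not the project's actual integration tests):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: enable AQE and DPP together on the plugin, then compare
// CPU and GPU results for a partitioned join that triggers dynamic partition
// pruning.
object AqeDppComboSketch extends App {
  val spark = SparkSession.builder()
    .master("local[*]")
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")
    .getOrCreate()

  // The usual DPP-exercising shape: a fact table partitioned on the join key,
  // joined to a dimension table with a selective filter.
}
```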

@jlowe jlowe marked this pull request as draft September 29, 2021 13:41
@jlowe jlowe (Member Author) commented Sep 29, 2021

There are some double-close issues in some cases; converting to draft while I investigate.

@jlowe jlowe (Member Author) commented Sep 29, 2021

Found the double-close issue: it had to do with multiple columns appearing in the same batch all referencing the same buffer.
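A hypothetical illustration of that failure mode (not the actual fix in this PR): when several deserialized columns wrap the same host buffer, closing each column must not release the shared buffer more than once, and reference counting is one way to keep the closes balanced.

```scala
// Sketch only: a generic ref-counted buffer stands in for the real host buffer.
final class SharedHostBuffer {
  private var refs = 1
  def retain(): SharedHostBuffer = { refs += 1; this }
  def close(): Unit = {
    refs -= 1
    require(refs >= 0, "buffer closed more times than it was retained")
    if (refs == 0) {
      // release the underlying memory exactly once
    }
  }
}

final class HostColumn(buffer: SharedHostBuffer) {
  def close(): Unit = buffer.close()
}

object DoubleCloseSketch extends App {
  val shared = new SharedHostBuffer
  // Wrong: wrapping `shared` twice without retain() would close it twice.
  // Right: every additional column retains the shared buffer before wrapping it.
  val columns = Seq(new HostColumn(shared), new HostColumn(shared.retain()))
  columns.foreach(_.close()) // balanced: the memory is released exactly once
}
```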

@jlowe jlowe (Member Author) commented Sep 29, 2021

build

@jlowe jlowe marked this pull request as ready for review September 29, 2021 13:57
@jlowe jlowe marked this pull request as draft September 29, 2021 14:10
@jlowe jlowe (Member Author) commented Sep 29, 2021

I think I've fixed the problems with HostColumnVector deserialization. Queries seem to be computing proper results, but I'm seeing the CPU and GPU execute queries with very different shapes (e.g., the CPU never runs a stage with more than 35 partitions, yet the GPU will run many stages with the full 200 partitions). Investigating.

@jlowe jlowe (Member Author) commented Sep 30, 2021

The occasional lack of AQE shuffle coalescing on GPU queries is unrelated to this change. Filed #3713 to track it separately.

@jlowe jlowe marked this pull request as ready for review September 30, 2021 00:31
@jlowe jlowe (Member Author) commented Sep 30, 2021

build

Inline review comment on this hunk of the change (excerpt):

    val unsafeRows = rows.iterator.map(toUnsafe)
    val relation = ShimLoader.getSparkShims
    val unsafeRows = gpuBatches.flatMap {
      _.rowIterator().asScala.map(r => toUnsafe(r).copy())

Collaborator: nit, it may be worth adding a comment above this explaining the reason for the copy. I think folks can backtrack to this GitHub issue, and it can be done later too, so as not to block this PR from an expensive CI run today.

@abellina abellina (Collaborator) left a comment

lgtm

@abellina abellina (Collaborator) commented:

Note I am running with this change in spark2a, and I haven't found the NPE I was seeing before.

@tgravescs tgravescs merged commit 85438b9 into NVIDIA:branch-21.10 Sep 30, 2021
tgravescs added a commit that referenced this pull request Oct 1, 2021
* Fix issues with AQE and DPP enabled on Spark 3.2 [databricks] (#3691)

* Fix issues with AQE and DPP enabled on Spark 3.2

Signed-off-by: Jason Lowe <jlowe@nvidia.com>

* Add canonicalized parameter for 301db shim

* Fix double-close when batch contains multiple columns

* Fix HostColumnVector deserialization

* CDH build stopped working due to missing jars in maven repo (#3722)

fixes #3718

Evidently some jars were removed from the cdh maven repo that were pulled in through spark-hive -> spark-core -> curator-recipes. We don't use that version as it's explicitly called out in the cdh profiles. Just exclude spark-core when pulling in the spark-hive dep (roughly as sketched below). Built and unit tests pass.

I did see a couple of other dependency warnings but then didn't see them again. I'll rerun with a clean m2, but that shouldn't block this fix for the build.

For reference the error was:
`Could not resolve dependencies for project com.nvidia:rapids-4-spark-sql_2.12:jar:21.10.0-SNAPSHOT: Failed to collect dependencies at org.apache.spark:spark-hive_2.12:jar:3.1.1.3.1.7270.0-253 -> org.apache.spark:spark-core_2.12:jar:3.1.1.3.1.7270.0-253 -> org.apache.curator:curator-recipes:jar:4.3.0.7.2.7.0-SNAPSHOT: Failed to read artifact descriptor for org.apache.curator:curator-recipes:jar:4.3.0.7.2.7.0-SNAPSHOT: Could not transfer artifact org.apache.curator:curator-recipes:pom:4.3.0.7.2.7.0-SNAPSHOT from/to cloudera (https://repo.hortonworks.com/nexus/content/groups/public): PKIX path building failed:`

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

Co-authored-by: Thomas Graves <tgraves@nvidia.com>
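The exclusion described in that commit would look roughly like the following in the cdh profile's pom; the version property and artifact coordinates here are illustrative assumptions, not quoted from the actual build file.

```xml
<!-- Illustrative sketch only: exclude spark-core when pulling in the CDH
     spark-hive dependency so the broken curator-recipes transitive
     dependency is never resolved. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.12</artifactId>
  <version>${spark.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.12</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```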
Labels
bug (Something isn't working), Spark 3.2+
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Issue seen with AQE on in Q5 (possibly others) using Spark 3.2 rc3
4 participants