Add explain Plugin API for CPU plan #3850

tgravescs · 2021-10-18T22:05:29Z

part of #3659

I deliberately did not add user docs anywhere. We need to decide on the best place for them, or whether to leave this undocumented until we have tried it with a few customers.

This adds an API that runs the CPU plan through the plugin APIs to see what can and cannot be converted, and returns a string with the explain output. Because we are looking at the final CPU plan, Spark has already applied rules that would not yet have run at the point where the RAPIDS Spark plugin normally operates during query execution, so we strip those back out. That is, the CPU plan we look at here has CollapseCodegenStages, subqueries, and ReuseExchangeAndSubquery applied, whereas our plugin normally runs before those are applied to the plan. See prepareExplainOnly for more details.
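To illustrate the "strip those back out" step, here is a minimal sketch on a toy plan tree. It is not the code in prepareExplainOnly: `PlanNode`, `Op`, and `StripCodegen` are hypothetical names, and the two wrapper case classes only stand in for the WholeStageCodegenExec and InputAdapter nodes that Spark's CollapseCodegenStages rule inserts.

```scala
// Toy stand-ins for Spark plan nodes (hypothetical, for illustration only).
sealed trait PlanNode
case class Op(name: String, children: List[PlanNode] = Nil) extends PlanNode
// Stand-ins for the wrapper nodes that CollapseCodegenStages adds to a CPU plan.
case class WholeStageCodegen(child: PlanNode) extends PlanNode
case class InputAdapter(child: PlanNode) extends PlanNode

object StripCodegen {
  // Recursively drop the codegen wrappers so the tree looks like the plan
  // the plugin would normally see before those rules are applied.
  def strip(plan: PlanNode): PlanNode = plan match {
    case WholeStageCodegen(child) => strip(child)
    case InputAdapter(child)      => strip(child)
    case Op(name, children)       => Op(name, children.map(strip))
  }
}
```

The real implementation also has to deal with subqueries and reused exchanges, which this sketch leaves out.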

It currently still requires the rapids-4-spark jar and cudf jars to be present in the Spark application, but it doesn't need a GPU or CUDA.

The main API exposed is: com.nvidia.spark.rapids.ExplainPlan.explainPotentialGpuPlan(df)
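A hedged usage sketch of that entry point, assuming Spark and the rapids-4-spark and cudf jars are on the application classpath. The sample DataFrame and the `ExplainExample` wrapper are made up for illustration; the call itself uses the `explainPotentialGpuPlan(df, explain)` signature shown in the review diff of this PR.

```scala
import org.apache.spark.sql.SparkSession
import com.nvidia.spark.rapids.ExplainPlan

object ExplainExample {
  // Builds a small DataFrame and asks the plugin which parts of its CPU plan
  // could be converted. The explain text is returned, not printed.
  def explainSample(spark: SparkSession): String = {
    val df = spark.range(0, 1000)
      .selectExpr("id", "id % 10 AS bucket")
      .filter("bucket = 3")
    // "ALL" is the default value of the second parameter in the PR.
    ExplainPlan.explainPotentialGpuPlan(df, "ALL")
  }
}
```

From a spark-shell, `println(ExplainExample.explainSample(spark))` would dump the report for the sample query.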

The changes in GpuOverrides mostly move code from the class into the object and then add the functions needed to process the CPU plan without doing the conversion.

Perhaps in the future we can make this a more standalone tool, but first we need to separate out the GpuOverrides meta code that decides whether or not we can convert.

We may also want to add more APIs there, perhaps to get some sort of summary report rather than dumping out all of the explain text.
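One possible shape for such a summary, as a sketch only and not something this PR adds. It assumes the explain text marks each node with a leading `*` when it will run on the GPU and a leading `!` when it cannot, and `ExplainSummary` is a hypothetical name.

```scala
// Hypothetical summary type; not part of the plugin.
final case class ExplainSummary(willRunOnGpu: Int, cannotRunOnGpu: Int)

object ExplainSummary {
  // Count marker lines in the explain text.
  // Assumption: '*' prefixes nodes that will run on the GPU, '!' those that cannot.
  def fromExplain(text: String): ExplainSummary = {
    val lines = text.split("\n").map(_.trim).filter(_.nonEmpty)
    ExplainSummary(
      willRunOnGpu = lines.count(_.startsWith("*")),
      cannotRunOnGpu = lines.count(_.startsWith("!"))
    )
  }
}
```

A caller could feed the string returned by the explain API into `ExplainSummary.fromExplain` and report just the two counts.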

I ran this API through the NDS queries and compared the result to the actual GPU explain output, and the two matched pretty well. This is how I found that I had missed the subquery expressions, so now we follow those and run explain on them on the CPU side.

I manually tested this on Databricks. The only difference is that you have to add the libraries through the UI so that they are available in the Python notebook.

Signed-off-by: Thomas Graves <tgraves@apache.org>

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

tgravescs · 2021-10-18T22:05:46Z

build

sql-plugin/src/main/scala/com/nvidia/spark/rapids/SparkShims.scala

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

tgravescs · 2021-10-19T14:47:25Z

build

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala

sql-plugin/src/main/scala/com/nvidia/spark/rapids/ExplainPlan.scala

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

tgravescs · 2021-10-19T15:22:50Z

build

tgravescs · 2021-10-19T16:34:55Z

build

revans2 · 2021-10-19T21:16:00Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/ExplainPlan.scala

+   * @return String containing the explained plan.
+   */
+  def explainPotentialGpuPlan(df: DataFrame, explain: String = "ALL"): String = {
+    val gpuOverrideClass = ShimLoader.loadGpuOverrides()


Do we want any kind of error handling here? If this is a public API we should probably have something talking about exceptions and saying what it can throw.

Yes, good point. I don't really want to log from this interface, so I'll declare what can be thrown.
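As a sketch of what "declare what can be thrown" could look like in Scala: the Scaladoc `@throws` tag plus the `@throws` annotation document the exception instead of logging it. `ExplainSketch`, `explainPlanText`, and the set of allowed modes are hypothetical and are not the exceptions the merged API actually declares.

```scala
object ExplainSketch {
  // Hypothetical set of accepted modes, for illustration only.
  val AllowedModes: Set[String] = Set("ALL", "NOT_ON_GPU")

  /**
   * Stand-in for a public explain entry point that takes plan text.
   *
   * @param planText text of the plan to inspect
   * @param explain  explain mode, one of AllowedModes
   * @return String containing the explained plan.
   * @throws java.lang.IllegalArgumentException if `explain` is not a supported mode
   */
  @throws[IllegalArgumentException]
  def explainPlanText(planText: String, explain: String = "ALL"): String = {
    if (!AllowedModes.contains(explain)) {
      // Surface the problem to the caller rather than logging it here.
      throw new IllegalArgumentException(s"Unsupported explain mode: $explain")
    }
    s"[$explain] $planText"
  }
}
```

The annotation also makes the exception visible to Java callers as a `throws` clause.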

gerashegalov · 2021-10-20T02:07:51Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/ExplainPlan.scala

+   */
+  def explainPotentialGpuPlan(df: DataFrame, explain: String = "ALL"): String = {
+    val gpuOverrideClass = ShimLoader.loadGpuOverrides()
+    val explainMethod = gpuOverrideClass


Can we avoid using reflection in code that we fully control? We already create another top-level class file, so we can have a shareable API between classloaders:

    trait ExplainPlan {
      def explainPotentialGpuPlan(df: DataFrame, explain: String = "ALL"): String
    }

In GpuOverrides.scala we can define a class that calls GpuOverrides.explain:

    class GpuExplainPlain extends ExplainPlan {
      override def explainPotentialGpuPlan(df: DataFrame, explain: String): String = {
        GpuOverrides.explainPotentialGpuPlan(df, explain)
      }
    }

In ShimLoader:

    def newExplainPlan(): ExplainPlan = {
      ShimLoader.newInstanceOf[ExplainPlan]("com.nvidia.spark.rapids.ExplainPlanImpl")
    }

The client API is:

    ShimLoader.newExplainPlan.explainPotentialGpuPlan

I need to look at this more closely, as I'm not following. Regarding your last line, the client API: are you saying that is the public user-facing API? Why would we want to expose the ShimLoader as a public API?
You also have an ExplainPlanImpl class. Is that supposed to match GpuExplainPlain?

Maybe you mean the client API is what the object ExplainPlan.explainPotentialGpuPlan would call? Basically, you are saying to create a public trait that both sides can use because it lives in unshimmed code?

OK, I think this works. The only issue is that the trait can't be called ExplainPlan if we want PySpark to be able to call it, but we can change the name to ExplainPlanType or something.

> Maybe you mean the client API is what the object ExplainPlan.explainPotentialGpuPlan would call? Basically, you are saying to create a public trait that both sides can use because it lives in unshimmed code?

That's correct. Having a trait/interface/abstract class loaded via the parent classloader lets us avoid reflection even when the implementation is loaded via a child classloader.
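A self-contained sketch of the pattern discussed in this thread, without Spark: the implementation is instantiated by class name and cast once to a shared trait, after which every call is an ordinary virtual call rather than a reflective method lookup. All names here (`PlanExplainer`, `PlanExplainerImpl`, `LoaderSketch`, `ExplainFacade`) are hypothetical stand-ins, and a plain String takes the place of the DataFrame.

```scala
// Shared interface; in the plugin this would live in unshimmed code.
trait PlanExplainer {
  def explainPotentialGpuPlan(plan: String, explain: String): String
}

// Stand-in for the implementation that would be loaded by a child classloader.
class PlanExplainerImpl extends PlanExplainer {
  override def explainPotentialGpuPlan(plan: String, explain: String): String =
    s"[$explain] would inspect: $plan"
}

object LoaderSketch {
  // Instantiate a class by name and cast it to the shared type.
  def newInstanceOf[T](className: String, loader: ClassLoader): T =
    loader.loadClass(className).getDeclaredConstructor().newInstance().asInstanceOf[T]

  def newPlanExplainer(): PlanExplainer = {
    // In a real shim setup this would be a string literal, because the caller
    // cannot see the implementation class at compile time.
    val implName = classOf[PlanExplainerImpl].getName
    newInstanceOf[PlanExplainer](implName, classOf[PlanExplainerImpl].getClassLoader)
  }
}

// Public facade: users call this object and never touch the loader.
object ExplainFacade {
  def explainPotentialGpuPlan(plan: String, explain: String = "ALL"): String =
    LoaderSketch.newPlanExplainer().explainPotentialGpuPlan(plan, explain)
}
```

Keeping the loader behind `ExplainFacade` addresses the concern about exposing ShimLoader as a public API: only the facade and the trait are visible to users.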

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

tgravescs · 2021-10-20T18:13:43Z

build

tgravescs and others added 30 commits September 27, 2021 09:58

Start making the Qualification tool programmatically callable

413872a

Signed-off-by: Thomas Graves <tgraves@apache.org>

remove unneeded numRows

2062e66

add test

c67daff

handle the listener rest of events

030a167

create RunningQualApp

1cd2a65

Signed-off-by: Thomas Graves <tgraves@apache.org>

update test and start looking at api for explain

8943e54

copyright

b4147d4

refactor writer so we can get output strings

2f9b9dc

refactor applyOverrides and add config to explain only

e14a844

fix param

d6d458e

fix param headerCSV

6ffe9a0

Add explain function

aea7a4c

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

refactor

8d28954

try shimloader

3a9f237

expose GpuExplainPlan

a55d9dd

rename ExplainPlan

98ab5bb

debug

302dd7a

working except subquery

748fee4

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

add expressions subqueries

3379c00

get subquery plans

ff0f284

fix

aea57ff

find

e116aba

handle children

e3e24d4

print

83e1541

fix plans

e314883

working subqueries

44d12ce

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

Merge remote-tracking branch 'origin/branch-21.12' into qualCallable

6d83bcb

comments

a57ca73

cleanup

b9e7e8a

rework overrides

f9c2022

Signed-off-by: Thomas Graves <tgraves@apache.org>

tgravescs added 8 commits October 18, 2021 16:30

remove import

b4466ee

change comment

e14fb76

revert docs

edf9253

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

change names

f85c530

change func name

0565cf0

revert docs

a09dc0e

Merge remote-tracking branch 'origin/branch-21.12' into qualExplain

a3087e7

revert docs

437abb8

tgravescs added the feature request label Oct 18, 2021

tgravescs added this to the Oct 18 - Oct 29 milestone Oct 18, 2021

tgravescs self-assigned this Oct 18, 2021

ttnghia reviewed Oct 18, 2021

sql-plugin/src/main/scala/com/nvidia/spark/rapids/SparkShims.scala

Add a test for setting rapids conf before calling explain

64475af

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

andygrove reviewed Oct 19, 2021

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala (outdated)

andygrove reviewed Oct 19, 2021

sql-plugin/src/main/scala/com/nvidia/spark/rapids/ExplainPlan.scala (outdated)

review comments

456201e

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

update docs

3640aff

andygrove previously approved these changes Oct 19, 2021

revans2 reviewed Oct 19, 2021

gerashegalov reviewed Oct 20, 2021

Define what we throw and update to not use reflection

4f86cba

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

tgravescs dismissed andygrove’s stale review via 4f86cba October 20, 2021 18:12

gerashegalov approved these changes Oct 20, 2021

tgravescs merged commit 9d5ed8d into NVIDIA:branch-21.12 Oct 21, 2021

tgravescs deleted the qualExplain branch October 21, 2021 18:23
