Add explain Plugin API for CPU plan #3850
Conversation
Signed-off-by: Thomas Graves <tgraves@apache.org>
Signed-off-by: Thomas Graves <tgraves@nvidia.com>
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
sql-plugin/src/main/scala/com/nvidia/spark/rapids/ExplainPlan.scala
```scala
 * @return String containing the explained plan.
 */
def explainPotentialGpuPlan(df: DataFrame, explain: String = "ALL"): String = {
  val gpuOverrideClass = ShimLoader.loadGpuOverrides()
```
Do we want any kind of error handling here? If this is a public API we should probably document the exceptions and say what it can throw.
Yes, good point. I don't really want to log from this interface, so I'll declare what can be thrown.
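For example (a hedged sketch, not the plugin's actual signature: the exception types and Scaladoc wording here are hypothetical), declaring the thrown exceptions could look like this with Scaladoc `@throws` tags plus the `@throws` annotation so Java callers see checked declarations too:

```scala
/**
 * Looks at the CPU plan of the DataFrame and reports what would and
 * would not run on the GPU.
 *
 * @return String containing the explained plan.
 * @throws java.lang.IllegalArgumentException if `explain` is not a
 *         supported mode (hypothetical; the real API may differ)
 * @throws java.lang.IllegalStateException if the plugin classes
 *         cannot be loaded (hypothetical)
 */
@throws(classOf[IllegalArgumentException])
@throws(classOf[IllegalStateException])
def explainPotentialGpuPlan(df: DataFrame, explain: String = "ALL"): String
```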
```scala
 */
def explainPotentialGpuPlan(df: DataFrame, explain: String = "ALL"): String = {
  val gpuOverrideClass = ShimLoader.loadGpuOverrides()
  val explainMethod = gpuOverrideClass
```
Can we try to avoid using reflection in the code we have full control over? We already make another top-level class file, so we can have a shareable API between classloaders:

```scala
trait ExplainPlan {
  def explainPotentialGpuPlan(df: DataFrame, explain: String = "ALL"): String
}
```

In GpuOverrides.scala we can define a class that calls GpuOverrides.explain:

```scala
class GpuExplainPlain extends ExplainPlan {
  override def explainPotentialGpuPlan(df: DataFrame, explain: String): String = {
    GpuOverrides.explainPotentialGpuPlan(df, explain)
  }
}
```

In ShimLoader:

```scala
def newExplainPlan(): ExplainPlan = {
  ShimLoader.newInstanceOf[ExplainPlan]("com.nvidia.spark.rapids.ExplainPlanImpl")
}
```

The client API is `ShimLoader.newExplainPlan.explainPotentialGpuPlan`.
I need to look at this more closely as I'm not following. Your last line, the client API: are you saying that is the public user-facing API? Why would we want to expose ShimLoader as a public API?
You also have an ExplainPlanImpl class; is that supposed to match the GpuExplainPlain?
Maybe you mean the client API as in what the object ExplainPlan.explainPotentialGpuPlan would call? Basically you are just saying to create a trait that is public so it can be used by both, because it will be in unshimmed code?
OK, I think this works. The only issue is the trait can't be called ExplainPlan if we want PySpark to be able to call it, but we can change the name to ExplainPlanType or something.
> Maybe you mean the client API as in what the object ExplainPlan.explainPotentialGpuPlan would call? Basically you are just saying to create a trait that is public so it can be used by both, because it will be in unshimmed code?

That's correct. Having a trait/interface/abstract class loaded via the parent classloader enables us to avoid using reflection even if the implementation was loaded via a child classloader.
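The pattern being discussed can be sketched without any Spark dependencies (all names here are hypothetical stand-ins, not the plugin's real classes): the trait lives in code visible to the parent classloader, only the factory mentions the implementation class by name, and every call site works through the trait with no per-call reflection.

```scala
// Analogous to the unshimmed ExplainPlan trait: loaded by the parent
// classloader, so both sides of the classloader boundary can see it.
trait Greeter {
  def greet(name: String): String
}

// Analogous to the shimmed implementation loaded by a child classloader.
class GreeterImpl extends Greeter {
  override def greet(name: String): String = s"hello, $name"
}

// Analogous to ShimLoader.newExplainPlan: reflection is confined to this
// one factory method; the returned value is typed as the shared trait.
object Loader {
  def newGreeter(): Greeter =
    Class.forName("GreeterImpl")
      .getDeclaredConstructor()
      .newInstance()
      .asInstanceOf[Greeter]
}

// Call sites use the trait directly, with no reflection:
//   Loader.newGreeter().greet("world")
```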
part of #3659
I specifically did not put this in the user docs anywhere; we need to decide on the best place, or whether to leave it undocumented until we have tried it with a few customers.
This adds an API that runs the CPU plan through the plugin APIs to see what it can and cannot convert, and outputs a string with the explain output. Since we are looking at the CPU plan, other rules have already run compared to when the RAPIDS Spark plugin would normally run in query execution, so we strip those back out. I.e., the CPU plan we look at here will have CollapseCodegenStages, subqueries, and ReuseExchangeAndSubquery applied, whereas normally our plugin runs before those get applied to the plan. See more details in
prepareExplainOnly
It currently still requires the rapids-4-spark and cudf jars to be present in the Spark application, but doesn't need a GPU or CUDA.
The main api exposed: com.nvidia.spark.rapids.ExplainPlan.explainPotentialGpuPlan(df)
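A hedged usage sketch (assuming the rapids-4-spark and cudf jars are on the Spark application's classpath as described above; the DataFrame here is illustrative):

```scala
import com.nvidia.spark.rapids.ExplainPlan

// Any DataFrame works; this one is just an example query.
val df = spark.range(1000)
  .selectExpr("id", "id % 10 AS key")
  .groupBy("key")
  .count()

// Returns a String describing what the plugin could and could not
// convert to run on the GPU; "ALL" is the default explain mode.
val explained: String = ExplainPlan.explainPotentialGpuPlan(df, "ALL")
println(explained)
```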
The changes in GpuOverrides are mostly moving code from the class into the object, and then adding the functions needed to process the CPU plan without doing the conversion.
Perhaps in the future we can try to make this a more standalone tool, but we would first need to pull out the GpuOverrides meta code, where we decide whether we can convert, by itself.
We may also want to add more APIs there, perhaps to get some sort of summary report rather than dumping out all the explain text.
I ran this API through the NDS queries and compared the output to the actual GPU explain, and it did pretty well. This is how I found I had missed the subquery expressions; now we follow those and run explain on them for the CPU side too.
I manually tested this on Databricks; the only difference is that you have to add the libraries through the UI so they are available in a Python notebook.