[CORE] Move all columnar rules to post-columnar transitions #4790
Run Gluten Clickhouse CI
/Benchmark Velox
This reverts commit 9523ad2.
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
 case class InsertColumnarToColumnarTransitions(session: SparkSession) extends Rule[SparkPlan] {
   @transient private val planChangeLogger = new PlanChangeLogger[SparkPlan]()

-  private def replaceWithVanillaColumnarToRow(plan: SparkPlan): SparkPlan = plan match {
-    case _ if PlanUtil.isGlutenColumnarOp(plan) =>
+  private def replaceWithVanillaColumnarToRow(p: SparkPlan): SparkPlan = p.transformUp {
+    case plan if PlanUtil.isGlutenColumnarOp(plan) =>
       plan.withNewChildren(plan.children.map {
-        c =>
-          val child = replaceWithVanillaColumnarToRow(c)
-          if (PlanUtil.isVanillaColumnarOp(child)) {
-            BackendsApiManager.getSparkPlanExecApiInstance.genRowToColumnarExec(
-              ColumnarToRowExec(child))
-          } else {
-            child
-          }
+        case child if PlanUtil.isVanillaColumnarOp(child) =>
+          BackendsApiManager.getSparkPlanExecApiInstance.genRowToColumnarExec(
+            ColumnarToRowExec(child))
+        case other => other
       })
-    case _ =>
-      plan.withNewChildren(plan.children.map(replaceWithVanillaColumnarToRow))
   }

-  private def replaceWithVanillaRowToColumnar(plan: SparkPlan): SparkPlan = plan match {
-    case _ if PlanUtil.isVanillaColumnarOp(plan) =>
+  private def replaceWithVanillaRowToColumnar(p: SparkPlan): SparkPlan = p.transformUp {
+    case plan if PlanUtil.isVanillaColumnarOp(plan) =>
       plan.withNewChildren(plan.children.map {
-        c =>
-          val child = replaceWithVanillaRowToColumnar(c)
-          if (PlanUtil.isGlutenColumnarOp(child)) {
-            RowToColumnarExec(
-              BackendsApiManager.getSparkPlanExecApiInstance.genColumnarToRowExec(child))
-          } else {
-            child
-          }
+        case child if PlanUtil.isGlutenColumnarOp(child) =>
+          RowToColumnarExec(
+            BackendsApiManager.getSparkPlanExecApiInstance.genColumnarToRowExec(child))
+        case other => other
       })
-    case _ =>
-      plan.withNewChildren(plan.children.map(replaceWithVanillaRowToColumnar))
   }
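To make the shape of this refactoring easier to follow, here is a minimal, self-contained sketch of the same idea on a toy tree. Everything in it (`Plan`, `GlutenOp`, `VanillaColumnarOp`, `ColumnarToRow`, `RowToColumnar`, and the hand-written `transformUp`) is a hypothetical stand-in, not a Spark or Gluten class. It only illustrates how a bottom-up traversal with a partial function can replace the hand-written recursion and the fallback `case _` branch.

```scala
object TransitionSketch {
  // Hypothetical stand-ins for SparkPlan nodes; not Spark or Gluten classes.
  sealed trait Plan {
    def children: List[Plan]
    def withNewChildren(cs: List[Plan]): Plan
  }
  final case class GlutenOp(name: String, children: List[Plan]) extends Plan {
    def withNewChildren(cs: List[Plan]): Plan = copy(children = cs)
  }
  final case class VanillaColumnarOp(name: String, children: List[Plan]) extends Plan {
    def withNewChildren(cs: List[Plan]): Plan = copy(children = cs)
  }
  final case class ColumnarToRow(child: Plan) extends Plan {
    def children: List[Plan] = List(child)
    def withNewChildren(cs: List[Plan]): Plan = copy(child = cs.head)
  }
  final case class RowToColumnar(child: Plan) extends Plan {
    def children: List[Plan] = List(child)
    def withNewChildren(cs: List[Plan]): Plan = copy(child = cs.head)
  }

  // Bottom-up rewrite: children are rewritten first, then the rule is tried
  // on the rebuilt node. Nodes the rule does not match are kept as they are,
  // which is why no explicit fallback case is needed in the rule itself.
  def transformUp(plan: Plan)(rule: PartialFunction[Plan, Plan]): Plan = {
    val rebuilt = plan.withNewChildren(plan.children.map(transformUp(_)(rule)))
    rule.applyOrElse(rebuilt, identity[Plan])
  }

  // Wrap every vanilla columnar child of a Gluten operator in a
  // columnar-to-row / row-to-columnar pair.
  def insertTransitions(plan: Plan): Plan = transformUp(plan) {
    case g: GlutenOp =>
      g.withNewChildren(g.children.map {
        case v: VanillaColumnarOp => RowToColumnar(ColumnarToRow(v))
        case other => other
      })
  }
}
```

With this sketch, `TransitionSketch.insertTransitions(GlutenOp("project", List(VanillaColumnarOp("scan", Nil))))` wraps the scan in the two transition nodes, while a Gluten operator whose children are all Gluten operators is returned unchanged.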
cc @ulysses-you
@@ -520,39 +524,27 @@ private[extension] object ColumnarToRowLike {
 }
 // This rule will try to add RowToColumnarExecBase and ColumnarToRowExec
 // to support vanilla columnar operators.
-case class VanillaColumnarPlanOverrides(session: SparkSession) extends Rule[SparkPlan] {
+case class InsertColumnarToColumnarTransitions(session: SparkSession) extends Rule[SparkPlan] {
The previous rule name seems more readable. The rule exists to stay compatible with vanilla columnar operators and related things.
Thanks for the explanation.
Can we, for now, make the name describe what the rule actually does? Since it now converts from columnar to columnar, I am inclined to use the new name.
I understand it could do more than columnar transitions, but we can discuss at that time whether to add new logic to this rule or to other independent rules.
  }

  override def postColumnarTransitions: Rule[SparkPlan] = plan => {
    val outputsColumnar = OutputsColumnarTester.inferOutputsColumnar(plan)
Why not use plan.supportsColumnar?
Spark has had supportsRowBased since 3.3. A plan node can have both supportsColumnar == true and supportsRowBased == true, so we need to know which of the two the caller actually requested.
If the plan supports both columnar and row-based execution, then Spark should do nothing. Does plan.supportsColumnar return a different value from OutputsColumnarTester.inferOutputsColumnar? It seems we never touch supportsRowBased.
"Does plan.supportsColumnar return different value with OutputsColumnarTester.inferOutputsColumnar?"
It's a rare case, but it is covered by existing UTs.
Consider a UnionExec that supports both columnar and row-based execution. Even if the caller requested outputsColumnar=false through ApplyColumnarRulesAndInsertTransitions#outputsColumnar, we have no way to be aware of that, since the UnionExec returns true from its supportsColumnar method.
So we should infer the exact value of outputsColumnar here rather than just calling supportsColumnar.
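The ambiguity can be shown with a small toy model. This is only a sketch under stated assumptions: `Node`, `satisfiesRequest`, and `naiveOutputsColumnar` are hypothetical names made up for illustration, and the sketch does not reproduce how `OutputsColumnarTester.inferOutputsColumnar` works, since its implementation is not shown in this thread.

```scala
object OutputsColumnarSketch {
  // Hypothetical stand-in for a plan node carrying the two capability flags.
  final case class Node(name: String, supportsColumnar: Boolean, supportsRowBased: Boolean)

  // True when the node can already produce the format the caller requested,
  // so nothing at the top of the plan would record what that request was.
  def satisfiesRequest(node: Node, outputsColumnar: Boolean): Boolean =
    if (outputsColumnar) node.supportsColumnar else node.supportsRowBased

  // Naive attempt to recover the caller's request from the plan root alone.
  def naiveOutputsColumnar(root: Node): Boolean = root.supportsColumnar
}
```

For a node such as `Node("Union", supportsColumnar = true, supportsRowBased = true)`, `satisfiesRequest(node, outputsColumnar = false)` holds, yet `naiveOutputsColumnar(node)` answers `true`, which is the opposite of what the caller asked for. A node that supports only one format does not have this problem.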
Can we move the OutputsColumnarTester-related code to a new file, together with the comment? The ColumnarOverrides file is too big.
I was planning on splitting the file too. Let's do that in another patch. Thanks for the suggestion.
  }

  // Visible for testing.
  def withSuggestRules(suggestRules: List[SparkSession => Rule[SparkPlan]]): Rule[SparkPlan] =
How about naming it transformRules? The name "suggest rules" is a bit hard to follow.
I'm fine with either. I'll change it to withTransformRules if the current name is confusing.
Great work. LGTM. Thanks.
This patch moves all columnar rules to the post-columnar-transitions API. In effect, Gluten's custom rules are now injected as if they were part of Spark's preparation rule list rather than its columnar rules.
By doing this, we can treat all of Gluten's columnar rules as a single Spark preparation rule that works on a complete Spark plan with the C2Rs and R2Cs already added. So we get full control of plan conversion rather than being constrained by Spark's rule ApplyColumnarRulesAndInsertTransitions. This will help us keep improving Gluten's plan optimization capability, so that it is not limited to columnar conversions.