Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE] Move all columnar rules to post-columnar transitions #4790

Merged
merged 29 commits into from
Mar 1, 2024

Conversation

zhztheplayer
Copy link
Member

@zhztheplayer zhztheplayer commented Feb 27, 2024

This patch moves all columnar rules to post-columnar transition API. This works like we'll inject Gluten's custom rules to Spark's preparation rule list rather than in to columnar rules.

By doing this, we can treat all of Gluten's columnar rules as an individual Spark preparation rule that works on a complete Spark plan with C2Rs and R2Cs added . So we get full control of plan conversion rather than being clamped by Spark's rule ApplyColumnarRulesAndInsertTransitions. This will help on continuously improving Gluten's plan optimization capability and to make it not only limited to doing columnar conversions.

Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

2 similar comments
Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

@zhztheplayer
Copy link
Member Author

/Benchmark Velox

Copy link

Run Gluten Clickhouse CI

1 similar comment
Copy link

Run Gluten Clickhouse CI

@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_4790_time.csv log/native_master_02_26_2024_6acd1b367_time.csv difference percentage
q1 32.75 33.25 0.493 101.50%
q2 24.72 24.24 -0.479 98.06%
q3 38.86 38.32 -0.535 98.62%
q4 37.76 37.90 0.144 100.38%
q5 68.52 72.46 3.937 105.75%
q6 7.22 6.95 -0.268 96.29%
q7 83.11 83.72 0.611 100.73%
q8 86.19 86.53 0.346 100.40%
q9 122.79 127.60 4.812 103.92%
q10 44.70 47.09 2.387 105.34%
q11 20.37 20.47 0.103 100.51%
q12 28.33 28.59 0.269 100.95%
q13 46.07 47.07 0.998 102.17%
q14 16.91 18.34 1.430 108.46%
q15 27.27 27.73 0.458 101.68%
q16 16.31 14.65 -1.661 89.81%
q17 103.08 101.67 -1.413 98.63%
q18 148.40 147.95 -0.448 99.70%
q19 13.70 15.14 1.434 110.46%
q20 26.66 27.29 0.631 102.37%
q21 227.09 228.47 1.378 100.61%
q22 13.56 13.67 0.108 100.80%
total 1234.35 1249.09 14.733 101.19%

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

1 similar comment
Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

@zhztheplayer zhztheplayer marked this pull request as ready for review February 29, 2024 01:31
Comment on lines +527 to 548
case class InsertColumnarToColumnarTransitions(session: SparkSession) extends Rule[SparkPlan] {
@transient private val planChangeLogger = new PlanChangeLogger[SparkPlan]()

private def replaceWithVanillaColumnarToRow(plan: SparkPlan): SparkPlan = plan match {
case _ if PlanUtil.isGlutenColumnarOp(plan) =>
private def replaceWithVanillaColumnarToRow(p: SparkPlan): SparkPlan = p.transformUp {
case plan if PlanUtil.isGlutenColumnarOp(plan) =>
plan.withNewChildren(plan.children.map {
c =>
val child = replaceWithVanillaColumnarToRow(c)
if (PlanUtil.isVanillaColumnarOp(child)) {
BackendsApiManager.getSparkPlanExecApiInstance.genRowToColumnarExec(
ColumnarToRowExec(child))
} else {
child
}
case child if PlanUtil.isVanillaColumnarOp(child) =>
BackendsApiManager.getSparkPlanExecApiInstance.genRowToColumnarExec(
ColumnarToRowExec(child))
case other => other
})
case _ =>
plan.withNewChildren(plan.children.map(replaceWithVanillaColumnarToRow))
}

private def replaceWithVanillaRowToColumnar(plan: SparkPlan): SparkPlan = plan match {
case _ if PlanUtil.isVanillaColumnarOp(plan) =>
private def replaceWithVanillaRowToColumnar(p: SparkPlan): SparkPlan = p.transformUp {
case plan if PlanUtil.isVanillaColumnarOp(plan) =>
plan.withNewChildren(plan.children.map {
c =>
val child = replaceWithVanillaRowToColumnar(c)
if (PlanUtil.isGlutenColumnarOp(child)) {
RowToColumnarExec(
BackendsApiManager.getSparkPlanExecApiInstance.genColumnarToRowExec(child))
} else {
child
}
case child if PlanUtil.isGlutenColumnarOp(child) =>
RowToColumnarExec(
BackendsApiManager.getSparkPlanExecApiInstance.genColumnarToRowExec(child))
case other => other
})
case _ =>
plan.withNewChildren(plan.children.map(replaceWithVanillaRowToColumnar))
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhztheplayer
Copy link
Member Author

@rui-mo @PHILO-HE

@@ -520,39 +524,27 @@ private[extension] object ColumnarToRowLike {
}
// This rule will try to add RowToColumnarExecBase and ColumnarToRowExec
// to support vanilla columnar operators.
case class VanillaColumnarPlanOverrides(session: SparkSession) extends Rule[SparkPlan] {
case class InsertColumnarToColumnarTransitions(session: SparkSession) extends Rule[SparkPlan] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the previous rule name is more readable. It is used to be compatible with vanilla columnar related things.

Copy link
Member Author

@zhztheplayer zhztheplayer Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation.

Can we temporarily make the name describe what it actually does? If it now converts from columnar to columnar, I will be inclined to use this new name.

I understand it could do more than columnar transitions but we can discuss on whether to add new logics to this rule or into other independent rules at that time.

}

override def postColumnarTransitions: Rule[SparkPlan] = plan => {
val outputsColumnar = OutputsColumnarTester.inferOutputsColumnar(plan)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use plan.supportsColumnar ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark has supportsRowBased since 3.3 . A plan node can be both supportsColumnar == true and supportsRowBased == true. We should know about the caller's intention having which property requested.

Copy link
Contributor

@ulysses-you ulysses-you Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the plan support both columnar and row, then Spark should do nothing. Does plan.supportsColumnar return different value with OutputsColumnarTester.inferOutputsColumnar ? It seems we never touch supportsRowBased.

Copy link
Member Author

@zhztheplayer zhztheplayer Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does plan.supportsColumnar return different value with OutputsColumnarTester.inferOutputsColumnar ?

It's a rare case but it does get covered by existing UTs...

Consider an UnionExec which supports both columnar and row-based execution, then even caller requested outputsColumnar=false through ApplyColumnarRulesAndInsertTransitions#outputsColumnar, we don't have a way to be aware of that since the UnionExec returns true for its supportsColumnar method.

So we should infer outputsColumnar's exact value here rather than just calling supportsColumnar.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move OutputsColumnarTester related code to a new file with the comment ? The ColumnarOverrides file is too big.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was planning on splitting the file too. Let's do that in another patch. Thanks for the suggestion.

}

// Visible for testing.
def withSuggestRules(suggestRules: List[SparkSession => Rule[SparkPlan]]): Rule[SparkPlan] =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about naming transformRules ? The suggest rules is a bit hard to follow..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be good to both. I'd change to withTransformRules if it's confusing.

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

@zhztheplayer zhztheplayer changed the title [VL] Move all columnar rules to post-columnar transitions [CORE] Move all columnar rules to post-columnar transitions Feb 29, 2024
Copy link
Contributor

@JkSelf JkSelf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. LGTM. Thanks.

@zhztheplayer zhztheplayer merged commit 238f659 into apache:main Mar 1, 2024
19 checks passed
@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_4790_time.csv log/native_master_02_29_2024_22d9fe3c8_time.csv difference percentage
q1 32.79 33.44 0.659 102.01%
q2 24.40 27.05 2.650 110.86%
q3 38.23 35.95 -2.278 94.04%
q4 37.87 39.22 1.345 103.55%
q5 71.98 71.30 -0.682 99.05%
q6 6.88 7.43 0.550 108.00%
q7 84.42 85.83 1.412 101.67%
q8 85.77 85.20 -0.566 99.34%
q9 125.77 118.37 -7.399 94.12%
q10 43.03 43.82 0.782 101.82%
q11 20.86 20.13 -0.723 96.53%
q12 26.05 26.26 0.206 100.79%
q13 45.48 46.41 0.931 102.05%
q14 18.72 20.53 1.814 109.69%
q15 29.02 28.82 -0.206 99.29%
q16 12.92 12.90 -0.020 99.85%
q17 101.32 103.50 2.178 102.15%
q18 150.17 149.47 -0.709 99.53%
q19 15.42 12.55 -2.871 81.39%
q20 26.32 28.87 2.555 109.71%
q21 224.29 227.08 2.787 101.24%
q22 14.71 13.78 -0.933 93.66%
total 1236.44 1237.92 1.483 100.12%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants