[CORE] Move all columnar rules to post-columnar transitions #4790

zhztheplayer · 2024-02-27T01:53:29Z

This patch moves all columnar rules to the post-columnar-transitions API. In effect, Gluten's custom rules are injected into Spark's preparation rule list rather than into the columnar rules.

By doing this, we can treat all of Gluten's columnar rules as a single Spark preparation rule that operates on a complete Spark plan with C2Rs and R2Cs already added. We therefore get full control of plan conversion rather than being constrained by Spark's rule ApplyColumnarRulesAndInsertTransitions. This will help us keep improving Gluten's plan optimization capability, so that it is not limited to columnar conversions.
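
To illustrate the point about rules seeing a complete plan, here is a minimal toy sketch. It does not use Spark or Gluten classes: `Plan`, `Scan`, `Filter`, `C2R`, `R2C`, `insertTransitions`, and `observe` are all hypothetical stand-ins, and the transition step is a simplified imitation of what ApplyColumnarRulesAndInsertTransitions does. It only shows that a rule placed after transition insertion observes the C2R/R2C nodes, while a rule placed before does not.

```scala
// Toy model only: every type here is a hypothetical stand-in, not a Spark or Gluten class.
object PostTransitionSketch {
  sealed trait Plan { def columnar: Boolean }
  case class Scan(columnar: Boolean) extends Plan
  case class Filter(child: Plan, columnar: Boolean) extends Plan
  case class C2R(child: Plan) extends Plan { val columnar: Boolean = false }
  case class R2C(child: Plan) extends Plan { val columnar: Boolean = true }

  type Rule = Plan => Plan

  // Wrap the child so that its output format matches what the parent consumes.
  private def align(parentColumnar: Boolean, child: Plan): Plan =
    if (parentColumnar && !child.columnar) R2C(child)
    else if (!parentColumnar && child.columnar) C2R(child)
    else child

  // Simplified stand-in for the transition-insertion step.
  def insertTransitions(plan: Plan): Plan = plan match {
    case Filter(child, col) => Filter(align(col, insertTransitions(child)), col)
    case C2R(child)         => C2R(insertTransitions(child))
    case R2C(child)         => R2C(insertTransitions(child))
    case scan: Scan         => scan
  }

  def countTransitions(plan: Plan): Int = plan match {
    case C2R(child)       => 1 + countTransitions(child)
    case R2C(child)       => 1 + countTransitions(child)
    case Filter(child, _) => countTransitions(child)
    case _: Scan          => 0
  }

  // Pre rules run first, then transitions are inserted, then post rules run.
  def run(pre: Seq[Rule], post: Seq[Rule], plan: Plan): Plan = {
    val afterPre = pre.foldLeft(plan)((p, rule) => rule(p))
    val withTransitions = insertTransitions(afterPre)
    post.foldLeft(withTransitions)((p, rule) => rule(p))
  }

  // Returns how many transitions a pre rule and a post rule each observe.
  def observe(plan: Plan): (Int, Int) = {
    var seenPre = 0
    var seenPost = 0
    val preProbe: Rule = p => { seenPre = countTransitions(p); p }
    val postProbe: Rule = p => { seenPost = countTransitions(p); p }
    run(Seq(preProbe), Seq(postProbe), plan)
    (seenPre, seenPost)
  }
}
```

In this toy, a columnar filter over a row-based filter over a columnar scan yields no visible transitions for the pre rule and two for the post rule, which is the property the description relies on.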

github-actions · 2024-02-27T01:53:47Z

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Other pull requests

github-actions · 2024-02-27T01:54:01Z

Run Gluten Clickhouse CI

github-actions · 2024-02-27T02:25:01Z

Run Gluten Clickhouse CI

github-actions · 2024-02-27T02:25:25Z

Run Gluten Clickhouse CI

github-actions · 2024-02-27T02:29:26Z

Run Gluten Clickhouse CI

github-actions · 2024-02-27T03:34:04Z

Run Gluten Clickhouse CI

zhztheplayer · 2024-02-27T04:55:44Z

/Benchmark Velox

github-actions · 2024-02-27T05:29:51Z

Run Gluten Clickhouse CI

github-actions · 2024-02-27T05:32:53Z

Run Gluten Clickhouse CI

GlutenPerfBot · 2024-02-27T05:33:13Z

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query	log/native_4790_time.csv	log/native_master_02_26_2024_6acd1b367_time.csv	difference	percentage
q1	32.75	33.25	0.493	101.50%
q2	24.72	24.24	-0.479	98.06%
q3	38.86	38.32	-0.535	98.62%
q4	37.76	37.90	0.144	100.38%
q5	68.52	72.46	3.937	105.75%
q6	7.22	6.95	-0.268	96.29%
q7	83.11	83.72	0.611	100.73%
q8	86.19	86.53	0.346	100.40%
q9	122.79	127.60	4.812	103.92%
q10	44.70	47.09	2.387	105.34%
q11	20.37	20.47	0.103	100.51%
q12	28.33	28.59	0.269	100.95%
q13	46.07	47.07	0.998	102.17%
q14	16.91	18.34	1.430	108.46%
q15	27.27	27.73	0.458	101.68%
q16	16.31	14.65	-1.661	89.81%
q17	103.08	101.67	-1.413	98.63%
q18	148.40	147.95	-0.448	99.70%
q19	13.70	15.14	1.434	110.46%
q20	26.66	27.29	0.631	102.37%
q21	227.09	228.47	1.378	100.61%
q22	13.56	13.67	0.108	100.80%
total	1234.35	1249.09	14.733	101.19%

github-actions · 2024-02-27T06:01:12Z

Run Gluten Clickhouse CI

github-actions · 2024-02-27T06:32:23Z

Run Gluten Clickhouse CI

github-actions · 2024-02-27T06:34:14Z

Run Gluten Clickhouse CI

github-actions · 2024-02-27T06:59:28Z

Run Gluten Clickhouse CI

github-actions · 2024-02-27T07:54:55Z

Run Gluten Clickhouse CI

github-actions · 2024-02-28T04:35:04Z

Run Gluten Clickhouse CI

github-actions · 2024-02-28T08:34:42Z

Run Gluten Clickhouse CI

github-actions · 2024-02-28T08:52:17Z

Run Gluten Clickhouse CI

github-actions · 2024-02-28T09:15:50Z

Run Gluten Clickhouse CI

github-actions · 2024-02-28T15:21:43Z

Run Gluten Clickhouse CI

github-actions · 2024-02-28T15:28:42Z

Run Gluten Clickhouse CI

github-actions · 2024-02-29T01:11:50Z

Run Gluten Clickhouse CI

github-actions · 2024-02-29T01:26:18Z

Run Gluten Clickhouse CI

zhztheplayer · 2024-02-29T04:15:16Z

gluten-core/src/main/scala/io/glutenproject/extension/ColumnarOverrides.scala

+case class InsertColumnarToColumnarTransitions(session: SparkSession) extends Rule[SparkPlan] {
  @transient private val planChangeLogger = new PlanChangeLogger[SparkPlan]()

-  private def replaceWithVanillaColumnarToRow(plan: SparkPlan): SparkPlan = plan match {
-    case _ if PlanUtil.isGlutenColumnarOp(plan) =>
+  private def replaceWithVanillaColumnarToRow(p: SparkPlan): SparkPlan = p.transformUp {
+    case plan if PlanUtil.isGlutenColumnarOp(plan) =>
      plan.withNewChildren(plan.children.map {
-        c =>
-          val child = replaceWithVanillaColumnarToRow(c)
-          if (PlanUtil.isVanillaColumnarOp(child)) {
-            BackendsApiManager.getSparkPlanExecApiInstance.genRowToColumnarExec(
-              ColumnarToRowExec(child))
-          } else {
-            child
-          }
+        case child if PlanUtil.isVanillaColumnarOp(child) =>
+          BackendsApiManager.getSparkPlanExecApiInstance.genRowToColumnarExec(
+            ColumnarToRowExec(child))
+        case other => other
      })
-    case _ =>
-      plan.withNewChildren(plan.children.map(replaceWithVanillaColumnarToRow))
  }

-  private def replaceWithVanillaRowToColumnar(plan: SparkPlan): SparkPlan = plan match {
-    case _ if PlanUtil.isVanillaColumnarOp(plan) =>
+  private def replaceWithVanillaRowToColumnar(p: SparkPlan): SparkPlan = p.transformUp {
+    case plan if PlanUtil.isVanillaColumnarOp(plan) =>
      plan.withNewChildren(plan.children.map {
-        c =>
-          val child = replaceWithVanillaRowToColumnar(c)
-          if (PlanUtil.isGlutenColumnarOp(child)) {
-            RowToColumnarExec(
-              BackendsApiManager.getSparkPlanExecApiInstance.genColumnarToRowExec(child))
-          } else {
-            child
-          }
+        case child if PlanUtil.isGlutenColumnarOp(child) =>
+          RowToColumnarExec(
+            BackendsApiManager.getSparkPlanExecApiInstance.genColumnarToRowExec(child))
+        case other => other
      })
-    case _ =>
-      plan.withNewChildren(plan.children.map(replaceWithVanillaRowToColumnar))
  }


cc @ulysses-you
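
The refactoring in the hunk above replaces hand-written recursion with a bottom-up `transformUp` traversal. The following toy sketch, using a hypothetical `Node` tree rather than `SparkPlan`, suggests why the two shapes should produce the same result: `transformUp` rewrites the children first and then applies the rule to the parent, which is what the removed recursion did by hand. `Node`, `Kind`, `bridge`, `manual`, and `viaTransformUp` are illustrative names, and `bridge` stands in for the C2R plus R2C pair.

```scala
// Toy model only: Node and Kind are hypothetical stand-ins for SparkPlan and the PlanUtil checks.
object TransformUpSketch {
  sealed trait Kind
  case object Gluten extends Kind
  case object Vanilla extends Kind
  case object Bridge extends Kind // stand-in for the inserted transition pair

  case class Node(name: String, kind: Kind, children: List[Node] = Nil) {
    def withNewChildren(newChildren: List[Node]): Node = copy(children = newChildren)

    // Bottom-up rewrite: children are transformed first, then the rule is tried on this node.
    def transformUp(rule: PartialFunction[Node, Node]): Node = {
      val rewritten = withNewChildren(children.map(_.transformUp(rule)))
      rule.applyOrElse(rewritten, (n: Node) => n)
    }
  }

  private def bridge(child: Node): Node = Node("bridge", Bridge, List(child))

  // Old shape: explicit recursion in both branches.
  def manual(plan: Node): Node = plan.kind match {
    case Gluten =>
      plan.withNewChildren(plan.children.map { c =>
        val child = manual(c)
        if (child.kind == Vanilla) bridge(child) else child
      })
    case _ =>
      plan.withNewChildren(plan.children.map(manual))
  }

  // New shape: the traversal is delegated to transformUp.
  def viaTransformUp(plan: Node): Node = plan.transformUp {
    case p if p.kind == Gluten =>
      p.withNewChildren(p.children.map {
        case child if child.kind == Vanilla => bridge(child)
        case other                          => other
      })
  }

  // Sample tree: a Gluten root over one Vanilla subtree and one Gluten leaf.
  val sample: Node = Node("root", Gluten, List(
    Node("v1", Vanilla, List(Node("g2", Gluten))),
    Node("g3", Gluten)))
}
```

On `sample`, both functions wrap only `v1` in a bridge node and leave `g3` untouched.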

zhztheplayer · 2024-02-29T06:08:53Z

@rui-mo @PHILO-HE

ulysses-you · 2024-02-29T07:08:30Z

gluten-core/src/main/scala/io/glutenproject/extension/ColumnarOverrides.scala

@@ -520,39 +524,27 @@ private[extension] object ColumnarToRowLike {
 }
 // This rule will try to add RowToColumnarExecBase and ColumnarToRowExec
 // to support vanilla columnar operators.
-case class VanillaColumnarPlanOverrides(session: SparkSession) extends Rule[SparkPlan] {
+case class InsertColumnarToColumnarTransitions(session: SparkSession) extends Rule[SparkPlan] {


The previous rule name seems more readable. The rule exists for compatibility with vanilla Spark's columnar operators.

Thanks for the explanation.

Can we, for now, make the name describe what the rule actually does? Since it currently converts from columnar to columnar, I'm inclined to use the new name.

I understand it could do more than columnar transitions, but when that comes up we can discuss whether to add the new logic to this rule or to separate, independent rules.

ulysses-you · 2024-02-29T07:12:57Z

gluten-core/src/main/scala/io/glutenproject/extension/ColumnarOverrides.scala

+  }
+
+  override def postColumnarTransitions: Rule[SparkPlan] = plan => {
+    val outputsColumnar = OutputsColumnarTester.inferOutputsColumnar(plan)


Why not use plan.supportsColumnar?

Spark has had supportsRowBased since 3.3. A plan node can have both supportsColumnar == true and supportsRowBased == true, so we need to know which output format the caller actually requested.

If the plan supports both columnar and row-based execution, then Spark should do nothing. Does plan.supportsColumnar return a different value from OutputsColumnarTester.inferOutputsColumnar? It seems we never touch supportsRowBased.

> Does plan.supportsColumnar return a different value from OutputsColumnarTester.inferOutputsColumnar?

It's a rare case, but it is covered by existing UTs.

Consider a UnionExec that supports both columnar and row-based execution. Even if the caller requested outputsColumnar=false through ApplyColumnarRulesAndInsertTransitions#outputsColumnar, we have no way to know that, because the UnionExec returns true from its supportsColumnar method.

So we should infer outputsColumnar's exact value here rather than just calling supportsColumnar.
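
The UnionExec case can be reproduced in a small toy model. This thread does not show how OutputsColumnarTester is implemented, so the sketch below is only one possible approach and is an assumption throughout: it wraps the plan in a hypothetical columnar-only `Probe` node before transitions are inserted, then checks afterwards whether a C2R was placed on top of it. `Plan`, `UnionLike`, `Probe`, `insertRootTransition`, `naive`, and `inferred` are illustrative names, and `insertRootTransition` is a simplified imitation of Spark's behavior at the plan root.

```scala
// Toy model only: an assumed approach, not the actual OutputsColumnarTester implementation.
object OutputsColumnarSketch {
  sealed trait Plan {
    def supportsColumnar: Boolean
    def supportsRowBased: Boolean
  }
  // Stand-in for a UnionExec that supports both execution modes.
  case object UnionLike extends Plan {
    val supportsColumnar: Boolean = true
    val supportsRowBased: Boolean = true
  }
  // Hypothetical marker node that is columnar-only, so a row request forces a C2R above it.
  case class Probe(child: Plan) extends Plan {
    val supportsColumnar: Boolean = true
    val supportsRowBased: Boolean = false
  }
  case class C2R(child: Plan) extends Plan {
    val supportsColumnar: Boolean = false
    val supportsRowBased: Boolean = true
  }
  case class R2C(child: Plan) extends Plan {
    val supportsColumnar: Boolean = true
    val supportsRowBased: Boolean = false
  }

  // Simplified root handling: add a transition only when the root cannot serve the request.
  def insertRootTransition(plan: Plan, outputsColumnar: Boolean): Plan =
    if (outputsColumnar && !plan.supportsColumnar) R2C(plan)
    else if (!outputsColumnar && !plan.supportsRowBased) C2R(plan)
    else plan

  // Naive check: ask the resulting root whether it supports columnar output.
  def naive(plan: Plan, requested: Boolean): Boolean =
    insertRootTransition(plan, requested).supportsColumnar

  // Probe-based check: recover the caller's request from what was placed above the probe.
  def inferred(plan: Plan, requested: Boolean): Boolean =
    insertRootTransition(Probe(plan), requested) match {
      case C2R(Probe(_)) => false
      case Probe(_)      => true
      case other         => other.supportsColumnar
    }
}
```

In this toy, `naive(UnionLike, requested = false)` still reports columnar output because no transition is inserted above a node that supports both modes, while `inferred` recovers the requested value in both cases.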

Can we move the OutputsColumnarTester-related code, along with its comment, to a new file? The ColumnarOverrides file is too big.

I was planning on splitting the file too. Let's do that in another patch. Thanks for the suggestion.

ulysses-you · 2024-02-29T07:15:04Z

gluten-core/src/main/scala/io/glutenproject/extension/ColumnarOverrides.scala

+  }
+
+  // Visible for testing.
+  def withSuggestRules(suggestRules: List[SparkSession => Rule[SparkPlan]]): Rule[SparkPlan] =


How about naming it transformRules? "Suggest rules" is a bit hard to follow.

I'm fine with either. I'll change it to withTransformRules if it's confusing.

github-actions · 2024-02-29T07:31:08Z

Run Gluten Clickhouse CI

github-actions · 2024-02-29T07:32:29Z

Run Gluten Clickhouse CI

JkSelf

Great work. LGTM. Thanks.

GlutenPerfBot · 2024-03-01T02:05:14Z

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query	log/native_4790_time.csv	log/native_master_02_29_2024_22d9fe3c8_time.csv	difference	percentage
q1	32.79	33.44	0.659	102.01%
q2	24.40	27.05	2.650	110.86%
q3	38.23	35.95	-2.278	94.04%
q4	37.87	39.22	1.345	103.55%
q5	71.98	71.30	-0.682	99.05%
q6	6.88	7.43	0.550	108.00%
q7	84.42	85.83	1.412	101.67%
q8	85.77	85.20	-0.566	99.34%
q9	125.77	118.37	-7.399	94.12%
q10	43.03	43.82	0.782	101.82%
q11	20.86	20.13	-0.723	96.53%
q12	26.05	26.26	0.206	100.79%
q13	45.48	46.41	0.931	102.05%
q14	18.72	20.53	1.814	109.69%
q15	29.02	28.82	-0.206	99.29%
q16	12.92	12.90	-0.020	99.85%
q17	101.32	103.50	2.178	102.15%
q18	150.17	149.47	-0.709	99.53%
q19	15.42	12.55	-2.871	81.39%
q20	26.32	28.87	2.555	109.71%
q21	224.29	227.08	2.787	101.24%
q22	14.71	13.78	-0.933	93.66%
total	1236.44	1237.92	1.483	100.12%

zhztheplayer added 6 commits February 26, 2024 14:49

[VL] Move all columnar rules to post-columnar transitions

06c34d5

fixup

e5971b3

fixup

f02f0a6

fixup

5398879

fixup

2d3c904

fixup

e3453ce

zhztheplayer added 3 commits February 27, 2024 10:18

fixup

c0bc012

fixup

e40f8e1

fixup

213ad37

fixup

944bad3

zhztheplayer added 2 commits February 27, 2024 13:23

fixup

9523ad2

Revert "fixup"

1720bb3

This reverts commit 9523ad2.

fixup

c8a48ac

zhztheplayer added 2 commits February 27, 2024 14:25

fixup

ee633a0

fixup

b0e0b71

fixup

7c15a81

fixup

28242ec

fixup

e785b6d

fixup

a42f3cc

fixup

81206b8

zhztheplayer added 2 commits February 28, 2024 22:29

fixup

ab3c17c

fixup

70eaa18

fixup

51dbc79

fixup

44effa5

fixup

11491bf

zhztheplayer marked this pull request as ready for review February 29, 2024 01:31

zhztheplayer commented Feb 29, 2024

ulysses-you reviewed Feb 29, 2024

fixup

7f16642

Update ColumnarOverrides.scala

c53db6e

zhztheplayer changed the title ~~[VL] Move all columnar rules to post-columnar transitions~~ [CORE] Move all columnar rules to post-columnar transitions Feb 29, 2024

JkSelf approved these changes Mar 1, 2024

zhztheplayer merged commit 238f659 into apache:main Mar 1, 2024
19 checks passed
