[GLUTEN-842][VL] convert expand op to expand exec in velox #1361

zhli1142015 · 2023-04-14T08:24:06Z

What changes were proposed in this pull request?

Here are the projection layouts we observed in the Expand operator:

1. agg exprs + group by exprs + gid.
2. agg exprs + group by exprs + gid + _gen_grouping_pos. The last column handles duplicate grouping sets.
3. group by exprs + gid + agg exprs. Here gid is calculated differently from the two cases above: it is assigned the sequence number of the projection set.
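To make the gid difference concrete, here is a minimal plain-Scala sketch (no Spark dependency). It assumes Spark's usual grouping-id encoding for cases 1 and 2 (one bit per group-by column, first column in the most significant bit, bit set when the column is not in the grouping set) and models case 3 as simply numbering the projection sets. `GidSketch` and its methods are hypothetical names used only for illustration.

```scala
object GidSketch {
  // Cases 1 and 2: gid is a bitmask over the group-by columns, first column in
  // the most significant bit; a bit is 1 when that column is absent from the
  // grouping set (and is therefore nulled out in that projection).
  def bitmaskGid(groupBy: Seq[String], groupingSet: Set[String]): Long =
    groupBy.foldLeft(0L) { (acc, col) =>
      (acc << 1) | (if (groupingSet.contains(col)) 0L else 1L)
    }

  // Case 3: gid is just the sequence number of each projection set
  // (numbering from 0 is an assumption of this sketch).
  def sequenceGids(projectionCount: Int): Seq[Long] =
    (0 until projectionCount).map(_.toLong)
}
```

For `GROUP BY a, b GROUPING SETS ((a), (a), (b))` this gives gids 1, 1 and 2, so the bitmask alone cannot tell the two (a) projections apart; that is the situation case 2's extra column covers.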

The original ExpandExecTransformer can only handle the first case. I'm adding an Expand operator on the Velox side in this PR: https://github.com/oap-project/velox/pull/199/files. This PR converts Spark's ExpandExec into that Expand operator in Velox, so Gluten no longer needs to do the column mapping.
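Because the native operator only has to evaluate each projection list for every input row, it does not matter which of the three layouts it receives. Below is a minimal plain-Scala model of that behaviour, under the assumption that an expression can be treated as a function of the input row; `ExpandSketch`, `colRef` and `literal` are hypothetical stand-ins, not Gluten, Substrait or Velox APIs.

```scala
object ExpandSketch {
  type Row = Vector[Any]
  // One expression per output column; an expression is modelled as a function of the input row.
  type Projection = Vector[Row => Any]

  // Hypothetical helpers: a reference to an input column, and a constant.
  def colRef(i: Int): Row => Any = row => row(i)
  def literal(v: Any): Row => Any = _ => v

  // Expand: emit every input row once per projection, in projection order,
  // evaluating the projection's expressions as-is (no column remapping).
  def expand(input: Seq[Row], projections: Seq[Projection]): Seq[Row] =
    for {
      row <- input
      proj <- projections
    } yield proj.map(expr => expr(row))
}
```

For example, with input columns (a, b) and the case-1 layout for `GROUPING SETS ((a), (b))`, the projections would be `Vector(colRef(0), colRef(0), literal(null), literal(1L))` and `Vector(colRef(0), literal(null), colRef(1), literal(2L))`, producing one output row per grouping set for each input row.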

The original ExpandExecTransformer is renamed to GroupIdExecTransformer so as not to break ClickHouse.

How was this patch tested?

Unit test.

github-actions · 2023-04-14T08:24:22Z

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}


JkSelf · 2023-04-14T08:47:55Z

@baibaichen

This PR implements an ExpandTransformer similar to vanilla Spark's, so Gluten no longer needs to distinguish the aggregate columns from the grouping columns. The Expand PR to the Substrait community seems ready to merge apart from a doc update. How about directly using that Substrait PR in Gluten to support Expand here?

FelixYBW · 2023-04-14T09:09:15Z

We should follow Substrait's solution. Does this PR need changes to Substrait?

JkSelf · 2023-04-14T09:19:55Z

Yes. This PR adds a new ExpandRel message in algebra.proto; it should follow the Substrait community solution, i.e. the definition in that PR.

zhli1142015 · 2023-04-17T07:12:43Z

Thanks, updated.

github-actions · 2023-04-17T12:24:48Z

#842

zhli1142015 · 2023-04-18T00:39:41Z

@JkSelf , @zhouyuan and @lgbo-ustc, could you please help to take a look?

JkSelf

LGTM except small comments.

gluten-core/src/main/scala/io/glutenproject/execution/ExpandExecTransformer.scala

lgbo-ustc · 2023-04-18T06:17:14Z

You mention that ExpandExecTransformer has been kept, but ExpandRel has been changed. Is it backward compatible with ClickHouse?

zhli1142015 · 2023-04-18T06:22:39Z

The new 'ExpandRel' is different and is not compatible with CH. The original Rel is renamed to 'GroupIdRel', and the corresponding changes are made on the CH side as well.
Thanks.

lgbo-ustc · 2023-04-18T07:00:25Z

It's great. This implementation is simple, and I think it could solve some problems we have met.

LGTM.

gluten-core/src/main/resources/substrait/proto/substrait/algebra.proto

zhli1142015 · 2023-04-18T07:27:08Z

Thanks for the review, @lgbo-ustc. Will you work on the CH side to consume the new ExpandRel contract?

lgbo-ustc · 2023-04-18T07:36:17Z

We will do it soon

zhouyuan

👍

* init change
* convert expand op to expand exec in velox
* add pre-project & add ut
* minor change
* fix ut
* update algebra.proto
* fix build
* fix build
* fix build
* add ut
* revert velox branch

---------

Co-authored-by: zhli1142015 <zhli@pczhlich.fareast.corp.microsoft.com>

zhanglistar · 2023-04-26T03:16:53Z

@zhli1142015

I have a question about the second and third cases in the description (the layout with the extra _gen_grouping_pos column, and the layout where gid is the sequence number of the projection set). Could you give some SQL examples? We don't know when Spark generates these two layouts.
Thanks.

zhli1142015 · 2023-04-26T03:31:27Z

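Hello @zhanglistar, please check the sample code below:

```scala
// Run in spark-shell; count_distinct requires Spark 3.2+.
import spark.implicits._ // toDF and $"col"
import org.apache.spark.sql.functions._

case class TestData3(a: Int, b: Option[Int])
val df = spark.sparkContext.parallelize(
  TestData3(1, None) ::
  TestData3(2, Some(2)) :: Nil).toDF()

// Case 3: several distinct aggregates on different columns are rewritten into an Expand.
df.agg(count($"a"), count($"b"), count(lit(1)), count_distinct($"a"), count_distinct($"b")).collect()

df.createOrReplaceTempView("df")
// Case 2: the duplicated grouping set (a) adds the _gen_grouping_pos column.
spark.sql("select count(a) from df group by grouping sets((a), (a), (b))").collect()
```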


zhanglistar · 2023-04-26T09:44:25Z

@zhli1142015 Thanks! For the SQL `spark.sql("select count(a) from df group by grouping sets((a), (a), (b))").collect` (case 2), just curious: why not deduplicate the two (a) grouping sets as an optimization?
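On the deduplication question: under the usual reading of GROUPING SETS as a UNION ALL of the individual GROUP BYs, a repeated grouping set is expected to produce its groups twice, so it cannot simply be dropped. Since the Expand projections for two identical sets are otherwise identical, a position column such as `_gen_grouping_pos` is what keeps them apart. The toy model below (plain Scala, hypothetical names, not Spark's implementation) runs the `grouping sets((a), (a), (b))` example over the two sample rows with and without a position column.

```scala
object DuplicateGroupingSetsSketch {
  final case class Input(a: Int, b: Option[Int])
  // Aggregation key: (a key, b key, gid, position); None plays the role of a nulled-out column.
  type Key = (Option[Int], Option[Int], Long, Int)

  // Expand each input row once per grouping set over group-by columns (a, b),
  // then count rows per key, mimicking count(a) since a is never null here.
  // With withPos = false the position slot is always 0, so identical grouping
  // sets collapse into one group.
  def countPerGroup(rows: Seq[Input], sets: Seq[Set[String]], withPos: Boolean): Map[Key, Int] = {
    val keys: Seq[Key] = for {
      row <- rows
      (set, pos) <- sets.zipWithIndex
    } yield {
      val aKey = if (set("a")) Some(row.a) else None
      val bKey = if (set("b")) row.b else None
      // Bitmask gid: value 2 for a, value 1 for b; a bit is set when the column is absent.
      val gid = (if (set("a")) 0L else 2L) | (if (set("b")) 0L else 1L)
      (aKey, bKey, gid, if (withPos) pos else 0)
    }
    keys.groupBy(identity).map { case (k, v) => k -> v.size }
  }
}
```

With the position column the model yields six groups, each with count 1, matching the UNION ALL reading; without it, the two (a) sets merge into four groups and the counts for a = 1 and a = 2 double to 2.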

zhli1142015 force-pushed the expand-change-4-12 branch from 78159d0 to b1d2288 April 16, 2023 07:12

zhli1142015 mentioned this pull request Apr 17, 2023

[GLUTEN-1348][CORE] Fallback if projections in Expand node have scalar function #1349

Closed

zhli1142015 force-pushed the expand-change-4-12 branch from f10f041 to 34ac44f April 17, 2023 07:11

zhli1142015 force-pushed the expand-change-4-12 branch from 0a506e6 to 06c3c8f April 17, 2023 08:57

zhli1142015 changed the title ~~[WIP][GLUTEN-842][VL] convert expand op to expand exec in velox~~ [GLUTEN-842][VL] convert expand op to expand exec in velox Apr 17, 2023

zhli1142015 marked this pull request as ready for review April 17, 2023 12:24

zhli1142015 requested a review from JkSelf April 17, 2023 12:24

zhli1142015 requested a review from zzcclp April 17, 2023 12:24

JkSelf previously approved these changes Apr 18, 2023


gluten-core/src/main/scala/io/glutenproject/execution/ExpandExecTransformer.scala

zhli1142015 added 9 commits April 18, 2023 12:59

init change

50ae03e

convert expand op to expand exec in velox

5a34a88

add pre-project & add ut

89bcd86

minor change

9ee85dd

fix ut

a2ebd0b

update algebra.proto

e7c68f8

fix build

f667fbc

fix build

2ee30e5

fix build

edf6dcb

FelixYBW previously approved these changes Apr 18, 2023


add ut

91c8666

zhli1142015 dismissed stale reviews from FelixYBW and JkSelf via 91c8666 April 18, 2023 06:03

zhli1142015 force-pushed the expand-change-4-12 branch from a7d71dd to 91c8666 April 18, 2023 06:03

lgbo-ustc reviewed Apr 18, 2023


gluten-core/src/main/resources/substrait/proto/substrait/algebra.proto


lgbo-ustc mentioned this pull request Apr 18, 2023

[CH] Support the new ExpandRel in #1361 #1392

Closed

revert velox branch

d014260

zhouyuan approved these changes Apr 19, 2023


zhli1142015 merged commit c156453 into apache:main Apr 19, 2023

exmy mentioned this pull request Apr 20, 2023

[GLUTEN-1392][CH] Support new ExpandRel #1432

Merged

zhli1142015 deleted the expand-change-4-12 branch April 26, 2023 04:44

liuneng1994 pushed a commit that referenced this pull request Apr 26, 2023

[GLUTEN-1392][CH] Support new ExpandRel (#1432)

bd43690

What changes were proposed in this pull request? support new ExpandRel introduced by #1361 (Fixes: #1392) How was this patch tested? unit tests

JkSelf mentioned this pull request Jul 18, 2023

[GLUTEN-2362][CORE] Record Substrait modifications and remove arena option #2363

Merged
