[GLUTEN-3705][CORE] Support mapping one custom aggregate function to more than one backend functions #3708

Merged
merged 1 commit into from
Nov 15, 2023

@zzcclp zzcclp commented Nov 14, 2023

What changes were proposed in this pull request?

Support mapping one custom aggregate function to more than one backend function. For example, the first/last functions are each mapped to one of two backend function names, depending on the ignoreNulls parameter.
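
As an illustration of the idea described above, here is a minimal sketch, not the code from this PR. The `AggFunc`, `First`, and `Last` types are hypothetical stand-ins for Spark's aggregate expressions, and the backend names are assumptions chosen for the example.

```scala
// Hypothetical stand-ins for Spark's First/Last aggregate expressions.
sealed trait AggFunc { def ignoreNulls: Boolean }
final case class First(ignoreNulls: Boolean) extends AggFunc
final case class Last(ignoreNulls: Boolean) extends AggFunc

object CustomAggNames {
  // One aggregate function maps to one of two backend names,
  // selected by its ignoreNulls flag. The names are illustrative assumptions.
  def backendName(f: AggFunc): String = f match {
    case First(true)  => "first_ignore_null"
    case First(false) => "first"
    case Last(true)   => "last_ignore_null"
    case Last(false)  => "last"
  }
}
```

With this shape, `CustomAggNames.backendName(First(ignoreNulls = true))` and `CustomAggNames.backendName(First(ignoreNulls = false))` resolve to different backend functions even though both come from the same aggregate function.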

Closes #3705.

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)


Run Gluten Clickhouse CI

zzcclp commented Nov 14, 2023

Run Gluten Clickhouse CI

@PHILO-HE PHILO-HE left a comment

Looks good! Just some small refactor suggestions. Thanks!


val inputTypes: Seq[DataType] = aggregateFunc.children.map(child => child.dataType)
Contributor

The inputTypes here looks the same as the one produced by buildCustomAggregateFunction. Maybe we can move it outside the if/else block.

Contributor Author

In some cases, inputTypes will be obtained through different logic, so it's better to allow custom logic for computing inputTypes.

      Some("custom_sum_double")
    }
  case _ =>
    throw new UnsupportedOperationException(
Contributor

Return None here if we check the returned Option on the calling side, as mentioned in the other comment.

val (substraitAggFuncName, inputTypes) =
  if (
    ExpressionMappings.expressionExtensionTransformer.extensionExpressionsMapping.contains(
      aggregateFunc.getClass)
Contributor

Maybe we can remove this condition and rely only on the Option returned from buildCustomAggregateFunction. If the Option is non-empty, we use the contained value. Otherwise, we fall through and look up the normal aggregate function name.

Contributor Author

buildCustomAggregateFunction only handles the custom expressions (functions and aggregate functions) that are defined in extensionExpressionsMapping, so it's better to check first.

Contributor

OK to me. Thanks!
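
For readers following the thread above, the two lookup strategies under discussion can be sketched roughly as follows. This is a simplified illustration and not Gluten's code: `AggNameLookup`, `customClasses`, `builtinNames`, and the `custom_sum` name are hypothetical, class lookups are reduced to string keys, and only `custom_sum_double` is taken from the snippet quoted in the review.

```scala
object AggNameLookup {
  // Hypothetical registry standing in for extensionExpressionsMapping.
  val customClasses: Set[String] = Set("CustomSum")
  // Hypothetical table of normal (non-custom) aggregate function names.
  val builtinNames: Map[String, String] = Map("Sum" -> "sum", "Count" -> "count")

  // Stand-in for buildCustomAggregateFunction: returns None for unknown input
  // instead of throwing, as suggested in the review.
  def buildCustom(className: String, isDouble: Boolean): Option[String] =
    className match {
      case "CustomSum" => Some(if (isDouble) "custom_sum_double" else "custom_sum")
      case _           => None
    }

  // Strategy kept in the PR: check the registry first, then build.
  def resolveCheckFirst(className: String, isDouble: Boolean): Option[String] =
    if (customClasses.contains(className)) buildCustom(className, isDouble)
    else builtinNames.get(className)

  // Strategy suggested in review: rely on the returned Option and fall through.
  def resolveByOption(className: String, isDouble: Boolean): Option[String] =
    buildCustom(className, isDouble).orElse(builtinNames.get(className))
}
```

In this toy version both strategies return the same results. The check-first form keeps the custom builder from ever being called for expressions outside the registry, which is the concern raised by the author.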

@PHILO-HE PHILO-HE left a comment

Thanks!

@baibaichen baibaichen left a comment

LGTM

@zzcclp zzcclp merged commit 31e354f into apache:main Nov 15, 2023
17 checks passed
@GlutenPerfBot

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

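| query | log/native_3708_time.csv | log/native_master_11_14_2023_3e42cb06a_time.csv | difference | percentage |
|-------|--------------------------|--------------------------------------------------|------------|------------|
| q1 | 32.34 | 33.71 | 1.367 | 104.23% |
| q2 | 24.53 | 23.76 | -0.769 | 96.87% |
| q3 | 37.56 | 37.84 | 0.278 | 100.74% |
| q4 | 36.80 | 37.67 | 0.872 | 102.37% |
| q5 | 70.26 | 70.38 | 0.117 | 100.17% |
| q6 | 5.52 | 7.15 | 1.631 | 129.57% |
| q7 | 84.25 | 86.97 | 2.725 | 103.23% |
| q8 | 87.34 | 84.81 | -2.528 | 97.11% |
| q9 | 126.17 | 126.75 | 0.572 | 100.45% |
| q10 | 44.85 | 45.72 | 0.867 | 101.93% |
| q11 | 19.38 | 19.34 | -0.041 | 99.79% |
| q12 | 29.52 | 27.31 | -2.214 | 92.50% |
| q13 | 46.55 | 46.23 | -0.325 | 99.30% |
| q14 | 17.49 | 17.88 | 0.391 | 102.24% |
| q15 | 27.85 | 28.78 | 0.924 | 103.32% |
| q16 | 15.40 | 15.27 | -0.137 | 99.11% |
| q17 | 101.57 | 98.37 | -3.202 | 96.85% |
| q18 | 148.32 | 148.15 | -0.177 | 99.88% |
| q19 | 14.13 | 13.28 | -0.850 | 93.98% |
| q20 | 26.99 | 27.14 | 0.153 | 100.57% |
| q21 | 221.58 | 222.13 | 0.552 | 100.25% |
| q22 | 13.30 | 12.90 | -0.395 | 97.03% |
| total | 1231.71 | 1231.52 | -0.190 | 99.98% |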
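
The report does not state how the last two columns are derived. Judging from the numbers, difference appears to be the master time minus the PR time, and percentage appears to be the master time divided by the PR time, so values above 100% would mean the PR run was faster. A small sketch under that assumption (`PerfColumns` and its parameter names are hypothetical):

```scala
object PerfColumns {
  // Assumed definition: baseline (master) time minus PR time.
  def difference(prTime: Double, baselineTime: Double): Double =
    baselineTime - prTime

  // Assumed definition: baseline (master) time as a percentage of PR time.
  def percentage(prTime: Double, baselineTime: Double): Double =
    baselineTime / prTime * 100.0
}
```

Results computed from the rounded times in the table can differ slightly from the reported columns, which were presumably computed from unrounded timings.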
