perf(expr): new interface for expression directly returning scalar #9049

BugenZhao · 2023-04-07T10:54:52Z

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Avoid repeating a literal value 1024 times when evaluating an expression that usually takes a literal as one of its inputs, such as EXTRACT(HOUR FROM col). (#9052)

extract(constant)       time:   [10.678 µs 10.702 µs 10.727 µs]
                        change: [-34.576% -34.333% -34.043%] (p = 0.00 < 0.05)
                        Performance has improved.

The simplified implementation for demo purposes is here. Briefly, we introduce a new interface, Expr::eval_new (temporary name), which returns either an Array or, if the return value would be a constant array, a Datum directly. When the return value is used as an argument, the Datum variant can be iterated directly with repeat().take() instead of allocating a real array.
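As a rough illustration of the idea described above (not the code in this PR), the sketch below models the return value as an enum with an array variant and a scalar variant, and iterates the scalar with `repeat().take()`. The names `Value` and `EitherIter`, and the `Option<i64>` stand-in for `Datum`, are assumptions made for the example; `EitherIter` only stands in for an `Either`-style iterator so that the snippet needs nothing beyond the standard library.

```rust
use std::iter::repeat;

/// Hypothetical stand-in for a nullable scalar (`Datum` in the PR text).
type Datum = Option<i64>;

/// Result of evaluating an expression: a materialized array, or a single
/// scalar that logically repeats `capacity` times.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Array(Vec<Datum>),
    Scalar { value: Datum, capacity: usize },
}

/// Minimal two-variant iterator (assumption: plays the role of `Either`).
enum EitherIter<L, R> {
    Left(L),
    Right(R),
}

impl<L, R> Iterator for EitherIter<L, R>
where
    L: Iterator,
    R: Iterator<Item = L::Item>,
{
    type Item = L::Item;

    fn next(&mut self) -> Option<Self::Item> {
        match self {
            EitherIter::Left(l) => l.next(),
            EitherIter::Right(r) => r.next(),
        }
    }
}

impl Value {
    /// Iterate row by row; the scalar case allocates no array.
    fn iter(&self) -> impl Iterator<Item = Datum> + '_ {
        match self {
            Value::Array(a) => EitherIter::Left(a.iter().copied()),
            Value::Scalar { value, capacity } => {
                EitherIter::Right(repeat(*value).take(*capacity))
            }
        }
    }
}
```

A caller that consumes its arguments through `iter()` sees the same sequence of rows in both cases, so it does not need to know whether a real array was ever built.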

eval_new is compatible with eval in both directions, because an expression can always fall back to the Array variant, so most of the hand-written expressions are not touched in this PR. Instead, we rewrite the implementation of all templated expressions with eval_new, and apply the optimization to expr_literal.

No performance regression is observed when using Either as the iterator.
template_fast is not touched as we're not sure whether this breaks SIMD.

Checklist For Contributors

I have written necessary rustdoc comments
I have added necessary unit tests and integration tests
I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
All checks passed in ./risedev check (or alias, ./risedev c)

Checklist For Reviewers

I have requested macro/micro-benchmarks as this PR can affect performance substantially, and the results are shown.

Documentation

My PR DOES NOT contain user-facing changes.

Types of user-facing changes

Please keep the types that apply to your changes, and remove the others.

Installation and deployment
Connector (sources & sinks)
SQL commands, functions, and operators
RisingWave cluster configuration changes
Other (please specify in the release note below)

Release note

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

BugenZhao · 2023-04-07T12:12:22Z

src/expr/benches/expr.rs

@@ -205,6 +205,15 @@ fn bench_expr(c: &mut Criterion) {
            .to_async(FuturesExecutor)
            .iter(|| constant.eval(&input))
    });
+    c.bench_function("extract(constant)", |bencher| {


Not sure if there is any general way to construct this test case. :(

I guess not... Let's keep the manual construction :(

codecov · 2023-04-07T12:31:27Z

Codecov Report

Merging #9049 (5a25ac9) into main (a63807d) will decrease coverage by 0.01%.
The diff coverage is 94.59%.

@@            Coverage Diff             @@
##             main    #9049      +/-   ##
==========================================
- Coverage   70.78%   70.77%   -0.01%     
==========================================
  Files        1195     1196       +1     
  Lines      197687   197696       +9     
==========================================
+ Hits       139926   139927       +1     
- Misses      57761    57769       +8

| Flag | Coverage Δ | |
|---|---|---|
| rust | `70.77% <94.59%> (-0.01%)` | ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/expr/src/expr/value.rs | `88.88% <88.88%> (ø)` | |
| src/expr/src/expr/expr_literal.rs | `96.83% <100.00%> (-0.38%)` | ⬇️ |
| src/expr/src/expr/mod.rs | `60.78% <100.00%> (+12.06%)` | ⬆️ |
| src/expr/src/expr/template.rs | `72.04% <100.00%> (+1.07%)` | ⬆️ |

... and 9 files with indirect coverage changes

xxchan

Looks both good and hacky to me at the same time 🥵

xxchan · 2023-04-07T13:17:36Z

🤔 This seems like a kind of "lazy evaluation" (or "late materialization"?): the executor still uses eval to get a fully expanded array, while inside expr, scalars can be used directly.

xxchan · 2023-04-07T13:18:25Z

src/expr/src/expr/mod.rs

-    async fn eval(&self, input: &DataChunk) -> Result<ArrayRef>;
+    /// The default implementation calls `eval` and puts the result into the `Array` variant.
+    async fn eval_new(&self, input: &DataChunk) -> Result<ValueImpl> {
+        self.eval(input).map_ok(ValueImpl::Array).await


So it overflows the stack if neither is implemented? 😄

Yes. 🥵 Not sure if there's a way to avoid this.
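To make the exchange above concrete, here is a minimal synchronous sketch of two default methods that are defined in terms of each other. It is an illustration under simplifying assumptions, not the trait from this PR: the input chunk is reduced to a row count, values are plain `i64`, and `Literal` and `RowNumber` are hypothetical expressions. An implementor has to override at least one of the two methods; overriding neither would recurse without end.

```rust
/// Hypothetical simplified result type: a full array or a repeated scalar.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Array(Vec<i64>),
    Scalar { value: i64, capacity: usize },
}

/// Simplified expression trait (assumption: sync, input is just a row count).
trait Expr {
    /// Default: go through `eval_new` and expand a scalar into a full array.
    fn eval(&self, rows: usize) -> Vec<i64> {
        match self.eval_new(rows) {
            Value::Array(a) => a,
            Value::Scalar { value, capacity } => vec![value; capacity],
        }
    }

    /// Default: go through `eval` and wrap the result in the `Array` variant.
    fn eval_new(&self, rows: usize) -> Value {
        Value::Array(self.eval(rows))
    }
}

/// Overrides only `eval_new`: returns a scalar, never allocates by itself.
struct Literal(i64);

impl Expr for Literal {
    fn eval_new(&self, rows: usize) -> Value {
        Value::Scalar { value: self.0, capacity: rows }
    }
}

/// Overrides only `eval`: an "old style" expression that always builds an array.
struct RowNumber;

impl Expr for RowNumber {
    fn eval(&self, rows: usize) -> Vec<i64> {
        (0..rows as i64).collect()
    }
}
```

With this shape, old callers can keep calling `eval` on `Literal`, and new callers can call `eval_new` on `RowNumber`; both paths terminate because each type overrides one side of the cycle.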

xxchan · 2023-04-07T13:26:17Z

src/expr/src/expr/template.rs

+                    // Otherwise, fallback to array computation.
+                    // TODO: match all possible combinations to further get rid of the overhead of `Either` iterators.


Wait, so EXTRACT(HOUR FROM col) still falls back to array computation? Then where does the improvement come from? 🥵

For execution, yes. The difference is that we're not allocating a new array for HOUR now. Instead, iter here directly calls repeat().take().
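A small sketch of what that answer describes, under loud assumptions: `Unit`, `UnitArg`, `extract_one`, and `extract` are hypothetical names, and the column is modeled as seconds since midnight. The per-row computation is the same in both arms; the only difference is that the scalar arm feeds the unit from `repeat().take()` rather than from an allocated array of identical values.

```rust
use std::iter::repeat;

/// Hypothetical time unit for an EXTRACT-like function.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Unit {
    Hour,
    Minute,
}

/// The unit argument: a real array, or one value repeated `capacity` times.
enum UnitArg {
    Array(Vec<Unit>),
    Scalar { value: Unit, capacity: usize },
}

/// Per-row kernel; `secs` is seconds since midnight (assumption for the demo).
fn extract_one(unit: Unit, secs: u32) -> u32 {
    match unit {
        Unit::Hour => secs / 3600,
        Unit::Minute => (secs % 3600) / 60,
    }
}

/// Still computes row by row, but the scalar arm allocates no unit array.
fn extract(unit: &UnitArg, col: &[u32]) -> Vec<u32> {
    match unit {
        UnitArg::Array(units) => units
            .iter()
            .zip(col)
            .map(|(u, s)| extract_one(*u, *s))
            .collect(),
        UnitArg::Scalar { value, capacity } => repeat(*value)
            .take(*capacity)
            .zip(col)
            .map(|(u, s)| extract_one(u, *s))
            .collect(),
    }
}
```

Matching on the argument up front, as done here, is one way to avoid an `Either`-style iterator for a single argument; with several arguments the number of combinations grows, which is presumably why the template keeps a general fallback.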

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

BugenZhao · 2023-04-10T10:12:20Z

> Looks both good and hacky to me at the same time 🥵

Not that hacky. 🥵 Actually, I partially borrowed the idea from https://github.com/datafuselabs/databend/blob/a3ac38f838eb37e8ea5600d526e7d1b2f3c0bb50/src/query/expression/src/evaluator.rs#L138.

wangrunji0408

Generally LGTM!

wangrunji0408 · 2023-04-10T08:23:35Z

src/expr/src/expr/template.rs

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

BugenZhao added 5 commits April 7, 2023 18:49

add benchmark

76e8a24

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

add value and eval_new

ebd4670

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

generate eval_new

2f6964a

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

clean up

cdeb820

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

use eval_new for literal

c3900ab

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

github-actions bot added the type/perf label Apr 7, 2023

BugenZhao changed the title ~~perf(expr): directly return scalar for literal expression~~ perf(expr): new interface for literal expression directly returning scalar Apr 7, 2023

refine docs

649fe6e

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

BugenZhao changed the title ~~perf(expr): new interface for literal expression directly returning scalar~~ perf(expr): new interface for expression directly returning scalar Apr 7, 2023

BugenZhao marked this pull request as ready for review April 7, 2023 11:52

trigger CI

1a26a7d

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

BugenZhao added the component/common label (Common components, such as array, data chunk, expression.) Apr 7, 2023

BugenZhao requested review from fuyufjh, wangrunji0408 and xxchan April 7, 2023 12:11

BugenZhao commented Apr 7, 2023

xxchan reviewed Apr 7, 2023

simplify trait bound

8a85fab

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

BugenZhao mentioned this pull request Apr 10, 2023

feat(expr): support nullary #[function] #9084

Merged

7 tasks

wangrunji0408 approved these changes Apr 10, 2023

y-wei self-requested a review April 10, 2023 17:17

BugenZhao added 2 commits April 11, 2023 13:08

Merge remote-tracking branch 'origin/main' into bz/scalar-expr

446b95a

rename to eval_v2 and remove todo

5a25ac9

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

BugenZhao enabled auto-merge April 11, 2023 05:14

BugenZhao added this pull request to the merge queue Apr 11, 2023

Merged via the queue into main with commit 5b5b2e6 Apr 11, 2023

BugenZhao deleted the bz/scalar-expr branch April 11, 2023 06:02

BugenZhao mentioned this pull request Apr 11, 2023

feat: introduce PROCTIME() #9088

Merged

5 tasks

BugenZhao mentioned this pull request May 11, 2023

chore(type): remove Column and column!.. macros #9733

Merged

7 tasks

xxchan mentioned this pull request Jun 20, 2023

expr: avoid repeating the same scalar into an array #9052

Open
