Optimize array_join by supporting PROVIDED_BLOCKBUILDER convention #13874

Merged: 1 commit into prestodb:master, Dec 20, 2019

@wenleix wenleix commented Dec 17, 2019

== RELEASE NOTES ==

General Changes
* Improve performance of array_join

linux-foundation-easycla bot commented Dec 17, 2019

CLA Check
The committers are authorized under a signed CLA.

  • ✅ Wenlei Xie (f6ebeda52f2bec4a7519a847f2eb081ad665c9d1)

@wenleix wenleix force-pushed the arrayjoin branch 3 times, most recently from 4857cfd to 22b044f, December 18, 2019 00:43
@wenleix wenleix changed the title from "Support PROVIDED_BLOCKBUILDER return convention for array_join" to "Optimize array_join by supporting PROVIDED_BLOCKBUILDER convention" Dec 18, 2019

wenleix commented Dec 18, 2019

Benchmark shows over 10% improvement (about 12%: average time drops from 152.954 to 134.558 ns/op).

Before

Benchmark                     Mode  Cnt    Score   Error  Units
BenchmarkArrayJoin.benchmark  avgt   60  152.954 ± 1.246  ns/op

After

Benchmark                     Mode  Cnt    Score   Error  Units
BenchmarkArrayJoin.benchmark  avgt   60  134.558 ± 2.078  ns/op


wenleix commented Dec 18, 2019

See #9638 and #12166 for context.

cc @oerling: we once talked about allowing scalar functions to write directly to the output buffer (to avoid copying data for struct types). The framework is implemented, but no function uses it yet. @kaikalur also recently observed this inefficiency when optimizing a user's query, so here is an example of how to use it :)

ArrayJoin gets only a moderate benefit because the function logic itself is fairly expensive. This type of optimization should yield larger improvements for functions with light computation :)
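
For readers unfamiliar with the idea, here is a minimal, self-contained sketch of the difference between returning a value and writing into a caller-provided buffer. It is only an illustration under stated assumptions: `OutputBuffer`, `joinReturning`, and `joinInto` are hypothetical names invented for this sketch, not Presto's `BlockBuilder` API or the code in this PR, and null handling is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProvidedBuilderSketch {
    /** Hypothetical output buffer: one shared character store plus entry end offsets. */
    static final class OutputBuffer {
        private final StringBuilder chars = new StringBuilder();
        private final List<Integer> ends = new ArrayList<>();

        void append(CharSequence s) { chars.append(s); }

        /** Marks the end of the current entry. */
        void closeEntry() { ends.add(chars.length()); }

        String entry(int i) {
            int start = (i == 0) ? 0 : ends.get(i - 1);
            return chars.substring(start, ends.get(i));
        }
    }

    /** Return-value style: builds a temporary String that the caller must copy into the output. */
    static String joinReturning(List<?> values, String delimiter) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(delimiter);
            }
            sb.append(values.get(i));
        }
        return sb.toString();
    }

    /** Provided-builder style: appends each piece to the caller's buffer, with no joined temporary. */
    static void joinInto(List<?> values, String delimiter, OutputBuffer out) {
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                out.append(delimiter);
            }
            // Per-element conversion still allocates here; only the joined result avoids a copy.
            out.append(String.valueOf(values.get(i)));
        }
        out.closeEntry();
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(1L, 2L, 3L);
        OutputBuffer out = new OutputBuffer();

        // Style 1: the function returns a value, then the caller copies it into the output.
        out.append(joinReturning(values, ","));
        out.closeEntry();

        // Style 2: the function writes into the output directly.
        joinInto(values, ",", out);

        System.out.println(out.entry(0) + " " + out.entry(1)); // prints "1,2,3 1,2,3"
    }
}
```

Both styles produce the same entry; the second skips the intermediate joined string and the copy into the output. The per-element conversion cost remains in both, which is why the gain is modest when that conversion dominates.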


wenleix commented Dec 18, 2019

I realized the benchmark runs over ARRAY(BIGINT), so casting from BIGINT to VARCHAR can take significant time. A benchmark over ARRAY(VARCHAR) would probably show a larger improvement :)

@highker highker left a comment

lgtm

@wenleix wenleix merged commit fdb7611 into prestodb:master Dec 20, 2019
@wenleix wenleix deleted the arrayjoin branch December 20, 2019 08:06
@aweisberg aweisberg mentioned this pull request Jan 17, 2020
7 tasks
@caithagoras caithagoras mentioned this pull request Jan 22, 2020
6 tasks