Improve performance of functions by vectorization #5672

Lloyd-Pottiger · 2022-08-22T11:00:16Z

Enhancement

We can use vectorization to improve the performance of some functions:

arithmetic operations
comparison
bit operations
...

Reference

ywqzzy · 2022-08-23T02:44:54Z

I want to try unaryArithmetic

Lloyd-Pottiger · 2022-08-23T02:49:00Z

I want to try unaryArithmetic

Cool. Maybe you could open a new issue for unary arithmetic vectorization? Then we can use this issue to track the vectorization of all functions.

xzhangxian1008 · 2022-08-24T02:54:57Z

Interesting! I will join it.

hongyunyan · 2022-08-24T03:00:40Z

Count me in!

ywqzzy · 2022-08-25T06:54:30Z

Maybe we can think about how to create a bench suite for this series of improvements.
The bench suite itself will be a separate issue, and we can come up with an easy-to-use framework to generate different kinds of workloads for the TiFlash compute engine.
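As a rough illustration of what such a framework might start from, here is a minimal C++ sketch with a seeded workload generator and a timing helper. The names `makeColumn` and `bestTimeNs` are hypothetical and are not part of TiFlash; a real suite would likely build on an existing benchmark library rather than hand-rolled timing.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: build a column of `rows` random 32-bit values.
// A fixed seed keeps the workload reproducible between runs.
std::vector<uint32_t> makeColumn(size_t rows, uint32_t seed)
{
    std::mt19937 gen(seed);
    std::vector<uint32_t> col(rows);
    for (auto & v : col)
        v = static_cast<uint32_t>(gen());
    return col;
}

// Hypothetical helper: run `fn` `repeats` times and return the best
// wall-clock time in nanoseconds, which reduces noise from other processes.
template <typename Fn>
int64_t bestTimeNs(Fn && fn, int repeats)
{
    assert(repeats > 0);
    int64_t best = -1;
    for (int r = 0; r < repeats; ++r)
    {
        auto start = std::chrono::steady_clock::now();
        fn();
        auto stop = std::chrono::steady_clock::now();
        int64_t ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        if (best < 0 || ns < best)
            best = ns;
    }
    return best;
}
```

Different workloads could then be produced by varying `rows` and the seed, and each candidate implementation timed by passing a lambda that calls it to `bestTimeNs`.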

isHuangXin · 2022-09-20T02:06:51Z

Hi, @Lloyd-Pottiger.
I am new to TiFlash. I am very interested in this issue and want to get started with TiFlash through this task. I have now successfully built TiFlash in an M1 environment.

I noticed that there are many functions under the /tiflash/dbms/src/Functions/ folder.

Does the bit operation of vectors compare each bit of the vector bit by bit? Can you explain to me the optimization direction for the bit operation?

Lloyd-Pottiger · 2022-09-20T02:38:22Z

Does the bit operation of vectors compare each bit of the vector bit by bit?

Thanks for your attention. The semantics of these functions are the same as in MySQL, so you can just refer to https://dev.mysql.com/doc/refman/8.0/en/bit-functions.html.

Can you explain to me the optimization direction for the bit operation?

The main idea is to make it run in batches. For example, suppose we have two columns a and b of type int32, like
a | b
0x1 | 0x1
0x2 | 0x2
0x3 | 0x3
0x0 | 0xF

Hex format is used here for readability.

When we run a & b, the result is computed row by row. With the help of SIMD, we can compute it in batches instead. With a 128-bit register, all four rows can be handled by a single bitwise AND: 0x00000001000000020000000300000000 & 0x0000000100000002000000030000000F = 0x00000001000000020000000300000000 -> 0x00000001 | 0x00000002 | 0x00000003 | 0x00000000

Note that the above is only an example for explanation. I am not sure it really improves performance; a benchmark is needed.
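The batching idea above could be sketched in C++ roughly as follows. This is a minimal illustration, not TiFlash's implementation: the function names `bitAndScalar` and `bitAndBatched` are hypothetical, the batched path assumes an x86 target with SSE2 (it falls back to the plain loop elsewhere), and compilers may already auto-vectorize the scalar loop, which is another reason a benchmark is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h> // SSE2 intrinsics
#endif

// Row-by-row baseline: one AND per element.
std::vector<uint32_t> bitAndScalar(const std::vector<uint32_t> & a, const std::vector<uint32_t> & b)
{
    assert(a.size() == b.size());
    std::vector<uint32_t> res(a.size());
    for (size_t i = 0; i < a.size(); ++i)
        res[i] = a[i] & b[i];
    return res;
}

// Batched version: where SSE2 is available, each 128-bit AND covers
// four 32-bit rows at once; leftover rows use the scalar loop.
std::vector<uint32_t> bitAndBatched(const std::vector<uint32_t> & a, const std::vector<uint32_t> & b)
{
    assert(a.size() == b.size());
    std::vector<uint32_t> res(a.size());
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= a.size(); i += 4)
    {
        // Unaligned loads, since std::vector storage is not guaranteed to be 16-byte aligned.
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i *>(a.data() + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i *>(b.data() + i));
        _mm_storeu_si128(reinterpret_cast<__m128i *>(res.data() + i), _mm_and_si128(va, vb));
    }
#endif
    // Tail rows (or all rows when SSE2 is unavailable).
    for (; i < a.size(); ++i)
        res[i] = a[i] & b[i];
    return res;
}
```

For the columns in the example, both functions return 0x1, 0x2, 0x3, 0x0; the batched version gets there with one 128-bit AND instead of four 32-bit ones.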

isHuangXin · 2022-09-20T03:00:26Z

a & b

In your description 'compute ... rows one by one', it should be said that several a & b are calculated at the same time, which is a bit like the batch_size calculation in deep learning. But the example you showed me cuts a very long vector into several segments and does the a & b calculation for each segment separately. It's a little different.

I read the document you showed me, and I feel that whether the bit operation can improve performance in this way needs further exploration. It's a little difficult for me as I am a beginner of tiflash.

Are there other functions that are more suitable for beginners to complete that you can recommend to me? e.g. Good first issues. Thank you.

Lloyd-Pottiger · 2022-09-20T03:39:50Z

In your description 'compute ... rows one by one', it should be said that several a & b are calculated at the same time, which is a bit like the batch_size calculation in deep learning.

"One by one" means computing 0x1 & 0x1, then 0x2 & 0x2, and so on.

It's a little difficult for me as I am a beginner of tiflash.

Yeah, it may be difficult.

Are there other functions that are more suitable for beginners to complete that you can recommend to me?

Sorry, I am not sure about "suitable". Are you familiar with SIMD? If not, #5758 may be more suitable for beginners.

As for why this issue is labeled good first issue:

first: we do not need much knowledge of TiFlash, and we can just focus on the implementation of the function.
good: it is a good exercise in using SIMD and in learning how functions are implemented in TiFlash. Also, if you open a PR, a performance test is needed, so it is a good chance to get familiar with using TiDB.

isHuangXin · 2022-09-20T07:37:40Z

Hi, @Lloyd-Pottiger.
Thank you for your patient answer. I am not familiar with SIMD yet, and I think this issue is a bit too difficult for me for now. I have decided to start by fixing simple issues, e.g. #5092.

Lloyd-Pottiger added the type/enhancement and component/compute labels Aug 22, 2022

Lloyd-Pottiger changed the title ~~Improve performance using dynamic dispatch~~ Improve performance by using dynamic dispatch Aug 22, 2022

Lloyd-Pottiger changed the title ~~Improve performance by using dynamic dispatch~~ Improve performance of functions by vectorization Aug 23, 2022

ywqzzy added the good first issue label Aug 23, 2022

Lloyd-Pottiger mentioned this issue Aug 23, 2022

Improve performance of number comparison functions #5670

Closed

4 tasks

Lloyd-Pottiger added the help wanted label Aug 23, 2022

Lloyd-Pottiger mentioned this issue Aug 31, 2022

Improve performance of avg, sum aggregate functions if used without GROUP BY expression. #5748

Open

4 tasks
