Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of functions by vectorization #5672

Open
4 tasks
Lloyd-Pottiger opened this issue Aug 22, 2022 · 10 comments
Open
4 tasks

Improve performance of functions by vectorization #5672

Lloyd-Pottiger opened this issue Aug 22, 2022 · 10 comments
Labels
component/compute good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. type/enhancement The issue or PR belongs to an enhancement.

Comments

@Lloyd-Pottiger
Copy link
Contributor

Lloyd-Pottiger commented Aug 22, 2022

Enhancement

we can use vectorization to improve the performance of some functions:

  • arithmetic operation
  • comparsion
  • bit operation
  • ...

Reference

  1. https://pingcap.com/zh/blog/automatic-vectorization-acceleration-for-compiler-in-tiflash
  2. https://github.com/ClickHouse/ClickHouse/blob/2b0c62ab80d1d43a2c7dc2fbe24db0124efe18d4/src/Functions/FunctionUnaryArithmetic.h#L45-L69
@Lloyd-Pottiger Lloyd-Pottiger added type/enhancement The issue or PR belongs to an enhancement. component/compute labels Aug 22, 2022
@Lloyd-Pottiger Lloyd-Pottiger changed the title Improve performance using dynamic dispatch Improve performance by using dynamic dispatch Aug 22, 2022
@Lloyd-Pottiger Lloyd-Pottiger changed the title Improve performance by using dynamic dispatch Improve performance of functions by vectorization Aug 23, 2022
@ywqzzy
Copy link
Contributor

ywqzzy commented Aug 23, 2022

I want to try unaryArithmetic

@Lloyd-Pottiger
Copy link
Contributor Author

I want to try unaryArithmetic

cool, maybe you can open a new issue about unary arithmetic vectorization? And we can use this issue to trace the vectorization of all functions.

@ywqzzy ywqzzy added the good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. label Aug 23, 2022
@Lloyd-Pottiger Lloyd-Pottiger added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Aug 23, 2022
@xzhangxian1008
Copy link
Contributor

Interesting! I will join it.

@hongyunyan
Copy link
Contributor

Count me in!

@ywqzzy
Copy link
Contributor

ywqzzy commented Aug 25, 2022

Maybe we can think about how to create a bench suite for the series of improvement.
The bench suite itself will be an issue, and we can come up with a easy to use framework to generate different kind of workload for tiflash compute engine.

@isHuangXin
Copy link

Hi, @Lloyd-Pottiger .
I am new to TiFlash, I am very interested in this issue and want to get started with TiFlash through this task. Now, I have successfully built TiFlash in the M1 environment.

I noticed that there are many functions under /tiflash/dbms/src/Functions/ folder

Does the bit operation of vectors compare each bit of the vector bit by bit? Can you explain to me the optimization direction for the bit operation?
image

@Lloyd-Pottiger
Copy link
Contributor Author

Lloyd-Pottiger commented Sep 20, 2022

Does the bit operation of vectors compare each bit of the vector bit by bit?

Thanks for your attention. The roles of these functions is the same as mysql, so you can just refer to https://dev.mysql.com/doc/refman/8.0/en/bit-functions.html.

Can you explain to me the optimization direction for the bit operation?

The main idea is to make it run in batches. For example, we have two column a and b with type int32, like
a | b
0x1 | 0x1
0x2 | 0x2
0x3 | 0x3
0x0 | 0xF

for better understanding, use hex format here.

when we run a & b, we will compute the result by rows one by one. But with the help of SIMD, we can compute it in batches. If we have 128 bits width register, we can finish it by just call operator bits_and & once, 0x00000001000000020000000300000000 & 0x000000010000002000000030000000F = 0x00000001000000020000000300000000 -> 0x00000001 | 0x00000002 | 0x00000003 | 0x00000000

Note that, the above is just an example just for explaination, I am not sure it really helps improve the performance, a benchmark is needed.

@isHuangXin
Copy link

a & b

In your description 'compute ... rows one by one', it should be said that several a & b are calculated at the same time, which is a bit like the batch_size calculation in deep learning. But the example you show me is to cut a very long vector into serveral segments and do a & b calculations for each segment separately. It's a little different.

I read the document you show me, and I feel that whether the bit operation can improve performance in this way needs further exploration. It's a little difficult for me as I am a beginner of tiflash.

Are there other functions that are more suitable for beginners to complete that you can recommend to me? e.g. Good first issues. Thank you.

@Lloyd-Pottiger
Copy link
Contributor Author

Lloyd-Pottiger commented Sep 20, 2022

In your description 'compute ... rows one by one', it should be said that several a & b are calculated at the same time, which is a bit like the batch_size calculation in deep learning.

one by one means compute 0x1 & 0x1 and then 0x2 | 0x2......

It's a little difficult for me as I am a beginner of tiflash.

Yeah, it maybe difficult.

Are there other functions that are more suitable for beginners to complete that you can recommend to me?

Sorry, I am not sure about "suitable". Are you familiar with SIMD? If no, #5758 maybe more suitable for beginners.

And why this issue label as good first issue:

  • first. because we do not need much knowledge about TiFlash, and we can just focus on the implement of the function.
  • good. It is a good exercise for us to use SIMD, and learn about the implementation of functions in TiFlash. Also, if you open a pr, a performance test is needed, so it is a good chance to be familiar about the usage of TiDB.

@isHuangXin
Copy link

Hi, @Lloyd-Pottiger.
Thank you for your patient answer, I am not familiar with SIMD yet, and I think this issue is temporarily a bit difficult for me. I decided to start to fix sample issues, e.g. #5092

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/compute good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

No branches or pull requests

5 participants