Improve performance of functions by vectorization #5672
Comments
I want to try unaryArithmetic |
Cool. Maybe you can open a new issue for unary arithmetic vectorization? Then we can use this issue to track the vectorization of all functions. |
Interesting! I will join it. |
Count me in! |
Maybe we can think about how to create a benchmark suite for this series of improvements. |
Hi, @Lloyd-Pottiger. I noticed that there are many functions under the /tiflash/dbms/src/Functions/ folder. Do the vectorized bit operations process the vector one bit at a time? Can you explain the optimization direction for the bit operations? |
Thanks for your attention. These functions behave the same as their MySQL counterparts, so you can just refer to https://dev.mysql.com/doc/refman/8.0/en/bit-functions.html.
The main idea is to make them run in batches. For example, we have two columns a and b of type int32, like
when we run Note that the above is just an example for explanation. I am not sure it really helps improve performance; a benchmark is needed. |
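The batch idea described above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: `bitAndRow` and `bitAndBatch` are hypothetical names, not TiFlash functions, and plain `std::vector<std::int32_t>` stands in for the real column type. Whether the batched form is actually faster would still need a benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration only; these are not TiFlash APIs.

// Row-at-a-time: the caller invokes this once per row.
inline std::int32_t bitAndRow(std::int32_t a, std::int32_t b)
{
    return a & b;
}

// Batch: one call handles the whole column. The loop body is a simple
// element-wise operation over contiguous memory, which is the shape
// compilers can usually auto-vectorize with SIMD instructions.
std::vector<std::int32_t> bitAndBatch(const std::vector<std::int32_t> & a,
                                      const std::vector<std::int32_t> & b)
{
    assert(a.size() == b.size()); // both columns must have the same row count
    std::vector<std::int32_t> out(a.size());
    const std::int32_t * pa = a.data();
    const std::int32_t * pb = b.data();
    std::int32_t * po = out.data();
    for (std::size_t i = 0; i < a.size(); ++i)
        po[i] = pa[i] & pb[i];
    return out;
}
```

Calling `bitAndBatch(a, b)` once per column gives the same values as calling `bitAndRow(a[i], b[i])` for every row; the difference is only in how the work is grouped.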
From your description 'compute ... rows one by one', I understood that several a & b operations are calculated at the same time, a bit like batch_size in deep learning. But the example you showed cuts a very long vector into several segments and does the a & b calculation for each segment separately, which is a little different. I read the document you linked, and I feel that whether bit operations can be made faster this way needs further exploration. That is a little difficult for me, as I am a beginner with TiFlash. Are there other functions more suitable for beginners that you can recommend, e.g. good first issues? Thank you. |
one by one means compute
Yeah, it may be difficult.
Sorry, I am not sure about "suitable". Are you familiar with SIMD? If not, #5758 may be more suitable for beginners. And why this issue label as
|
Hi, @Lloyd-Pottiger. |
Enhancement
We can use vectorization to improve the performance of some functions:
Reference