Multi-line parsing supports SIMD optimization #1872

Open
wants to merge 1 commit into
base: main
from

@zhongyuankai zhongyuankai commented Nov 11, 2024

Improve the performance of parsing newlines through SIMD.
Performance comparison results with 150 tasks and 180 MB/s of traffic:
Before optimization:
image
image

After optimization:
image
image

After optimization, overall performance improves by about 8%. When the GetNextLine method is benchmarked in isolation, its performance improves by about 100% (roughly 2x).

@CLAassistant

CLAassistant commented Nov 11, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


zhongyuankai does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@linrunqi08
Collaborator

@zhongyuankai Can you share the performance comparison before and after this PR?

@linrunqi08
Collaborator

@zhongyuankai Is it convenient to communicate on WeChat or DingTalk?

@zhongyuankai
Author

@linrunqi08 Thank you for your reply. I have updated the comment. I am happy to communicate with you. How can I contact you on WeChat?

@linrunqi08
Collaborator

@zhongyuankai You can add my WeChat ID: linrunqi08

const int vecSize = 32;
__m256i newlineVec = _mm256_set1_epi8('\n');

for (int32_t pos = end - vecSize; pos >= 0; pos -= vecSize) {
Collaborator

What happens if the length is not an integer multiple of 32? It looks like the remaining bytes are never checked.
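
To make the concern concrete: the quoted loop starts at `end - vecSize` and steps back 32 bytes at a time, so when `end` is not a multiple of 32 it stops with up to 31 bytes at the front of the buffer unscanned. Whether the PR handles those bytes elsewhere is not visible in the quoted snippet. One common way to cover them is a scalar tail loop after the vector loop. The sketch below is only an illustration of that pattern: `FindLastNewline` is a hypothetical helper, not the PR's `GetNextLine`, and its signature is an assumption. It falls back to a purely scalar scan when the compiler is not targeting AVX2.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical helper (not the PR's GetNextLine): returns the index of the
// last '\n' in buf[0, end), or -1 if the range contains no newline.
inline int32_t FindLastNewline(const char* buf, int32_t end) {
    assert(end >= 0);
    int32_t pos = end; // buf[pos, end) has already been scanned
#if defined(__AVX2__)
    const int32_t vecSize = 32;
    const __m256i newlineVec = _mm256_set1_epi8('\n');
    // Scan full 32-byte blocks from the back while a whole block still fits.
    while (pos >= vecSize) {
        const __m256i chunk = _mm256_loadu_si256(
            reinterpret_cast<const __m256i*>(buf + pos - vecSize));
        const uint32_t mask = static_cast<uint32_t>(
            _mm256_movemask_epi8(_mm256_cmpeq_epi8(chunk, newlineVec)));
        if (mask != 0) {
            // The highest set bit marks the last newline inside this block.
            int32_t bit = 31;
            while (((mask >> bit) & 1u) == 0) {
                --bit;
            }
            return pos - vecSize + bit;
        }
        pos -= vecSize;
    }
#endif
    // Scalar tail: the leftover bytes at the front (fewer than 32 when the
    // AVX2 path ran, the whole range otherwise).
    while (pos > 0) {
        --pos;
        if (buf[pos] == '\n') {
            return pos;
        }
    }
    return -1;
}
```

With this shape, a 100-byte buffer is covered by three 32-byte blocks plus a 4-byte scalar pass, so a newline at index 3 is still found.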

4 participants