Parser improvements #5
The alpha blending could probably benefit from SIMD usage, too, but I'm kinda waiting for benches that include alpha for this.
reduces the time from 13.5ms to 11ms (~-20%), and breaks the benchmark
Really awesome, I like it! I think the UB is fine: when the user provides garbage values, it's OK to color the pixel with some random value, I guess. I want to write some notes for the README (e.g. that we need to use nightly Rust) before merging this.
As I mentioned in one of the commits, the speedup on my machine was, averaging across many runs, from 13.5ms to 11ms. My bench results have been pretty stable at 11.05±0.1ms.
I also think that UB is fine, that's why I used in the first place. It's just something that is different now because of how the conversion happens: it is no longer a lookup, but rather a calculation that is only defined for
I'm happy to merge this PR as-is, but have made some changes to the Rust toolchain and README here.
I would leave it up to you whether you want to pull in the 3 commits from https://github.com/sbernauer/breakwater/tree/parser-improvements, or whether I should push them afterwards.
Regarding the assembler, this is the function as IDA disassembles it:
u32 __fastcall breakwater::parser::simd_unhex::h52042561b542debc(__u8_ value, __m128 _XMM0)
{
u32 result; // eax
core::option::Option<core::fmt::Arguments> args; // [rsp+0h] [rbp-38h] BYREF
*(_QWORD *)args.gap0 = value.length;
if ( value.length != 8 )
{
*(_QWORD *)&args.gap0[8] = 0LL;
core::panicking::assert_failed::h676f2fb9f137bf56(Eq, (usize *)&args, (usize *)"\b", args);
}
__asm
{
vpmovzxbd ymm0, qword ptr [rdi]
vpbroadcastd ymm2, cs:dword_2EE7B4
vpsrld ymm1, ymm0, 6
vpmaddwd ymm1, ymm1, cs:ymmword_2EE7C0
vpand ymm0, ymm0, ymm2
vpaddd ymm0, ymm1, ymm0
vpsllvd ymm0, ymm0, cs:ymmword_2EE7E0
vextracti128 xmm1, ymm0, 1
vpor xmm0, xmm0, xmm1
vpshufd xmm1, xmm0, 0EEh
vpor xmm0, xmm0, xmm1
vpshufd xmm1, xmm0, 55h ; 'U'
vpor xmm0, xmm0, xmm1
vmovd eax, xmm0
vzeroupper
}
return result;
}
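For readers who do not want to trace the vector instructions, here is a scalar sketch of what the listing appears to compute. The constants live in memory and are not visible in the disassembly, so the mask `0x0F`, the multiplier `9`, and the shift amounts `28, 24, ..., 0` are assumptions, and `unhex_scalar` is a hypothetical name, not a function from the PR.

```rust
/// Scalar sketch of the per-byte arithmetic the disassembly appears to perform.
/// Assumptions (not visible in the listing): the broadcast mask is 0x0F, the
/// multiplier constant is 9, and the shift table is 28, 24, ..., 0.
fn unhex_scalar(value: &[u8]) -> u32 {
    // Mirrors the length check at the top of the listing.
    assert_eq!(value.len(), 8);
    let mut result = 0u32;
    for (i, &c) in value.iter().enumerate() {
        let c = c as u32;
        // '0'..='9' have bit 6 clear; 'A'..='F' and 'a'..='f' have it set,
        // so letters get 9 added to their low nibble (1..=6 becomes 10..=15).
        let nibble = (c & 0x0F) + 9 * (c >> 6);
        // Move each nibble into its slot and OR it in; the SIMD code does the
        // equivalent with a per-lane shift followed by a horizontal OR.
        result |= nibble << (28 - 4 * i as u32);
    }
    result
}
```

Under these assumptions, `unhex_scalar(b"ff00aa12")` yields `0xff00aa12`. For bytes that are not hex digits the formula still produces some number, just not a meaningful one.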
@sbernauer I pulled your commits, and it is ready to be merged.
One note about parsing coordinates using SIMD: as they are variable length, it will be hard if not impossible to do. That was one of the reasons why I picked colors as my first target, as their length is known in the code path.
Those were exactly my thoughts! Maybe we can use some bitmasks to determine the coordinate length and afterwards use specialized SIMD instructions. Even cooler would be to parse both coordinates simultaneously, but we are getting ahead of ourselves ^^
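The bitmask idea could look roughly like the following scalar sketch. It is only an illustration of the approach, not code from either branch; `digit_run_len` is a hypothetical helper, and a real SIMD version would presumably build the mask with a vector compare instead of a loop.

```rust
/// Hypothetical helper: length of the leading run of ASCII digits within the
/// first (up to) 8 bytes, derived from a bitmask.
fn digit_run_len(bytes: &[u8]) -> u32 {
    let mut mask = 0u8;
    for (i, &b) in bytes.iter().take(8).enumerate() {
        // Bit i is set when byte i is '0'..='9'.
        mask |= (b.is_ascii_digit() as u8) << i;
    }
    // Consecutive set bits starting at bit 0 are the digits of the first number.
    mask.trailing_ones()
}
```

For example, `digit_run_len(b"123 456 ")` gives 3, which could then select a code path specialized for three-digit coordinates.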
Thanks for putting this up!
Have to come back to this before merging to get the CI checks green:
Will try to get it done today.
I have played around with SIMD digit parsing now, in a separate branch (https://github.com/fabi321/breakwater/tree/simd-digit-parsing). However, it is significantly slower as of now: it takes about 20.5ms. I also tried to confuse the branch predictor a bit by shuffling the commands, but that puts the current solution at 18.75ms, still faster than my SIMD one. So either I find a way to fix the performance issues, or it will have to stay like this.
Sounds really interesting, I think I will give it a try as well. I have now also installed Linux on my (~10 year old) desktop to get more reliable benchmarks.
Btw, I had a -8.2877% improvement (27.359ms vs 25.092ms) on my desktop 👍 (which sadly only supports AVX, no AVX2 or above)
Multiple parser improvements, on both the readability and the performance side.
The new SIMD hex parser has one major drawback: invalid characters now cause undefined behavior instead of yielding 0.
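If the old fallback were ever wanted back, one option would be to validate the input before taking the fast path. The sketch below is an assumption about how that could be done, not something this PR implements: `parse_hex_or_zero` is a hypothetical name, it returns 0 for the whole value when any byte is invalid, and a plain standard-library conversion stands in for the SIMD routine.

```rust
/// Hypothetical guard: returns 0 unless all 8 bytes are ASCII hex digits.
fn parse_hex_or_zero(value: &[u8; 8]) -> u32 {
    if !value.iter().all(|b| b.is_ascii_hexdigit()) {
        return 0;
    }
    // Stand-in for the unchecked fast path: a plain std conversion.
    let s = std::str::from_utf8(value).unwrap_or("0");
    u32::from_str_radix(s, 16).unwrap_or(0)
}
```

The extra check costs time on every pixel, which is presumably why skipping it is attractive on the hot path.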