Use MSVC intrinsics for better performance #116
Merged, thanks!
Applied some changes in 27e0bf7 as I think the correct intrinsic to use is
You are correct, thanks. @vitaut A question: Why do you disable In addition, we may implement
Good point, there is no reason to disable optimized As for implementing
Well, what I want to say is that we only provide count_digits(uint64_t n) when __builtin_clz(ll) is not available. This means that if we call count_digits with an int on a 32-bit platform, the value is first converted to a 64-bit long long and then goes through 64-bit comparisons and divisions, which is slow. We'd better provide count_digits(uint32_t) as well.
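A minimal sketch of the suggestion above, assuming a simple division-based fallback. The helper name `count_digits_fallback` is hypothetical and this is not necessarily how the library implements it; the point is only that a separate `uint32_t` overload keeps 32-bit inputs in 32-bit arithmetic.

```cpp
#include <cstdint>

// Hypothetical shared fallback: counts decimal digits, consuming four
// digits per loop iteration. The arithmetic is done in the width of UInt.
template <typename UInt>
inline unsigned count_digits_fallback(UInt n) {
  unsigned count = 1;
  for (;;) {
    if (n < 10) return count;
    if (n < 100) return count + 1;
    if (n < 1000) return count + 2;
    if (n < 10000) return count + 3;
    n /= 10000u;
    count += 4;
  }
}

// 32-bit overload: comparisons and divisions stay 32-bit.
inline unsigned count_digits(std::uint32_t n) {
  return count_digits_fallback(n);
}

// 64-bit overload: used only when the value actually needs 64 bits.
inline unsigned count_digits(std::uint64_t n) {
  return count_digits_fallback(n);
}
```

With both overloads present, a plain `int` argument is ambiguous, so a caller would cast to the intended unsigned type first, for example `count_digits(static_cast<std::uint32_t>(value))`.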
Right, but
Ah, I didn't know you were already using intrinsics for GCC. RapidJSON is also big on SSE usage to speed up their lib. MSVC coverage is very welcome. Cool :D
Here is a way to check which intrinsics are available for MSVC (in case you don't know already):
According to the Intel Instruction Set page, Bit Scan Reverse (BSR) is supported on i386+. It's safe enough to use it without checking. EDIT: Found some code which may be useful for this project in RapidJSON
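As a rough illustration of using BSR without a runtime check, the sketch below wraps MSVC's `_BitScanReverse` and GCC's `__builtin_clz` behind one function. The name `clz32` is hypothetical and the code is not taken from the patch; it only shows how the bit index returned by BSR maps to a leading-zero count.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_MSC_VER)
# include <intrin.h>
# pragma intrinsic(_BitScanReverse)
#endif

// Hypothetical helper: number of leading zero bits in a nonzero 32-bit value.
inline std::uint32_t clz32(std::uint32_t x) {
  assert(x != 0);  // both BSR and __builtin_clz are undefined for 0
#if defined(_MSC_VER)
  unsigned long index = 0;
  _BitScanReverse(&index, x);  // index of the highest set bit (0..31)
  return static_cast<std::uint32_t>(31 - index);
#elif defined(__GNUC__)
  return static_cast<std::uint32_t>(__builtin_clz(x));
#else
  // Portable fallback: shift left until the top bit is set.
  std::uint32_t n = 0;
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    ++n;
  }
  return n;
#endif
}
```

BSR reports the position of the highest set bit, so the leading-zero count for a 32-bit value is `31 - index`. The selection happens at compile time, which matches the preference in this thread for avoiding runtime checks.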
I agree with @CarterLi that it's better to avoid runtime checks, but thanks for the suggestion @patlecat. I was thinking of adding SSE support, but it adds very little in terms of performance while making implementation much more complicated.
@patlecat Maybe I'm missing something, but it looks like RapidJSON is not using SSE for integer to string conversion: https://github.com/miloyip/rapidjson/blob/master/include/rapidjson/internal/itoa.h Anyway, patches are welcome =)
Yes, @vitaut. In my implementations, SSE2 does not improve the performance much in integer-to-string conversion, so I didn't put that in RapidJSON. However, RapidJSON got a performance boost by using SSE2/4 for skipping whitespace during parsing. It also uses bit scan intrinsics in its Grisu implementation (float-to-string conversion). RapidJSON does not use dynamic dispatching according to CPUID. It simply uses static binding. Some discussions about this can be found here.
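To make "static binding" concrete, here is a small sketch of whitespace skipping where the SSE2 path is chosen by the preprocessor rather than by a CPUID check at run time. It is not RapidJSON's code: the macro `SKETCH_USE_SSE2` and the function names are hypothetical, and the buffer is assumed to come with an explicit length.

```cpp
#include <cstddef>

// Compile-time selection: the SSE2 path exists only if the compiler targets SSE2.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
# include <emmintrin.h>
# define SKETCH_USE_SSE2 1
#else
# define SKETCH_USE_SSE2 0
#endif

inline bool is_json_space(char c) {
  return c == ' ' || c == '\n' || c == '\r' || c == '\t';
}

// Returns the index of the first non-whitespace character in s[0, len),
// or len if the whole range is whitespace.
inline std::size_t skip_whitespace(const char* s, std::size_t len) {
  std::size_t i = 0;
#if SKETCH_USE_SSE2
  const __m128i sp = _mm_set1_epi8(' ');
  const __m128i nl = _mm_set1_epi8('\n');
  const __m128i cr = _mm_set1_epi8('\r');
  const __m128i tab = _mm_set1_epi8('\t');
  // Skip 16 bytes at a time while every byte in the chunk is whitespace.
  for (; len - i >= 16; i += 16) {
    const __m128i chunk =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(s + i));
    __m128i ws = _mm_cmpeq_epi8(chunk, sp);
    ws = _mm_or_si128(ws, _mm_cmpeq_epi8(chunk, nl));
    ws = _mm_or_si128(ws, _mm_cmpeq_epi8(chunk, cr));
    ws = _mm_or_si128(ws, _mm_cmpeq_epi8(chunk, tab));
    // Bit k of the mask is set when byte k of the chunk is whitespace.
    if (_mm_movemask_epi8(ws) != 0xFFFF) break;
  }
#endif
  // Scalar loop: finishes the tail and pinpoints the exact byte.
  while (i < len && is_json_space(s[i])) ++i;
  return i;
}
```

Because the choice is made when the library is compiled, there is no dispatch cost per call, at the price of needing a separate build for targets without SSE2.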
Thanks, @miloyip, that's what I thought. You've done a great job with RapidJSON BTW.
Try to fix #115
Not tested yet due to the lack of an MSVC compiler