Use MSVC intrinsics for better performance #116

CarterLi · 2015-02-20T06:35:03Z

Try to fix #115
Not tested yet because I don't have an MSVC compiler.

vitaut · 2015-02-20T16:22:26Z

Merged, thanks!

vitaut · 2015-02-20T16:52:40Z

Applied some changes in 27e0bf7, as I think the correct intrinsic to use is _BitScanReverse. Also, the intrinsic was not called in release builds because the call was inside an assert.
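
To illustrate the two points above, here is a minimal sketch of a count-leading-zeros wrapper. It is not the code from 27e0bf7; the name clz32_sketch and the non-MSVC branches are assumptions added for illustration. It shows _BitScanReverse (which reports the index of the highest set bit) being called outside assert(), since the whole assert(...) expression is removed when NDEBUG is defined.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER)
#  include <intrin.h>  // _BitScanReverse
#endif

// Hypothetical helper (not fmt's actual code): count the leading zero bits
// of a nonzero 32-bit value.
inline unsigned clz32_sketch(std::uint32_t x) {
  // Only the precondition goes inside assert(). Under NDEBUG the whole
  // assert(...) expression is removed, so a call written as
  // assert(_BitScanReverse(&index, x)) would never run in release builds.
  assert(x != 0);
#if defined(_MSC_VER)
  unsigned long index = 0;     // receives the position of the highest set bit
  _BitScanReverse(&index, x);  // called unconditionally, outside assert()
  return 31u - static_cast<unsigned>(index);
#elif defined(__GNUC__)
  return static_cast<unsigned>(__builtin_clz(x));
#else
  unsigned n = 0;  // portable fallback: shift until the top bit is set
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    ++n;
  }
  return n;
#endif
}
```

With this sketch, clz32_sketch(1) evaluates to 31 and clz32_sketch(0x80000000u) to 0. The result is undefined for 0, which is why callers typically pass n | 1.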

CarterLi · 2015-02-21T01:41:39Z

You are correct, thanks @vitaut.

A question: why do you disable count_digits(uint32_t n) conditionally when FMT_BUILTIN_CLZLL is not available? I think count_digits(uint32_t n) can coexist perfectly well with the fallback version of count_digits, since count_digits(uint32_t n) is only an optimization in my opinion.
Note: _BitScanReverse64 is only available on WIN64. Providing count_digits(uint32_t n) should give much better performance when compiling a 32-bit version.

In addition, we could implement __builtin_clzll using __builtin_clz when __builtin_clzll is not available (for example), since 64-bit operations are much slower than 32-bit operations on x86 and int (i.e. int32_t) is much more widely used than long long (i.e. int64_t).
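
A minimal sketch of that idea, assuming only a 32-bit count-leading-zeros primitive is available: scan the upper half first and fall through to the lower half only when the upper half is zero. The names clz32_part and clz64_sketch are hypothetical and do not appear in the library.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#  include <intrin.h>  // _BitScanReverse
#endif

// Hypothetical 32-bit primitive; x must be nonzero.
inline unsigned clz32_part(std::uint32_t x) {
#if defined(_MSC_VER)
  unsigned long index = 0;
  _BitScanReverse(&index, x);  // index of the highest set bit
  return 31u - static_cast<unsigned>(index);
#elif defined(__GNUC__)
  return static_cast<unsigned>(__builtin_clz(x));
#else
  unsigned n = 0;
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    ++n;
  }
  return n;
#endif
}

// 64-bit count-leading-zeros built from two 32-bit scans; x must be nonzero.
inline unsigned clz64_sketch(std::uint64_t x) {
  const std::uint32_t hi = static_cast<std::uint32_t>(x >> 32);
  if (hi != 0) return clz32_part(hi);  // answer lies in the upper half
  // Upper half is all zeros: 32 leading zeros plus those of the lower half.
  return 32u + clz32_part(static_cast<std::uint32_t>(x));
}
```

This avoids _BitScanReverse64 entirely, so it would also compile for 32-bit Windows targets.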

vitaut · 2015-02-21T14:31:01Z

> Why do you disable count_digits(uint32_t n) conditionally when FMT_BUILTIN_CLZLL is not available?

Good point, there is no reason to disable the optimized count_digits(uint32_t n) when FMT_BUILTIN_CLZLL is not available. I've fixed this in cbc46d9.
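
For readers unfamiliar with the optimization being discussed, here is a sketch of a 32-bit digit counter built on count-leading-zeros. It follows the well-known integer log10 bit trick (1233/4096 approximates log10(2)) and is not copied from cbc46d9; the names leading_zeros32 and count_digits32_sketch are hypothetical.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#  include <intrin.h>  // _BitScanReverse
#endif

// Hypothetical count-leading-zeros helper; x must be nonzero.
inline unsigned leading_zeros32(std::uint32_t x) {
#if defined(_MSC_VER)
  unsigned long index = 0;
  _BitScanReverse(&index, x);
  return 31u - static_cast<unsigned>(index);
#elif defined(__GNUC__)
  return static_cast<unsigned>(__builtin_clz(x));
#else
  unsigned n = 0;
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    ++n;
  }
  return n;
#endif
}

// Number of decimal digits in n, using only 32-bit operations.
inline unsigned count_digits32_sketch(std::uint32_t n) {
  // powers_of_10[0] is 0 so that values below 10 are never adjusted down.
  static const std::uint32_t powers_of_10[] = {
      0u,      10u,      100u,      1000u,      10000u,
      100000u, 1000000u, 10000000u, 100000000u, 1000000000u};
  // n | 1 keeps the argument nonzero without changing the bit length of n > 0.
  const unsigned bits = 32u - leading_zeros32(n | 1u);
  // Estimate floor(log10(n)) as bits * log10(2), with log10(2) ~ 1233 / 4096.
  const unsigned t = (bits * 1233u) >> 12;
  // The estimate can be one too high; the table lookup corrects it.
  return t - (n < powers_of_10[t] ? 1u : 0u) + 1u;
}
```

There is no division and no loop, which is where the speedup over a generic fallback would come from.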

As for implementing __builtin_clzll in terms of __builtin_clz on 32-bit platforms, this is reasonable although not critical, because, as you mentioned yourself, int is much more widely used than long long and int is already optimized on 32-bit platforms. Besides, 32-bit systems are becoming increasingly rare. But patches are welcome =).

CarterLi · 2015-02-21T15:56:44Z

Well, what I meant is that we only provide count_digits(uint64_t n) when __builtin_clz(ll) is not available. This means that if we call count_digits with an int on a 32-bit platform, we first convert it to a 64-bit long long and then do 64-bit comparisons and divisions, which is slow. We'd better provide count_digits(uint32_t) as well.
Sorry, I posted my comment without finishing it for some reason.

vitaut · 2015-02-21T16:38:24Z

Right, but __builtin_clz or equivalent should be available on most platforms. Do you know any popular platform where __builtin_clz is not provided? Not that I'm completely opposed to adding a fallback count_digits(uint32_t) variant (perhaps, by making count_digits a template), just want to make sure that we don't add anything that is almost never used.
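
One possible shape for such a templated fallback is sketched below. It is an assumption about how the idea could look, not code from the library, and the name count_digits_fallback is hypothetical. Instantiated with a 32-bit type, all comparisons and divisions stay 32-bit wide.

```cpp
#include <cstdint>

// Hypothetical generic fallback: counts decimal digits without any
// intrinsics, consuming up to four digits per loop iteration.
template <typename UInt>
inline unsigned count_digits_fallback(UInt n) {
  unsigned count = 1;
  for (;;) {
    // Four comparisons per division keep the number of divisions low.
    if (n < 10) return count;
    if (n < 100) return count + 1;
    if (n < 1000) return count + 2;
    if (n < 10000) return count + 3;
    n /= 10000u;  // done in the width of UInt: 32-bit for uint32_t
    count += 4;
  }
}
```

For example, count_digits_fallback<std::uint32_t>(12345u) returns 5, and the same template handles std::uint64_t arguments without a second overload.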

patlecat · 2015-02-23T09:26:21Z

Ah, I didn't know you were already using intrinsics for GCC. RapidJson also relies heavily on SSE to speed up their library. MSVC coverage is very welcome. Cool :D

patlecat · 2015-02-23T09:30:45Z

Here is a way to check which intrinsics are available for MSVC (in case you don't know already):
https://msdn.microsoft.com/en-us/library/hskdteyh.aspx

CarterLi · 2015-02-23T10:51:18Z

According to the Intel Instruction Set page, Bit Scan Reverse (BSR) is supported on i386+, so it's safe to use without checking.
In my opinion, a runtime CPUID check should not be used in this case, since the CPUID instruction itself takes time.

EDIT: Found some code which may be useful for this project in RapidJson

vitaut · 2015-02-23T14:57:25Z

I agree with @CarterLi that it's better to avoid runtime checks, but thanks for the suggestion @patlecat. I was thinking of adding SSE support, but it adds very little in terms of performance while making the implementation much more complicated.

patlecat · 2015-02-24T21:08:05Z

@vitaut Then why did @miloyip rely on SSE for RapidJson?

You have a library here, so leave it to developers to decide which nifty features to enable for their projects; then you don't need CPUID. Or offer a CPUID tool separately to make the decision easier for your users.

vitaut · 2015-02-24T21:22:10Z

@patlecat Maybe I'm missing something, but it looks like RapidJson is not using SSE for integer to string conversion: https://github.com/miloyip/rapidjson/blob/master/include/rapidjson/internal/itoa.h

Anyway, patches are welcome =)

miloyip · 2015-02-25T02:39:18Z

Yes, @vitaut. In my implementations, SSE2 did not improve performance much in integer-to-string conversion, so I didn't put that in RapidJSON. However, RapidJSON got a performance boost from using SSE2/4 for skipping whitespace during parsing. It also uses bit scan intrinsics in its Grisu implementation (float-to-string conversion).

RapidJSON does not use dynamic dispatching based on CPUID; it simply uses static binding. Some discussions about this can be found here.
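
As a rough illustration of static binding, the sketch below selects an SSE2 whitespace skipper at compile time and otherwise compiles a scalar loop, with no CPUID check at run time. It is not RapidJSON's code: the macro SKETCH_USE_SSE2 and the function skip_ws_sketch are hypothetical, and deriving the macro from the compiler's own __SSE2__ / _M_X64 definitions is an assumption made so the example builds unchanged on any target.

```cpp
#include <cstddef>

// Hypothetical compile-time switch: the choice is fixed when the code is
// built, so no runtime dispatch is needed.
#if defined(__SSE2__) || defined(_M_X64)
#  define SKETCH_USE_SSE2 1
#  include <emmintrin.h>  // SSE2 intrinsics
#endif

inline bool is_ws_sketch(char c) {
  return c == ' ' || c == '\n' || c == '\r' || c == '\t';
}

// Returns a pointer to the first non-whitespace character in [p, end),
// or end if the whole range is whitespace.
inline const char* skip_ws_sketch(const char* p, const char* end) {
#ifdef SKETCH_USE_SSE2
  const __m128i sp = _mm_set1_epi8(' ');
  const __m128i nl = _mm_set1_epi8('\n');
  const __m128i cr = _mm_set1_epi8('\r');
  const __m128i tb = _mm_set1_epi8('\t');
  while (end - p >= 16) {
    // Unaligned load of 16 characters, then compare against each whitespace.
    const __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    const __m128i ws = _mm_or_si128(
        _mm_or_si128(_mm_cmpeq_epi8(chunk, sp), _mm_cmpeq_epi8(chunk, nl)),
        _mm_or_si128(_mm_cmpeq_epi8(chunk, cr), _mm_cmpeq_epi8(chunk, tb)));
    // Bit i of the mask is set when byte i is whitespace.
    const unsigned mask = static_cast<unsigned>(_mm_movemask_epi8(ws));
    unsigned not_ws = ~mask & 0xFFFFu;
    if (not_ws != 0) {
      std::size_t i = 0;  // index of the first non-whitespace byte
      while ((not_ws & 1u) == 0) {
        not_ws >>= 1;
        ++i;
      }
      return p + i;
    }
    p += 16;
  }
#endif
  // Scalar path: the tail of the SSE2 version, or the whole range otherwise.
  while (p != end && is_ws_sketch(*p)) ++p;
  return p;
}
```

Both paths return the same result, so callers do not need to know which one was compiled in.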

vitaut · 2015-02-25T04:14:26Z

Thanks, @miloyip, that's what I thought. You've done a great job with RapidJSON BTW.

Use MSVC intrinsics for better performance

00e3ae5

vitaut added a commit that referenced this pull request Feb 20, 2015

Merge pull request #116 from CarterLi/master

939193b

Use MSVC intrinsics for better performance

vitaut merged commit 939193b into fmtlib:master Feb 20, 2015

vitaut mentioned this pull request Apr 9, 2015

Faster float format #147

Closed
