Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runtime hardware assisted CRC32 for intel processors #4933

Closed
wants to merge 1 commit into from

Conversation

sergeyn
Copy link
Contributor

@sergeyn sergeyn commented Jan 22, 2022

This is my 2nd attempt to enable hardware crc32 . This time I made it with minimal modifications required, compile-time only . It should work with intel/microsoft/gcc/clang compilers on windows/linux.

To test it in visual studio, make sure you enable AVX instruction set in project settings.

Brief description of changes to the code:

  • ImHashData does hardware crc32 if built with instruction set supporting at least sse42 on intel platform, otherwise original implementation is used

  • ImHashStr now doesn't handle CRC32, just handles '###' then calls ImHashData to do actual CRC32 computation
    

Regards

Edit: sorry for multiple requests posted earlier - noob git user here.

@ocornut
Copy link
Owner

ocornut commented Jan 22, 2022 via email

@sergeyn
Copy link
Contributor Author

sergeyn commented Jan 22, 2022

I originally pushed from the wrong branch. sorry for spam

@sergeyn
Copy link
Contributor Author

sergeyn commented Jan 22, 2022

4space tabs - fixed
squash commits - fixed

@ocornut ocornut changed the title compile time hardware crc32 for intel processors Runtime hardware assisted CRC32 for intel processors Jan 24, 2022
@ocornut ocornut force-pushed the master branch 3 times, most recently from c817acb to 8d39063 Compare February 15, 2022 16:25
@0x1F9F1
Copy link

0x1F9F1 commented Jun 1, 2022

Do you have any performance comparisons for this?

In ImHashStr you are now iterating over the data 2 or 3 times (strlen + memchr + ImHashData). Due to the small average size of the strings being hashed, it seems likely the overhead of this approach will outweigh any performance benefits.

Perhaps a simple wrapper over _mm_crc32_u8 would be a good compromise?

#if defined(__SSE4_2__) || defined(__AVX__)
#include <nmmintrin.h>
#define IM_CRC32_U8(CRC, C) (_mm_crc32_u8(CRC, C))
#else
// static const ImU32 GCrc32LookupTable[256] = ...
#define IM_CRC32_U8(CRC, C) ((CRC >> 8) ^ GCrc32LookupTable[(CRC & 0xFF) ^ C])
#endif

@sergeyn
Copy link
Contributor Author

sergeyn commented Jun 5, 2022

Since data is small, it'll end up in cache after 1st strlen thus following memchr is not that big of a deal, and ImHashData tries to do 8chars at a time. In my tests strings >16 chars start to win vs your proposed "good compromise" approach.
But yeah, just replacing table lookup _mm_crc32_u8 as you suggested is a good compromise indeed since in my tests on average it halves execution time.

@ocornut
Copy link
Owner

ocornut commented Nov 27, 2024

FYI a simpler version of this was merged for #8169 which included a change of CRC32 table so both paths would match (which is the most important thing).

As per discussion above it didn't seem worth to do the dual pass ### scan for strings (which should generally be small).
I have omitted used _mm_crc32_u32() as suggested by @sergeyn in 2d66010

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants