deflate should use __builtin_ctzll (64bit) instead of __builtin_ctzl (32bit) #60

mgambrell · 2024-09-24T12:41:28Z

I think the use of the 32bit operation is probably a mistake. It seems to leave an appreciable amount of compression on the table, perhaps by finding shorter matches (4 bytes) in cases where there may have been 5-8. I did not assess the performance implications. I did check the upstream code and the intention seems to have been to SWAR a fundamentally 8-byte serial process, but this intention was not achieved with the 32bit operation.

If my analysis is wrong for some arcane reason, then I think it warrants explaining in a comment at the point of the call why the strange choice was made.


fhanau · 2024-09-26T17:37:09Z

deflate.h

@@ -334,11 +334,11 @@ extern const uint8_t ZLIB_INTERNAL _dist_code[];
 #define likely(x)       (x)
 #define unlikely(x)     (x)

-int __inline __builtin_ctzl(unsigned long mask)
+uint64_t __inline __builtin_ctzll(uint64_t mask)


Nit: Since index is an unsigned long and at most 64, can you replace the return value here with unsigned long and remove the cast to int in the return expression? I doubt it matters for codegen but it should look slightly cleaner.
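For readers unfamiliar with the MSVC side of this, here is a minimal sketch of a 64-bit count-trailing-zeros fallback. It is not the code from this PR: the name `ctz64` is hypothetical, and the 32-bit MSVC branch is an assumption about how such a target might be handled. It returns `int`, which is the return type of GCC's `__builtin_ctzll`.

```c
#include <assert.h>
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>

/* Hypothetical helper: index of the lowest set bit of a 64-bit mask.
 * The result is undefined for mask == 0, as with the GCC builtin. */
static __inline int ctz64(uint64_t mask)
{
    unsigned long index;
    assert(mask != 0);
#if defined(_M_X64) || defined(_M_ARM64)
    /* Scans all 64 bits in one intrinsic call. */
    _BitScanForward64(&index, mask);
    return (int)index;
#else
    /* Assumed handling for 32-bit targets: scan the low half, then the high half. */
    if (_BitScanForward(&index, (unsigned long)(mask & 0xFFFFFFFFu)))
        return (int)index;
    _BitScanForward(&index, (unsigned long)(mask >> 32));
    return (int)index + 32;
#endif
}

#else

/* GCC and Clang: the builtin takes unsigned long long, which is at least 64 bits. */
static inline int ctz64(uint64_t mask)
{
    assert(mask != 0);
    return __builtin_ctzll(mask);
}

#endif
```

The point of the sketch is only that the operand width is fixed at 64 bits on every platform, instead of following the width of `unsigned long`.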

fhanau · 2024-09-26T17:39:16Z

Thank you for filing this PR. Please note that __builtin_ctzl takes in an unsigned long, which is 64-bit on many platforms (e.g. arm64 and x86_64 Linux and macOS), so in that case this works as intended.
On platforms where unsigned long is 32 bits (e.g. Windows), I think you're right and compression can be affected. When longest_match() finds that an 8-byte match is no longer possible, it tries to get a partial match of 0 to 7 bytes, but with a 32-bit __builtin_ctzl it would find at most 4 bytes, resulting in a shorter match length.
I left some comments on the __builtin_ctzll implementation for MSVC, LGTM otherwise. @vkrasnov @kornelski can you approve this once ready?
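To illustrate the partial-match step described above, here is a small sketch. It is not the actual longest_match() code: the function names are hypothetical, a little-endian host is assumed, and the 32-bit variant simply models the truncation by capping the result at 4 bytes, following the description in this comment.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of equal leading bytes (0..8) in two
 * 8-byte windows, using a 64-bit count-trailing-zeros on the XOR.
 * Assumes a little-endian host, so low bits correspond to earlier bytes. */
static int match_bytes_64(const unsigned char *a, const unsigned char *b)
{
    uint64_t x, y, diff;
    memcpy(&x, a, sizeof x);
    memcpy(&y, b, sizeof y);
    diff = x ^ y;
    if (diff == 0)
        return 8;                      /* all 8 bytes match */
    return __builtin_ctzll(diff) >> 3; /* bit index / 8 = first differing byte */
}

/* Model of the same step when only the low 32 bits of the XOR are examined:
 * a difference in bytes 4..7 is invisible, so the result is capped at 4. */
static int match_bytes_32(const unsigned char *a, const unsigned char *b)
{
    uint64_t x, y;
    uint32_t low;
    memcpy(&x, a, sizeof x);
    memcpy(&y, b, sizeof y);
    low = (uint32_t)(x ^ y);
    if (low == 0)
        return 4;                      /* cannot see past the first 4 bytes */
    return __builtin_ctz(low) >> 3;
}
```

For two windows that first differ at byte 6, such as "abcdefgh" and "abcdefXh", the 64-bit version reports 6 matching bytes while the 32-bit model reports 4, which is the shorter match length the comment refers to.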

deflate should use __builtin_ctzll (64bit) instead of __builtin_ctzl (32bit)

169e7aa

fhanau reviewed Sep 26, 2024


__builtin_ctzll should return int, not uint64_t

c9a1ffa

kornelski approved these changes Sep 28, 2024
