deflate should use __builtin_ctzll (64bit) instead of __builtin_ctzl (32bit) #60

Open. Wants to merge 2 commits into base branch gcc.amd64.

mgambrell

I think the use of the 32-bit operation is probably a mistake. It seems to leave an appreciable amount of compression on the table, perhaps because it finds shorter matches (4 bytes) in cases where longer ones (5 to 8 bytes) were available. I did not assess the performance implications. I checked the upstream code, and the intent seems to have been to use SWAR (SIMD within a register) to compare 8 bytes at once in what is fundamentally a byte-by-byte serial process. The 32-bit operation does not achieve that.
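A minimal sketch of the SWAR idea described above, assuming a little-endian target and GCC/Clang's __builtin_ctzll. The helper common_prefix_len is hypothetical and is not the project's longest_match(); it only illustrates how one XOR plus a trailing-zero count replaces up to 8 byte comparisons.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: length of the common prefix of a and b, scanning
   8 bytes per step. Assumes a little-endian target and GCC/Clang builtins.
   To keep the sketch short, max_len must be a multiple of 8, and both
   buffers must have at least max_len readable bytes. */
static size_t common_prefix_len(const unsigned char *a, const unsigned char *b,
                                size_t max_len)
{
    assert(max_len % 8 == 0);
    for (size_t i = 0; i < max_len; i += 8) {
        uint64_t x, y;
        memcpy(&x, a + i, 8);   /* unaligned-safe 8-byte loads */
        memcpy(&y, b + i, 8);
        uint64_t diff = x ^ y;  /* zero bits wherever the inputs agree */
        if (diff != 0)
            /* On little-endian, byte 0 sits in the low 8 bits, so the number
               of trailing zero bits divided by 8 is the count of equal bytes
               before the first mismatch (0 to 7). */
            return i + (size_t)__builtin_ctzll(diff) / 8;
    }
    return max_len;
}
```

For example, comparing "abcdefghijklmnop" with "abcdefghijkXmnop" over 16 bytes yields 11. On a big-endian target the same idea would count leading zeros (__builtin_clzll) instead.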

If my analysis is wrong for some arcane reason, then I think the call site warrants a comment explaining why this unusual choice was made.

deflate.h (outdated)
```diff
@@ -334,11 +334,11 @@ extern const uint8_t ZLIB_INTERNAL _dist_code[];
 #define likely(x) (x)
 #define unlikely(x) (x)
 
-int __inline __builtin_ctzl(unsigned long mask)
+uint64_t __inline __builtin_ctzll(uint64_t mask)
```

Nit: since index is an unsigned long and always less than 64, can you make the return type here unsigned long and remove the cast to int in the return expression? I doubt it matters for codegen, but it should look slightly cleaner.
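For illustration, a sketch of an MSVC stand-in with the unsigned long return type suggested in the nit. It assumes the _BitScanForward64 intrinsic from <intrin.h>, which MSVC provides only on 64-bit targets such as x64. The name ctz64 is hypothetical (the PR itself defines __builtin_ctzll), and the GCC/Clang branch exists only so the sketch also compiles there.

```c
#include <assert.h>
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>
/* Hypothetical MSVC stand-in for __builtin_ctzll. Returning unsigned long
   matches the type of index, so no cast is needed. */
static __inline unsigned long ctz64(uint64_t mask)
{
    unsigned long index;
    assert(mask != 0);              /* index is undefined for a zero mask */
    _BitScanForward64(&index, mask);
    return index;                   /* bit position of lowest set bit, 0 to 63 */
}
#else
/* GCC/Clang: defer to the real builtin (also undefined for zero). */
static inline unsigned long ctz64(uint64_t mask)
{
    assert(mask != 0);
    return (unsigned long)__builtin_ctzll(mask);
}
#endif
```

Because the result is a bit position in the range 0 to 63, unsigned long (or even int) holds it on every platform, so the choice is purely cosmetic, as the nit says.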

fhanau commented Sep 26, 2024

Thank you for filing this PR. Note that __builtin_ctzl takes an unsigned long, which is 64 bits on many platforms (e.g. arm64 and x86_64 Linux and macOS), so on those platforms this works as intended.
On platforms where unsigned long is 32 bits (e.g. Windows), I think you're right that compression can be affected. When longest_match() finds that an 8-byte match is no longer possible, it tries to find a partial match of 0 to 7 bytes, but with a 32-bit __builtin_ctzl it would find at most 4 bytes, resulting in a shorter match length.
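To make the 32-bit case concrete, here is a small model (not the project's code) of what happens when the 64-bit XOR of two 8-byte windows is passed to a ctz whose parameter is a 32-bit unsigned long. Both helper names are hypothetical. Because the real builtin is undefined for a zero argument, the model substitutes 32 in that case; that substitution is an assumption, and it is what caps the result at 4 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Matching bytes before the first mismatch, computed from the full 64-bit
   XOR, as __builtin_ctzll sees it. Precondition: diff != 0. */
static unsigned bytes_matched_64(uint64_t diff)
{
    assert(diff != 0);
    return (unsigned)__builtin_ctzll(diff) / 8;
}

/* Model of a ctz taking a 32-bit unsigned long: the argument is truncated
   to its low 32 bits (bytes 0 to 3). If those bits are all zero, the real
   builtin is undefined; this model returns 32 there, so the result never
   exceeds 4 bytes. */
static unsigned bytes_matched_32(uint64_t diff)
{
    uint32_t low = (uint32_t)diff;  /* what survives the conversion */
    unsigned tz = low ? (unsigned)__builtin_ctz(low) : 32u;
    return tz / 8;
}
```

With the first mismatch in byte 5 (diff = 0xFF << 40), bytes_matched_64 reports 5 matching bytes while bytes_matched_32 reports 4. For a mismatch within the low 4 bytes, both agree.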
I left some comments on the __builtin_ctzll implementation for MSVC, LGTM otherwise. @vkrasnov @kornelski can you approve this once ready?
