[AIX] Fixing hash4Ptr for Big Endian Systems #3227
Thanks for the fix!
A quick comment (a bit late, as I was on PTO when this PR was submitted): at the end of the day, the big-endian and little-endian outputs are both valid frames, and they are compatible with each other. That said, this is not the first time a big-endian user has been surprised by a compressed result that differs from the little-endian one. There is probably a (small) cost to this change, since big-endian systems now have some extra work to do to shuffle the bytes.
Thanks so much for the comment! Now I understand the situation better!
Let me work on some performance measurements on AIX and post back. Better late than never!
Sorry about the delay! Here are some measurements taken on an AIX machine. In short, on average we observed about a 0.1% slowdown with bit reversal (this PR). But the confidence intervals of the measurements overlap, so the slowdown is not statistically significant. We picked an input file of size 108MB and ran the benchmark at compression levels 18 and 19, with (this PR) and without bit reversal. Hence there are a total of four configurations ({level 18, level 19} x {with bit reversal, without bit reversal}). Each configuration was run 10 times and the running time (real time) of each run was recorded. We then computed the average, fastest, and slowest times and the 95% CIs. For each level, the average running time without bit reversal was used as the baseline. The values in the table are speedups against the baseline (> 1 means a speedup).
Let me know if more information or measurements are desired, and I will go work on them! Thanks so much!
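The arithmetic described in this comment (per-configuration mean, 95% CI, speedup against the baseline, and the overlap check) can be sketched in C as follows. This is a minimal sketch under stated assumptions: the comment does not say how the CIs were computed, so a Student-t interval is assumed here, and all names (`summary_t`, `summarize`, `speedup`, `intervals_overlap`, `sqrt_newton`) are hypothetical helpers, not part of zstd or of the benchmark that was actually run.

```c
#include <stddef.h>

/* Assumption: two-sided 95% Student-t critical value for 9 degrees of
 * freedom, i.e. exactly 10 runs per configuration as in the comment. */
#define T_95_DF9 2.262

/* Hypothetical summary of one configuration's running times. */
typedef struct {
    double mean; /* average running time */
    double lo;   /* lower bound of the 95% CI */
    double hi;   /* upper bound of the 95% CI */
} summary_t;

/* Small Newton-iteration square root, used only to keep the sketch free of
 * a libm dependency; sqrt() from <math.h> would serve equally well. */
static double sqrt_newton(double x) {
    double r = x > 1.0 ? x : 1.0;
    int i;
    if (x <= 0.0) return 0.0;
    for (i = 0; i < 60; i++) r = 0.5 * (r + x / r);
    return r;
}

/* Mean and 95% CI of n running times; the t value above is only valid for n == 10. */
static summary_t summarize(const double *t, size_t n) {
    summary_t s;
    double sum = 0.0, sq = 0.0, sd, half;
    size_t i;
    for (i = 0; i < n; i++) sum += t[i];
    s.mean = sum / (double)n;
    for (i = 0; i < n; i++) sq += (t[i] - s.mean) * (t[i] - s.mean);
    sd = sqrt_newton(sq / (double)(n - 1));            /* sample standard deviation */
    half = T_95_DF9 * sd / sqrt_newton((double)n);     /* CI half-width */
    s.lo = s.mean - half;
    s.hi = s.mean + half;
    return s;
}

/* Speedup against the baseline: > 1 means faster than the baseline. */
static double speedup(double baseline_mean, double mean) {
    return baseline_mean / mean;
}

/* Nonzero when the two confidence intervals share at least one point. */
static int intervals_overlap(summary_t a, summary_t b) {
    return a.lo <= b.hi && b.lo <= a.hi;
}
```

With this shape, each level would get two `summarize` calls (with and without bit reversal), the "without" mean would be passed as `baseline_mean`, and an overlapping pair of intervals would correspond to the "not statistically significant" reading given above.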
Thanks for the feedback @qiongsiwu, I'm not too surprised about the small impact on speed, especially at levels 18 and 19, which are very slow to begin with, and do not care much about speed. What seems more relevant is to run the same test at fast levels, typically level 1. Level 1 is expressly designed for compression speed, so that's where such compression speed differences matter. [edit]: I now realize that this PR only modifies Looking at https://github.com/facebook/zstd/blob/dev/lib/compress/clevels.h, One scenario where it might affect a high-speed setup is:
Sounds good! Thanks for the feedback! I will make some measurements and post back!
ZSTD_hash4Ptr does not reverse the bit order when loading from memory. This omission is causing compressed data size and compression ratio differences between big endian and little endian systems for certain workloads at high compression levels. This PR fixes the differences.
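To illustrate why a native 4-byte load makes the hash depend on host byte order, here is a minimal sketch in C. It is not the zstd source: the names `read_le32`, `read_be32`, `swap32`, and `hash4` are hypothetical, and the multiplier and shift are assumptions chosen to show a typical multiplicative hash. `read_be32` stands in for what a big-endian host obtains from a plain native load.

```c
#include <stdint.h>

/* Assumption: an illustrative 32-bit multiplicative-hash constant. */
static const uint32_t PRIME_4 = 2654435761U;

/* Assemble 4 bytes as a little-endian value, independent of the host. */
static uint32_t read_le32(const void *p) {
    const unsigned char *b = (const unsigned char *)p;
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Assemble 4 bytes as a big-endian value: the result of a native load
 * on a big-endian host. */
static uint32_t read_be32(const void *p) {
    const unsigned char *b = (const unsigned char *)p;
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Reverse the byte order of a 32-bit value (the "reversal" discussed in this PR). */
static uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00U) |
           ((v << 8) & 0x00FF0000U) | (v << 24);
}

/* Keep the top hbits bits of the product; hbits must be in 1..32. */
static uint32_t hash4(uint32_t v, unsigned hbits) {
    return (v * PRIME_4) >> (32 - hbits);
}
```

Hashing `read_be32(p)` directly generally lands in a different table slot than hashing `read_le32(p)`, so the match finder sees different candidates and the compressed output can differ. Hashing `swap32(read_be32(p))` gives the same slot as the little-endian path, at the cost of one extra byte swap per hash on big-endian hosts.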