
[AIX] Fixing hash4Ptr for Big Endian Systems #3227

Merged 1 commit, Aug 1, 2022

Conversation

qiongsiwu
Contributor

@qiongsiwu qiongsiwu commented Jul 30, 2022

ZSTD_hash4Ptr does not reverse the bit order when loading from memory. Not reversing the bit order is causing compressed data size and compression ratio differences between big-endian and little-endian systems for certain workloads at high compression levels. This PR fixes the differences.

Contributor

@terrelln terrelln left a comment


Thanks for the fix!

@terrelln terrelln merged commit 1b445c1 into facebook:dev Aug 1, 2022
@Cyan4973
Contributor

A quick comment (a bit late as I was on PTO when this PR was submitted):
this wasn't an omission,
rather it was a known difference between big and little endian systems.
It was accepted that this would lead to differences in hash generation, leading to differences in subsequent collision resolution, and ultimately to small differences in the generated compressed data.

At the end of the day, both produce valid frames, which are compatible with each other.

Now, it's not the first time that a big-endian user is surprised by a difference in compressed result compared to little-endian.
And while it could be argued that's not a big deal, I'm fine with the idea of eliminating this difference, even if the only impact is to stop puzzling those users.

Just, there is probably a (little) cost to this, since big-endian systems now have some extra work to do to shuffle the bytes.
Not sure if this impact is measurable, but I would have liked for it to be measured, so that it can be part of the decision.
It's not totally the same if the speed impact is negligible, or if it slows down processing by 10%.
To be complete, my bet here is closer to "negligible".
Just it would have been better to confirm it through actual measurements.

@qiongsiwu qiongsiwu deleted the fix_hash4Ptr_big_endian branch August 17, 2022 13:18
@qiongsiwu
Contributor Author

Thanks so much for the comment! Now I understand the situation better!

> Just it would have been better to confirm it through actual measurements.

Let me work on some performance measurements on AIX and post back. Better late than never!

@qiongsiwu
Contributor Author

qiongsiwu commented Sep 29, 2022

Sorry about the delay! Here are some measurements done on an AIX machine. In short, on average we observed about a 0.1% slowdown with bit reversal (this PR). But the confidence intervals of the measurements overlap, so the slowdown is not statistically significant.

We picked an input file of size 108 MB and ran the benchmark for L18 and L19 with (this PR) and without bit reversal, for a total of four configurations ({L18, L19} x {with bit reversal, without bit reversal}). Each configuration was run 10 times and the running time (real time) of each run was recorded. We then computed the average, fastest, slowest, and the 95% CIs. For each level, the average running time without bit reversal was used as the baseline. The values in the table are speedups against the baseline (> 1 means a speedup).

|                  | L18 without reversal | L18 with reversal | L19 without reversal | L19 with reversal |
|------------------|----------------------|-------------------|----------------------|-------------------|
| Average          | 1.0000 (baseline)    | 0.9990            | 1.0000 (baseline)    | 0.9988            |
| Slowest          | 0.9866               | 0.9982            | 0.9905               | 0.9923            |
| Fastest          | 1.0056               | 0.9995            | 1.0066               | 1.0002            |
| 95% CI slow end  | 0.9948               | 0.9987            | 0.9962               | 0.9974            |
| 95% CI fast end  | 1.0052               | 0.9992            | 1.0038               | 1.0003            |

Let me know if more information/measurement is desired, and I will go work on them!

Thanks so much!

@Cyan4973
Contributor

Cyan4973 commented Sep 29, 2022

Thanks for the feedback @qiongsiwu ,

I'm not too surprised about the small impact on speed, especially at levels 18 and 19, which are very slow to begin with, and do not care much about speed.

What seems more relevant is to run the same test at fast levels, typically level 1.

Level 1 is expressly designed for compression speed, so that's where such compression speed differences matter.
Let's be clear that a 0.1% regression is still acceptable, even for level 1; we may have an issue to look into if the regression reaches something like > 2%.

[edit]: I now realize that this PR only modifies ZSTD_hash4Ptr(),
and there are a limited number of scenarios where this function is invoked.

Looking at https://github.com/facebook/zstd/blob/dev/lib/compress/clevels.h,
this function is actually never used for level 1!

One scenario where it might affect a high-speed setup is:

  • Level 2
  • Compress files < 16 KB

@qiongsiwu
Contributor Author

> One scenario where it might affect a high-speed setup is:
>
>   • Level 2
>   • Compress files < 16 KB

Sounds good! Thanks for the feedback! I will make some measurements and post back!

@terrelln
Contributor

terrelln commented Oct 4, 2022

@Cyan4973 for context we had already switched all of our other hash functions to use ZSTD_readLE*(), presumably for determinism, in 4f0a393. We just skipped ZSTD_hash4().
