
[AIX] Fixing hash4Ptr for Big Endian Systems #3227

Merged 1 commit, Aug 1, 2022

Conversation

qiongsiwu
Contributor

@qiongsiwu qiongsiwu commented Jul 30, 2022

ZSTD_hash4Ptr does not reverse the bit order when loading from memory. Not reversing the bit order is causing compressed data size and compression ratio differences between big-endian and little-endian systems for certain workloads at high compression levels. This PR fixes the differences.

Contributor

@terrelln terrelln left a comment


Thanks for the fix!

@terrelln terrelln merged commit 1b445c1 into facebook:dev Aug 1, 2022
@Cyan4973
Contributor

A quick comment (a bit late as I was on PTO when this PR was submitted):
this wasn't an omission,
rather it was a known difference between big and little endian systems.
It was accepted that this would lead to differences in hash generation, leading to differences in subsequent collision resolution, and ultimately to small differences in the generated compressed data.

At the end of the day, both produce valid frames, which are compatible with each other.

Now, it's not the first time that a big-endian user is surprised by a difference in compressed result compared to little-endian.
And while it could be argued that's not a big deal, I'm fine with the idea of eliminating this difference, even if the only impact is to stop puzzling those users.

Just, there is probably a (little) cost to this, since big-endian systems now have some extra work to do to shuffle the bytes.
Not sure if this impact is measurable, but I would have liked for it to be measured, so that it can be part of the decision.
It's not totally the same if the speed impact is negligible, or if it slows down processing by 10%.
To be complete, my bet here is closer to "negligible".
Just it would have been better to confirm it through actual measurements.

@qiongsiwu qiongsiwu deleted the fix_hash4Ptr_big_endian branch August 17, 2022 13:18
@qiongsiwu
Contributor Author

Thanks so much for the comment! Now I understand the situation better!

> Just it would have been better to confirm it through actual measurements.

Let me work on some performance measurements on AIX and post back. Better late than never!

@qiongsiwu
Contributor Author

qiongsiwu commented Sep 29, 2022

Sorry about the delay! Here are some measurements done on an AIX machine. In short, on average we observed about a 0.1% slowdown with bit reversal (this PR). But the confidence intervals of the measurements overlap, so the slowdown is not statistically significant.

We picked an input file of size 108 MB and ran the benchmark for L18 and L19 with (this PR) and without bit reversal, for a total of four configurations ({L18, L19} x {with bit reversal, without bit reversal}). Each configuration was run 10 times and the running time (real time) of each run was recorded. We then computed the average, fastest, slowest, and the 95% CIs. For each level, the average running time without bit reversal was used as the baseline. The values in the table are speedups against the baseline (> 1 means a speedup).

|                  | L18 without reversal | L18 with reversal | L19 without reversal | L19 with reversal |
|------------------|----------------------|-------------------|----------------------|-------------------|
| Average          | 1.0000 (baseline)    | 0.9990            | 1.0000 (baseline)    | 0.9988            |
| Slowest          | 0.9866               | 0.9982            | 0.9905               | 0.9923            |
| Fastest          | 1.0056               | 0.9995            | 1.0066               | 1.0002            |
| 95% CI slow end  | 0.9948               | 0.9987            | 0.9962               | 0.9974            |
| 95% CI fast end  | 1.0052               | 0.9992            | 1.0038               | 1.0003            |

Let me know if more information/measurement is desired, and I will go work on them!

Thanks so much!

@Cyan4973
Contributor

Cyan4973 commented Sep 29, 2022

Thanks for the feedback @qiongsiwu ,

I'm not too surprised about the small impact on speed, especially at levels 18 and 19, which are very slow to begin with, and do not care much about speed.

What seems more relevant is to run the same test at fast levels, typically level 1.

Level 1 is expressly designed for compression speed, so that's where such compression speed differences matter.
Let's be clear that a 0.1% regression is still acceptable, even for level 1; we may have an issue to look into if the regression reaches something like > 2%.

[edit]: I now realize that this PR only modifies ZSTD_hash4Ptr(),
and there are a limited number of scenarios where this function is invoked.

Looking at https://github.com/facebook/zstd/blob/dev/lib/compress/clevels.h,
this function is actually never used for level 1!

One scenario where it might affect a high-speed setup is:

  • Level 2
  • Compress files < 16 KB

@qiongsiwu
Contributor Author

> One scenario where it might affect a high-speed setup is:
>
>   • Level 2
>   • Compress files < 16 KB

Sounds good! Thanks for the feedback! I will make some measurements and post back!

@terrelln
Contributor

terrelln commented Oct 4, 2022

@Cyan4973 for context we had already switched all of our other hash functions to use ZSTD_readLE*(), presumably for determinism, in 4f0a393. We just skipped ZSTD_hash4().
