avoid intermediate array for lz4/zlib #128
Codecov Report
@@ Coverage Diff @@
## master #128 +/- ##
==========================================
- Coverage 92.45% 92.17% -0.29%
==========================================
Files 11 11
Lines 1392 1418 +26
==========================================
+ Hits 1287 1307 +20
- Misses 105 111 +6
For future reference, this is how ROOT does zlib for example, but the others are in there as well:
Do we want to wrap this whole if-else logic into a function? Technically there's another place where we use it:
In principle, yes, we can consolidate them, though I don't know if it's worth it because the streamers are only unpacked once and they're small. Maybe once we're happy with the in-place versions (and perhaps have also done it for the remaining two algos).
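For illustration only, a minimal sketch of what a consolidated helper could look like. Everything here is hypothetical: `decompress_into!`, `CODECS`, `copy_codec!` and the two-byte tags are made-up names for the sketch, and the "codec" just copies bytes where real code would call into zlib or LZ4.

```julia
# Hypothetical in-place "codec": fills `dst` from `src` and returns the number
# of bytes written. A real one would call the zlib or LZ4 C routine instead.
copy_codec!(dst, src) = (copyto!(dst, src); length(src))

# Lookup table from an algorithm tag to its in-place routine.
# The tags used here are assumptions made for this sketch.
const CODECS = Dict("ZL" => copy_codec!, "L4" => copy_codec!)

# Single entry point replacing a repeated if-else chain.
function decompress_into!(dst::AbstractVector{UInt8}, tag::AbstractString,
                          src::AbstractVector{UInt8})
    codec! = get(CODECS, tag, nothing)
    codec! === nothing && error("unsupported compression algorithm: $tag")
    return codec!(dst, src)
end

# Usage: write three bytes into the middle of a larger buffer.
buf = zeros(UInt8, 6)
decompress_into!(view(buf, 3:5), "L4", UInt8[7, 8, 9])
```

Both call sites would then share one function, at the cost of one dictionary lookup per call.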
And for reference, the updated zlib decompression function still works with zlib-ng. E.g.
cp ~/julia-1.7.0-beta4/lib/julia/libz.so.1.2.11 bck.so # backup
# compile zlib-ng
cp ~/zlib-ng/libz.so.1.2.11.zlib-ng ~/julia-1.7.0-beta4/lib/julia/libz.so.1.2.11
gives another 10% improvement in decompression speed itself. So when that gets incorporated, we get another speed boost.
When decompressing, we do something like
@view(uncomp_data[fufilled+1:fufilled+uncompbytes]) = <output of decompression algo>
For LZ4, this actually makes a temporary array out_buffer, which is then copied into uncomp_data. By feeding LZ4_decompress_safe a pointer to uncomp_data, we can go faster! 🚀
The same exercise was repeated for ZLIB to eliminate the intermediate storage. I copied almost exactly what ROOT has.
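A minimal sketch of the two approaches, using only Base Julia. The function names are hypothetical and the "decompression" is a plain byte copy, so this shows only where the output lands, not the actual LZ4 or zlib calls; `offset` plays the role of `fufilled` above.

```julia
# Stand-in for an allocating codec: returns a fresh output array.
fake_decompress(src::Vector{UInt8}) = collect(src)

# Before: decompress into a temporary, then copy it into the final buffer.
function fill_chunk_copy!(uncomp_data::Vector{UInt8}, offset::Integer,
                          compressed::Vector{UInt8})
    tmp = fake_decompress(compressed)          # intermediate array
    n = length(tmp)
    uncomp_data[offset+1:offset+n] .= tmp      # second pass over the bytes
    return offset + n
end

# After: hand the "codec" a pointer into the final buffer so it writes there
# directly, which is what a C routine taking a destination pointer allows.
function fill_chunk_inplace!(uncomp_data::Vector{UInt8}, offset::Integer,
                             compressed::Vector{UInt8})
    n = length(compressed)
    # Keep both arrays rooted while raw pointers to them are in use.
    GC.@preserve uncomp_data compressed begin
        unsafe_copyto!(pointer(uncomp_data, offset + 1), pointer(compressed), n)
    end
    return offset + n
end

a = zeros(UInt8, 8)
b = zeros(UInt8, 8)
fill_chunk_copy!(a, 4, UInt8[1, 2, 3, 4])
fill_chunk_inplace!(b, 4, UInt8[1, 2, 3, 4])
```

Both buffers end up with the same contents; the in-place version just skips the temporary allocation and the extra copy. With raw pointers the caller is responsible for making sure the destination has room for all `n` bytes.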
LZ4 (before / after)
ZLIB (before / after)
So ~15% speedup for ZLIB, and more for LZ4.