Archive encrypted with Zip Crypto algorithm (weak encrypt) is extremely slow under stream unzip #91
Comments
Probably yes, there is a way to speed it up. But 5MB/minute is slower than I would expect even for the code as it is right now. Do you have a short snippet of code that I could run to show it is that slow?

Ah, here's an example zipping a 100MB file of pseudo-random data, so pretty much the worst case in terms of compression:

import datetime
import subprocess
import random
from stream_unzip import stream_unzip
# Always deal with chunks of up to 64 KiB (65536 bytes)
max_chunk = 65536
# Create 100MB file of pseudo-random data
print('Creating uncompressed file...')
total = 100_000_000
remaining = total
random.seed(0)
with open('random.txt', 'wb') as f:
    while remaining:
        chunk_size = min(max_chunk, remaining)
        f.write(random.randbytes(chunk_size))
        remaining -= chunk_size
print('Done')
# ZIP the file
print('Creating password-protected ZIP...')
subprocess.check_output(['zip', '-P', 'mypassword', 'random.zip', 'random.txt'])
print('Done')
# UnZIP
print('Unzipping with stream_unzip')
start = datetime.datetime.now()
with open('random.zip', 'rb') as f:
    zipped_chunks = iter(lambda: f.read(max_chunk), b'')
    for file_name, size, chunks in stream_unzip(zipped_chunks, password=b'mypassword'):
        for _ in chunks:
            pass
end = datetime.datetime.now()
taken = end - start
print('Done:', taken)

For me, the unzipping takes just under a minute, so it's more like 100MB/min. Not the speediest thing in the world, but more than an order of magnitude faster than 5MB/min. (And I'm just on a fairly regular laptop, I think.) So it would be good to see an example where it's 5MB/min.
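To compare runs in the same units as the original report, the elapsed time can be converted to MB/min. The following is only a small sketch: `mb_per_min` and `measure` are hypothetical helper names, not part of stream-unzip, and MB is taken to mean 1,000,000 bytes.

```python
import time


def mb_per_min(total_bytes, seconds):
    """Convert a byte count and a duration in seconds to MB/min (1 MB = 1,000,000 bytes)."""
    return (total_bytes / 1_000_000) / (seconds / 60)


def measure(chunks):
    """Consume an iterable of bytes chunks; return (total_bytes, elapsed_seconds)."""
    start = time.perf_counter()
    total = sum(len(chunk) for chunk in chunks)
    elapsed = time.perf_counter() - start
    return total, elapsed
```

Inside the unzip loop above, `measure(chunks)` could replace the inner `for _ in chunks: pass`, and the result can then be passed to `mb_per_min`.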
Comparing with Python's zipfile, zipfile is about 10% faster than stream_unzip for me:

import zipfile

print('Unzipping with zipfile')
start = datetime.datetime.now()
with zipfile.ZipFile('random.zip') as myzip:
    myzip.setpassword(b'mypassword')
    with myzip.open('random.txt') as f:
        # Read in the same 64 KiB chunks as the stream_unzip run
        unzipped_chunks = iter(lambda: f.read(max_chunk), b'')
        for _ in unzipped_chunks:
            pass
end = datetime.datetime.now()
taken = end - start
print('Done:', taken)

So while stream_unzip could probably be made faster (if zipfile can do it, why not stream_unzip?), I suspect the 5MB/min pain comes from something else.
Inspired by the report at #91, found some performance improvements for the ZipCrypto function, decrypt_weak_decompress. From some light testing of a password-protected 100MB file of pseudo-random data on my local filesystem, it seems to reduce decryption+decompression time from ~55 seconds to ~46 seconds, which also makes it a bit faster than Python's zipfile, at least in this circumstance.
Found a few ways to improve stream_unzip's ZipCrypto decrypting in #92, changing it from ~10% slower than Python's zipfile to ~10% faster, at least in my tests.
#92 is now released in v0.0.92.

One thing crosses my mind... could the Zip Crypto thing be a red herring? Could the 5MB/min in fact be due to the file using Deflate64, which is known to be incredibly slow in stream-unzip: #82
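One way to check this, assuming the archive is available as a seekable local file, is to read the compression method of each member from the central directory with Python's zipfile. This is just a sketch: `describe_members` and `METHOD_NAMES` are hypothetical names, and the method IDs are the ones I know from the ZIP specification (9 is Deflate64). zipfile cannot decompress Deflate64 itself, but listing the members does not require decompressing them.

```python
import zipfile

# Compression method IDs as defined in the ZIP specification (APPNOTE)
METHOD_NAMES = {
    0: 'stored',
    8: 'deflate',
    9: 'deflate64',
    12: 'bzip2',
    14: 'lzma',
    99: 'aes (actual method is recorded in the extra field)',
}


def describe_members(path):
    """Return a list of (name, method, encrypted) for every member of the archive."""
    results = []
    with zipfile.ZipFile(path) as archive:
        for info in archive.infolist():
            method = METHOD_NAMES.get(info.compress_type, f'unknown ({info.compress_type})')
            # Bit 0 of the general purpose flags marks an encrypted member
            encrypted = bool(info.flag_bits & 0x1)
            results.append((info.filename, method, encrypted))
    return results
```

Calling `describe_members` on the slow archive and looking for `'deflate64'` in the output would show whether #82 is the more likely cause.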
Thanks for the debugging and the improvement! Your change does make everything faster. Back to my question: I found that the slow unzipping is caused by a combination of factors:
If anyone stumbles on this, then decryption of ZipCrypto should now (as of v0.0.97) be much faster in stream-unzip (via Rust-based decrypting). From some light testing, it should be about 10 times as fast as Python's zipfile module now.

I use this tool to stream-unzip ZIP archives. Some of them are encrypted with the Zip Crypto algorithm. I see this triggers the weak decrypter in stream_unzip.py:
stream-unzip/stream_unzip.py, lines 211 to 221 in 4e19403
However, this is extremely slow with such a Python loop (approximately 5MB/minute). Is there any way to speed it up?