ZIPs in ZIP #89
Comments
In order for this to work efficiently, the inner zip files must not be compressed. If they are compressed, then you have to buffer the inner zipfiles somewhere; I would recommend on disk in that case. A zipfile must be read starting from the end, and a compressed file must be read starting from the beginning, so there is no solution to this problem that doesn't involve buffering the entire file somewhere. Assuming the inner zipfiles are not compressed (instead they are "stored"), you can implement a RandomAccessReader for each entry in the greater zipfile and open each inner zip with fromRandomAccessReader. That should work for you. I've never tried doing this myself, but all the tools exist to make this work. Let me know if you get it working or if you run into another problem.
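The "stored entries are just a byte range of the outer file" idea above can be sketched without yauzl at all; a RandomAccessReader implementation would then simply serve that range. A minimal sketch (the function name and error handling are mine; the offsets follow the PKWARE APPNOTE local file header layout, and in practice `localHeaderOffset`/`compressedSize` would come from the outer zip's central directory, e.g. a yauzl Entry):

```javascript
// Sketch: compute the absolute byte range of a STORED (method 0) entry's
// data inside the outer zip buffer. Local file header layout per the
// PKWARE APPNOTE: 30 fixed bytes, then the file name (length at offset 26)
// and extra field (length at offset 28), then the entry data itself.
function storedEntryDataRange(outerZipBuffer, localHeaderOffset, compressedSize) {
  const signature = outerZipBuffer.readUInt32LE(localHeaderOffset);
  if (signature !== 0x04034b50) throw new Error("not a local file header");
  const fileNameLength = outerZipBuffer.readUInt16LE(localHeaderOffset + 26);
  const extraFieldLength = outerZipBuffer.readUInt16LE(localHeaderOffset + 28);
  const start = localHeaderOffset + 30 + fileNameLength + extraFieldLength;
  return { start, end: start + compressedSize }; // end is exclusive
}
```

A RandomAccessReader over the outer file would then answer reads within `[start, end)`, and the inner zipfile can be opened on top of it.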
Also, it's worth noting that recursive unzipping is a vector for zip bomb attacks, so be sure to limit the system resources devoted to reading zip files provided by untrusted users with this strategy.
Thanks. I have come to a solution using lazyEntries and calling readEntry in a callback after processing the internal ZIP. I use buffers; my files are from a controlled source anyway, and ZIP bomb attacks can be limited by filtering on uncompressedSize before allocating the buffers. Careful reuse of buffers also improved things a bit. FYI, inside the ZIP:
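The lazyEntries flow described here can be sketched as a small driver. With `{ lazyEntries: true }`, yauzl emits exactly one "entry" event per `readEntry()` call, so requesting the next entry only from the processing callback keeps entries strictly sequential; `processEntriesSequentially`, `handleEntry`, and `done` are hypothetical names of mine:

```javascript
// Sketch of the lazyEntries pattern: the next entry is requested only after
// the current one (possibly an inner zip) has been fully processed.
function processEntriesSequentially(zipfile, handleEntry, done) {
  zipfile.on("entry", (entry) => {
    // handleEntry invokes its callback once this entry is fully processed.
    handleEntry(entry, () => zipfile.readEntry());
  });
  zipfile.on("end", done);
  zipfile.readEntry(); // pull the first entry
}
```

Because yauzl's own lazy zipfile object has this readEntry/"entry"/"end" shape, the same driver can be reused at every nesting depth.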
@thejoshwolfe I'm interested in doing exactly what you describe here: creating a
However, the
…pport create a reader implementation for stored entries within a zip file, thereby supporting stored zips within zips. Add a test for the new classes based on the existing range-test. Related to thejoshwolfe#89
@brucehappy yikes! You found a nasty problem with this API. Looks like I got burned by not keeping my function colors straight. 🤦♂️ I'll take a look at your PR.
Question:
Assume large (internal) ZIP files stored in a huge (main) ZIP file (maybe nested more than 2 levels deep).
How could one process all the files within the internal ZIPs without severe, repeated buffering of several internal files?
Tried:
Reading the main ZIP into buffers and processing the buffers synchronously with jszip-sync, which is known to be very slow.
Is there really no way to process buffers synchronously with yauzl?
When reading the main ZIP file, the large inner ZIP files have to be read into a buffer (preferred) or into a temp file (to be avoided), since there is no way to feed streams into yauzl.
When the inner ZIP files are then processed from the buffer with yauzl, the event mechanism makes the events of the inner ZIP files (their zipFile.on("entry") handlers and the readStream events from openReadStream's callback) come interleaved with the outer zipFile.on("entry") events.
Can the zipFile somehow be asked NOT to produce the next entry until explicitly allowed, i.e. pause/resume?
Then I thought I would read the entries of the main ZIP file into an array (since there is no other way to access the central directory), then, without closing the ZIP, call openReadStream for each entry in the array and read the streams as needed. Then, not surprisingly, nearly all openReadStream calls of the main ZIP file fired at once.
Is there no way to get access to the central directory without an extra step and events for collecting the entries?
Can zipfile.openReadStream be treated as async and then awaited until all the inner processing completes?
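The "all openReadStreams fired at once" effect can be avoided by serializing the calls with a promise wrapper and a plain `for...of` loop. This is a sketch under my own names (`openReadStreamP`, `processEntriesInOrder`, `handle` are hypothetical) around yauzl's callback-style `zipfile.openReadStream(entry, callback)`:

```javascript
// Sketch: wrap the callback-style openReadStream in a promise...
function openReadStreamP(zipfile, entry) {
  return new Promise((resolve, reject) => {
    zipfile.openReadStream(entry, (err, stream) => {
      if (err) reject(err); else resolve(stream);
    });
  });
}

// ...then awaiting inside a loop opens one stream at a time instead of all at once.
async function processEntriesInOrder(zipfile, entries, handle) {
  for (const entry of entries) {
    const stream = await openReadStreamP(zipfile, entry);
    await handle(entry, stream); // do not continue until fully consumed
  }
}
```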
Not tried
Promisifying with async/await (failed).
It seemed nearly impossible; the processing of the inner ZIP files consists of sync calls anyway, but everything becomes async immediately on readStream.on("readable").
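The point where things "become async" on the read stream can itself be awaited: a small helper (the name `streamToBuffer` is mine) that resolves only once the stream has ended turns any read stream into a value an async/await flow can wait on:

```javascript
// Sketch: collect a readable stream into one Buffer, resolving on "end",
// so the caller can `await` full consumption before moving on.
function streamToBuffer(readable) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    readable.on("data", (chunk) => chunks.push(chunk));
    readable.on("end", () => resolve(Buffer.concat(chunks)));
    readable.on("error", reject);
  });
}
```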
Using e.g. bottleneck (last chance):
- Opening with autoClose: false
- Loading the full central directory of the main ZIP into an array copy (CDClone)
- Creating a bottleneck limiter, starting processing from the 0th element of the CDClone

Here, processing the actual entry of the main ZIP file means
This remains asynchronous, but at what a price...
It seems too vulnerable to errors of any kind.
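What a limiter like bottleneck contributes here can be sketched in a few lines. `createLimiter` is a hypothetical stand-in of mine, not bottleneck's API: it admits at most `limit` tasks at a time and starts the next queued one as each settles:

```javascript
// Sketch: run at most `limit` async tasks concurrently; queued tasks start
// as running ones finish. A minimal stand-in for what bottleneck provides.
function createLimiter(limit) {
  let active = 0;
  const queue = [];
  function next() {
    if (active >= limit || queue.length === 0) return;
    active += 1;
    const { task, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => { active -= 1; next(); });
  }
  return (task) => new Promise((resolve, reject) => {
    queue.push({ task, resolve, reject });
    next();
  });
}
```

Each CDClone entry's processing would be submitted as one task, bounding how many inner ZIP buffers are live at once.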