add lazyEntries option #22
TODO: investigate alleged "undefined behavior" in the readme when calling
Per https://github.com/thejoshwolfe/yauzl#openpath-options-callback, if it's not recommended, let's change the default?
That would be an API breaking change, so it will happen at the major version bump for yauzl 3.0.0. |
One of the main goals of `yauzl` is to keep memory usage under control. The current pattern of `entry` and `end` events is hostile to this goal.

Using `examples/unzip.js` on http://wolfesoftware.com/crowded.zip.xz (unxz it first) results in over 2GB of virtual ram allocation as the program slows to a crawl while creating empty directories. This is failure. The problem is that we call `openReadStream` (0bf7f48#diff-5b07aa03e052f091324dde1dfbfd25bfR53) on every entry before pumping any of the file data. This is the naive solution to the problem that #18 ran into in the test system, where `autoClose` can lead to race conditions.

Really, the problem is that `autoClose` expects you to call all your `openReadStream()`s before yauzl emits the `end` event, and yauzl races to emit the `end` event as soon as possible. Even with `autoClose` off, you need to keep all your `Entry` objects in ram as yauzl emits them, or you'll lose the information.

We need some way to throttle entry reading.
Also, it's questionable that yauzl's API encourages clients to extract all files in parallel. This would probably perform worse on any file system that tries to pre-fetch data ahead of your `read()` calls (pretty much all of them, I think) than the normal strategy of extracting one file at a time from start to finish. I didn't test that, but it seems like file system access wouldn't generally benefit from parallelization; in fact, massive parallelization probably makes things much worse due to cache thrashing and the extra resources required to keep all the pipelines open.

It will be nice to remove `examples/unzip.js`'s dependency on a third-party synchronization library, `pend`, which is currently being used to iterate over the entries one at a time despite yauzl's API giving us all the entries in parallel. It's a smell that our own example doesn't use the recommended execution order.
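For reference, the job `pend` is doing in `examples/unzip.js` is just serializing asynchronous work. A generic sketch of that pattern (a hypothetical `runSerially` helper, not pend's actual API) shows how little machinery it really needs once entries arrive one at a time:

```javascript
// Run asynchronous jobs strictly one at a time: each job receives a
// callback and the next job only starts after that callback fires.
// (runSerially is a generic illustration, not pend's API.)
function runSerially(jobs, done) {
  let i = 0;
  (function next() {
    if (i === jobs.length) return done();
    jobs[i++](next); // each job calls next() when finished
  })();
}

const order = [];
runSerially(
  ["a", "b", "c"].map((name) => (cb) => {
    // simulate asynchronous extraction of one entry
    setImmediate(() => {
      order.push(name);
      cb();
    });
  }),
  () => console.log(order.join(",")) // a,b,c
);
```

With a lazy, pull-based entry API, this kind of helper becomes unnecessary: the one-at-a-time ordering falls out of calling `readEntry()` only after the previous entry's stream has finished.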