add lazyEntries option #22

thejoshwolfe · 2015-12-23T07:05:50Z

One of the main goals of yauzl is to keep memory usage under control. The current pattern of entry and end events is hostile to this goal.

Using examples/unzip.js on http://wolfesoftware.com/crowded.zip.xz (unxz it first) results in over 2GB of virtual ram allocation as the program slows to a crawl while creating empty directories. This is failure. The problem is that we call openReadStream (0bf7f48#diff-5b07aa03e052f091324dde1dfbfd25bfR53) on every entry before pumping any of the file data. This is the naive solution to the problem that #18 ran into in the test system where autoClose can lead to race conditions.

Really, the problem is that autoClose expects you to call all your openReadStream()s before yauzl emits the end event, and yauzl races to emit the end event as soon as possible. Even with autoClose off, you need to keep all your Entry objects in ram as yauzl emits them, or you'll lose the information.

We need some way to throttle entry reading.

Also, it's questionable that yauzl's API encourages clients to extract all files in parallel. This would probably be worse performance on any file system that tries to pre-fetch data ahead of your read() calls (pretty much all of them, I think) than the normal strategy of extracting one file at a time from start to finish. I didn't test that, but it seems like in general file system access wouldn't necessarily benefit from parallelization; in fact, massive parallelization probably makes things much worse due to cache thrashing and the extra resources required to keep all the pipelines open.

It will be nice to remove examples/unzip.js's dependency on a third-party synchronization library pend, which is currently being used to iterate over the entries one at a time despite yauzl's API giving us all the entries in parallel. It's a smell that our own example doesn't use the recommended execution order.

The text was updated successfully, but these errors were encountered:

thejoshwolfe · 2015-12-23T21:52:11Z

TODO: investigate alleged "undefined behavior" in the readme when calling close() before the end() event if we have lazyEntries: true.

mathiasbynens · 2018-05-06T04:31:41Z

Per https://github.com/thejoshwolfe/yauzl#openpath-options-callback, lazyEntries: false is still the default. Later in that section, it says “If lazyEntries is false […]. This is not recommended […]”.

If it’s not recommended, let’s change the default?

thejoshwolfe · 2018-05-06T16:35:15Z

let’s change the default

That would be an API breaking change, so it will happen at the major version bump for yauzl 3.0.0.

thejoshwolfe added the bug label Dec 23, 2015

thejoshwolfe changed the title ~~emitting entry events requires clients use RAM per entry~~ add lazyEntries option Dec 23, 2015

thejoshwolfe closed this as completed in 7a6cdde Dec 23, 2015

thejoshwolfe mentioned this issue Dec 23, 2015

race condition in test.js: Error: closed #18

Closed

thejoshwolfe mentioned this issue Dec 31, 2015

update to yauzl 2.4 api max-mapper/extract-zip#17

Merged

bilou84 mentioned this issue Jul 23, 2016

updateManager: Use async.eachLimit to uncompress multiple files at once superpowers/superpowers-app#13

Closed

thejoshwolfe mentioned this issue Oct 27, 2016

Support a promise paradigm #46

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add lazyEntries option #22

add lazyEntries option #22

thejoshwolfe commented Dec 23, 2015

thejoshwolfe commented Dec 23, 2015

mathiasbynens commented May 6, 2018

thejoshwolfe commented May 6, 2018 •

edited

Loading

add lazyEntries option #22

add lazyEntries option #22

Comments

thejoshwolfe commented Dec 23, 2015

thejoshwolfe commented Dec 23, 2015

mathiasbynens commented May 6, 2018

thejoshwolfe commented May 6, 2018 • edited Loading

thejoshwolfe commented May 6, 2018 •

edited

Loading