Improve performances #56

Merged · merged 9 commits on Jul 1, 2013

Conversation

dduponchel (Collaborator)

The aim of this pull request is to reduce the CPU/memory consumption of
JSZip. The main idea is: don't transform the data if it isn't necessary.

Lazy decompression

The main new feature is lazy decompression. If the user loads a zip
file but only reads a single entry, we don't need to INFLATE the other
entries. Moreover, if the user calls generate() on this JSZip object,
we won't DEFLATE every file: we will reuse the compressed data and only
recompress the entries that were actually read. This is the goal of
JSZip.CompressedObject. This unfortunately means that we won't be
backward compatible: the data attribute may not have been computed yet.
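
To make the pattern concrete, here is a minimal sketch of the lazy idea, assuming a hypothetical entry object and an injected inflate callback; this is illustrative and not the actual JSZip.CompressedObject code:

```js
// Illustrative sketch of lazy decompression, not the real JSZip.CompressedObject.
// The entry keeps the raw bytes from the zip and only inflates them on first access.
function LazyCompressedEntry(compressedContent, compression) {
  this.compressedContent = compressedContent; // bytes exactly as stored in the zip
  this.compression = compression;             // e.g. "DEFLATE" or "STORE"
  this.uncompressedContent = null;            // cache, filled on first read
}

// Return the content, inflating it at most once.
LazyCompressedEntry.prototype.getContent = function (inflate) {
  if (this.uncompressedContent === null) {
    this.uncompressedContent = inflate(this.compressedContent);
  }
  return this.uncompressedContent;
};

// generate() can copy this straight into the new archive when the entry was
// never modified, skipping a decompress/recompress round trip.
LazyCompressedEntry.prototype.getCompressedContent = function () {
  return this.compressedContent;
};
```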

Don't transform the data until necessary

Another change is the type of ZipObject.data (renamed _data for the
reason above). Instead of transforming it into a string when
loading/adding an ArrayBuffer, we leave it as-is. We transform it on
demand (getters, generate), not before. The central part of this change
is the JSZip.utils.transformTo function. It adds a big matrix
(transform) to convert any supported type into any other. I think this
is worth the trouble: the other parts of JSZip just need to know the
destination type. This also means that nodejs support comes nearly for
free: the transformTo method will take care of a lot of things.
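
As an illustration of the dispatch idea, a stripped-down transform matrix with only two types could look like this (a sketch under assumed names, not the real JSZip.utils.transformTo):

```js
// One converter per (input type, output type) pair.
var transform = {
  string: {
    uint8array: function (str) {
      var result = new Uint8Array(str.length);
      for (var i = 0; i < str.length; i++) {
        result[i] = str.charCodeAt(i) & 0xff; // binary string -> bytes
      }
      return result;
    }
  },
  uint8array: {
    string: function (array) {
      var chunks = [];
      for (var i = 0; i < array.length; i++) {
        chunks.push(String.fromCharCode(array[i])); // bytes -> binary string
      }
      return chunks.join("");
    }
  }
};

// Callers only say what they want; the matrix figures out how to get there.
function transformTo(outputType, input) {
  var inputType = input instanceof Uint8Array ? "uint8array" : "string";
  if (inputType === outputType) {
    return input; // no work needed: this is where the laziness pays off
  }
  return transform[inputType][outputType](input);
}
```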

Update the INFLATE/DEFLATE implementation

The current implementation (from Masanao Izumo in 1999) is known to have
bugs (issues #22, #26, #29, #43, #52, #53). The new implementation comes
from https://github.com/imaya/zlib.js and doesn't have these bugs. It is
slower than the current one, but it works.

Other possible improvements

The ArrayBuffer -> unicode string transformation is not efficient. On
Firefox, the TextDecoder API solves this, but it is not (yet) available
in other browsers or in nodejs. A solution would be to use the BlobReader
API, but that would mean turning our whole API into an asynchronous one.
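
As a sketch, the fast path could be taken only where TextDecoder exists; the fallback below is deliberately naive (ASCII only) and is an illustration, not the code in this pull request:

```js
// ArrayBuffer -> unicode string, using TextDecoder when the browser has it.
function bufferToUnicodeString(arrayBuffer) {
  var bytes = new Uint8Array(arrayBuffer);
  if (typeof TextDecoder !== "undefined") {
    return new TextDecoder("utf-8").decode(bytes); // fast native path
  }
  var chars = [];
  for (var i = 0; i < bytes.length; i++) {
    chars.push(String.fromCharCode(bytes[i])); // naive fallback, ASCII only
  }
  return chars.join("");
}
```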

dduponchel added 9 commits May 9, 2013 22:01
This commit aims to be as lazy as possible:

When reading a file, we no longer decompress the content; we just keep
a reference to the original compressed data and an offset.

If the user accesses a file, we decompress it and replace the stored
content (so we don't have to decompress it again).

When generating a zip, if a file has not been decompressed, we check
whether we can reuse its compressed content.

This unfortunately means that we won't be backward compatible: the
data attribute may not have been computed yet. Worse, the data can now
be a string, an array or a Uint8Array. The user must use the getters!

The interface for compression/decompression has also changed: we now
specify the input type for each operation.

This has been tested in IE 6-10, Firefox, Chrome and Opera.
If anyone has an Apple device with Safari, they're welcome to test :)
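
A usage sketch of what the lazy behaviour means for callers; the load/file/asText/generate names follow the JSZip documentation of that era and are quoted here as an assumption, not as code from this pull request:

```js
// Assumed JSZip-style API; only the accessed entry gets inflated.
function repackOneEntry(zipFileAsBinaryString) {
  var zip = new JSZip();
  zip.load(zipFileAsBinaryString);             // parses the entries, no INFLATE yet

  var readme = zip.file("README.md").asText(); // this entry is inflated on demand
  console.log(readme.length);

  // Untouched entries keep their compressed bytes, so generate() can reuse
  // them instead of running DEFLATE again.
  return zip.generate({ type: "uint8array" });
}
```
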
Add the optimizedBinaryString option and hints about performance.
str += 'char' is faster than array.push && array.join, but the difference
is not big. Strings are immutable, so repeated concatenation copies about
n(n-1)/2 characters in total, i.e. O(n^2) memory traffic. The array join
is O(n). When working with large files (hundreds of MB), O(n^2) is clearly
not a good idea.
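
For reference, the two concatenation strategies compared above, as a generic illustration (not code from this pull request):

```js
// Repeated += on an immutable string can copy earlier characters over and over.
function concatByString(chunks) {
  var result = "";
  for (var i = 0; i < chunks.length; i++) {
    result += chunks[i]; // each += may copy everything accumulated so far
  }
  return result;
}

// Collecting parts and joining once keeps the total copying linear.
function concatByArrayJoin(chunks) {
  var parts = [];
  for (var i = 0; i < chunks.length; i++) {
    parts.push(chunks[i]);
  }
  return parts.join(""); // one final copy
}
```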

Also, use TextDecoder if available to boost performance.
Working with strings consumes a lot of resources. For example, the
utf8 string -> binary string transformation is faster via the path
utf8 string -> Uint8Array (via TextEncoder) -> binary string.
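
A sketch of that path, assuming TextEncoder is available, with the classic unescape/encodeURIComponent trick as a fallback (illustrative only):

```js
// utf8 string -> binary string, going through a Uint8Array when possible.
function utf8ToBinaryString(str) {
  if (typeof TextEncoder !== "undefined") {
    var bytes = new TextEncoder().encode(str);   // unicode string -> utf-8 bytes
    var chars = [];
    for (var i = 0; i < bytes.length; i++) {
      chars.push(String.fromCharCode(bytes[i])); // utf-8 bytes -> binary string
    }
    return chars.join("");
  }
  // Fallback: percent-encode to utf-8, then unescape back to a binary string.
  return unescape(encodeURIComponent(str));
}
```
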
@Stuk (Owner) commented Jun 30, 2013

Looks great! 👍 Feel free to merge.

To follow semver, because of the breaking changes this should be released as 2.0.0. Does that sound ok?

@dduponchel (Collaborator, Author)

Thanks!
I was thinking of a v2 too, to follow semver. Before any release, I have another pull request to submit (which depends heavily on this one): full nodejs support :)
