Improve performances #56
Merged
Conversation
This commit aims to be as lazy as possible: when reading a file, we no longer decompress the content, we just keep a reference to the original compressed file and an offset. If the user accesses a file, we decompress it and replace the content (so we don't have to decompress it again). When generating a zip, if a file has not been decompressed, we check whether we can reuse the compressed content. This unfortunately means that we won't be backward compatible: the data attribute may not be computed yet. Worse, the data can now be a string, an array or a Uint8Array. The user must use the getters! The interface for compression/decompression has also changed: we now specify the input type for each operation. This has been tested in IE 6 to 10, Firefox, Chrome and Opera. If anyone has an Apple product with Safari, they're welcome to test :)
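As an illustration, reading an entry now goes through the accessors rather than the raw attribute. A minimal sketch, assuming v2-style getters such as asText() and loading via the constructor; treat the exact names as assumptions:

```js
// Load a zip lazily: entries keep a reference to the compressed bytes and an offset.
var zip = new JSZip(rawZipContent);

// First access: the entry is inflated and the result replaces the stored content.
var text = zip.file("hello.txt").asText();

// Second access: served from the already-decompressed content, no second INFLATE.
var again = zip.file("hello.txt").asText();

// The raw data attribute may not be computed yet, so callers must use the getters.
```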
Use https://github.com/imaya/zlib.js instead of an old implementation.
Add the optimizedBinaryString option and hints about performance.
str += 'char' is faster than array.push && array.join, but the difference is not big. Strings are immutable, so the concatenation creates n(n-1)/2 intermediate objects, i.e. memory consumption in O(n^2). The array join is in O(n). When working with large files (hundreds of MB), O(n^2) is clearly not a good idea. Also, use TextDecoder if available to boost performance.
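A minimal sketch of the two approaches being compared (hypothetical helper names, not JSZip code):

```js
// Building a large string character by character: each += is cheap per call,
// but string immutability means the intermediate copies add up to O(n^2) memory traffic.
function buildByConcat(bytes) {
  var result = "";
  for (var i = 0; i < bytes.length; i++) {
    result += String.fromCharCode(bytes[i]);
  }
  return result;
}

// Collecting chunks and joining once: O(n) memory, slightly slower per iteration.
function buildByJoin(bytes) {
  var chunks = [];
  for (var i = 0; i < bytes.length; i++) {
    chunks.push(String.fromCharCode(bytes[i]));
  }
  return chunks.join("");
}
```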
Working with strings consumes a lot of resources. For example, the UTF-8 string -> binary string transformation is faster with the path UTF-8 string -> Uint8Array (via TextEncoder) -> binary string.
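A sketch of that path, assuming TextEncoder is available (the helper name is illustrative):

```js
// UTF-8 string -> Uint8Array of UTF-8 bytes -> "binary string" (one char per byte).
function utf8ToBinaryString(str) {
  var bytes = new TextEncoder().encode(str); // fast native UTF-8 encoding
  var chunks = new Array(bytes.length);
  for (var i = 0; i < bytes.length; i++) {
    chunks[i] = String.fromCharCode(bytes[i]);
  }
  return chunks.join("");
}
```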
See http://jsperf.com/array-direct-assignment-vs-push/31: direct assignment is faster than push.
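In code, the difference is simply the following (illustrative only):

```js
var input = [1, 2, 3, 4];

// Pre-sized array with direct index assignment...
var out = new Array(input.length);
for (var i = 0; i < input.length; i++) {
  out[i] = input[i] * 2;
}

// ...instead of growing the array with push().
var out2 = [];
for (var j = 0; j < input.length; j++) {
  out2.push(input[j] * 2);
}
```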
Looks great! 👍 Feel free to merge. To follow semver, because of the breaking changes this should be released as 2.0.0. Does that sound ok?
Thanks!
The aim of this pull request is to reduce the CPU/memory consumption of
JSZip. The main idea is: don't transform the data if it's not
necessary.
Lazy decompress
The main new feature is the lazy decompression. If the user loads a zip file but only reads a single entry, we don't need to INFLATE the other entries. Moreover, if the user calls generate() on this JSZip object, we won't DEFLATE every file: we will reuse the compressed data and only recompress the entries that have actually been read. This is the goal of JSZip.CompressedObject. This unfortunately means that we won't be backward compatible: the data attribute may not be computed yet.
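A minimal sketch of the idea behind CompressedObject (illustrative only, not the actual implementation; `inflate` stands in for the real INFLATE routine):

```js
// Keep a reference to the compressed bytes and only inflate on first access.
function LazyEntry(compressedBytes, compressedSize, uncompressedSize) {
  this.compressedBytes = compressedBytes;
  this.compressedSize = compressedSize;
  this.uncompressedSize = uncompressedSize;
  this._data = null; // decompressed content, computed lazily
}

LazyEntry.prototype.getContent = function (inflate) {
  if (this._data === null) {
    this._data = inflate(this.compressedBytes); // decompress once, then cache
  }
  return this._data;
};

LazyEntry.prototype.getCompressedContent = function () {
  // generate() can reuse this directly when the entry was never touched.
  return this.compressedBytes;
};
```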
Don't transform the data until necessary
Another change is the type of ZipObject.data (renamed _data for the reason above). Instead of transforming it into a string when loading/adding an ArrayBuffer, we let it be. We transform it on demand (getters, generate), not before. The central part of this change is the JSZip.utils.transformTo function. This adds a big matrix (transform) to convert any supported type into any other. I think this is worth the trouble: the other parts of JSZip just need to know the destination type. This also means that nodejs support comes nearly for free: the transformTo method will take care of a lot of things.
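A rough sketch of what such a matrix looks like (only two types shown; the real table in JSZip.utils covers more types and edge cases, and these function bodies are illustrative):

```js
// Partial lookup table: transform[inputType][outputType] gives the conversion function.
var transform = {
  "string": {
    "uint8array": function (input) {
      var result = new Uint8Array(input.length);
      for (var i = 0; i < input.length; i++) {
        result[i] = input.charCodeAt(i);
      }
      return result;
    }
  },
  "uint8array": {
    "string": function (input) {
      var chunks = new Array(input.length);
      for (var i = 0; i < input.length; i++) {
        chunks[i] = String.fromCharCode(input[i]);
      }
      return chunks.join("");
    }
  }
};

// Callers only need to name the destination type.
function transformTo(outputType, input) {
  var inputType = (input instanceof Uint8Array) ? "uint8array" : "string";
  if (inputType === outputType) {
    return input;
  }
  return transform[inputType][outputType](input);
}
```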
Update the INFLATE/DEFLATE implementation
The current implementation (from Masanao Izumo in 1999) is known to have
bugs (issues #22, #26, #29, #43, #52, #53). The new implementation is
from https://github.com/imaya/zlib.js and doesn't have these bugs. The new implementation is slower than the current one, but it works.
Other possible improvements
The ArrayBuffer -> unicode string transformation is not efficient. On Firefox, the TextDecoder API solves this, but it is not (yet) available in the other browsers or in nodejs. A solution is to use the BlobReader API, but that means turning our whole API into an asynchronous one.
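For instance, a sketch of the conversion with a naive fallback when TextDecoder is missing (not JSZip's actual code; the fallback relies on the old, deprecated escape/decodeURIComponent trick):

```js
// Decode UTF-8 bytes from an ArrayBuffer into a JS (UTF-16) string.
function arrayBufferToUnicodeString(buffer) {
  var bytes = new Uint8Array(buffer);
  if (typeof TextDecoder !== "undefined") {
    // Fast native path, currently only available in some browsers.
    return new TextDecoder("utf-8").decode(bytes);
  }
  // Slow fallback: build a binary string, then decode it as UTF-8.
  var chunks = new Array(bytes.length);
  for (var i = 0; i < bytes.length; i++) {
    chunks[i] = String.fromCharCode(bytes[i]);
  }
  return decodeURIComponent(escape(chunks.join("")));
}
```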