Consider use better deflate/inflate #102

Closed
puzrin opened this issue Feb 19, 2014 · 8 comments

Comments

@puzrin
Contributor

puzrin commented Feb 19, 2014

We are doing a JS port of the latest zlib. Deflate works now, and inflate is coming soon: https://github.com/nodeca/pako . It's written with node.js, but targeted at browsers as a modern deflate/inflate implementation.

Preliminary results, compared to the one in JSZip:

  1. ~3x faster (level 6)
  2. result is binary compatible with zlib
  3. all options supported
  4. compressed size is smaller with default settings
  5. modular/commonjs - you can browserify any components as you need
  6. simple api to work with big chunked blobs

I'll be glad if you find it useful. If you have suggestions about the API, let me know.

@Stuk
Owner

Stuk commented Feb 19, 2014

This looks very cool!

@puzrin
Contributor Author

puzrin commented Mar 13, 2014

pako's master is ready for release. Could you take a look? Maybe you have some requests or recommendations about the API?

@dduponchel
Collaborator

I tested pako in JSZip and the integration was easy : I just had to replace the
compress method with return pako.deflateRaw(input); (same with uncompress
and inflateRaw).
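
A minimal sketch of that swap, assuming a module that exposes compress and uncompress methods as described above. The factory name makeFlate is hypothetical, and Node's built-in zlib is used as a stand-in for pako so the snippet runs without pako installed:

```javascript
const zlib = require("zlib");

// Hypothetical factory: wraps any object that offers deflateRaw/inflateRaw
// (pako does, per the comment above) behind compress/uncompress methods.
function makeFlate(impl) {
  return {
    compress: function (input) {
      return impl.deflateRaw(input);
    },
    uncompress: function (input) {
      return impl.inflateRaw(input);
    },
  };
}

// Stand-in for pako (assumption): Node's zlib provides synchronous raw
// deflate/inflate under different names.
const standIn = {
  deflateRaw: (input) => zlib.deflateRawSync(input),
  inflateRaw: (input) => zlib.inflateRawSync(input),
};

// With pako installed this would be makeFlate(require("pako")).
const flate = makeFlate(standIn);
const original = Buffer.from("hello hello hello hello");
const packed = flate.compress(original);
const restored = flate.uncompress(packed);
console.log(restored.equals(original));
```

Passing the implementation in as an argument keeps the rest of the library unaware of which DEFLATE code is in use, which is what makes the replacement a two-line change.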

Regarding the size of the generated file, it grew by 40% (minified/gzipped):

| type | without any DEFLATE code | with zlibjs | with pako |
|---|---|---|---|
| Original | 77926 bytes | 94639 bytes | 263042 bytes |
| Minified | 28642 bytes | 45091 bytes | 68886 bytes |
| Gzipped | 6742 bytes | 10972 bytes | 15463 bytes |

The huge difference in the first line ("Original") comes from zlibjs, which is
pulled in already minified (so any comparison here is meaningless).

The "without any DEFLATE code" column is what I get when I comment out almost
everything in our lib/flate.js. It is here only to give an indication of
what the JSZip "core" takes.

Almost 9KB minified/gzipped is bigger than zlib.js but still far better than an
Emscripten build of zlib (57KB).

I created a small benchmark with two tests :

  • deflate a ~600KB text file
  • inflate a ~600KB deflated image file

On IE 9, the content was converted to an array before the benchmark.

Disclaimer : the numbers here are not really significant : I took random files
on my filesystem, created a benchmark.js suite, put two tests and loaded the
page in different browsers in a VM. That being said, it should be enough to see
the speed gain.

| browser / test | zlib.js | pako |
|---|---|---|
| Firefox / inflate | 122 ops/sec ±4.52% | 191 ops/sec ±6.34% |
| Chrome / inflate | 111 ops/sec ±3.93% | 171 ops/sec ±4.08% |
| IE 9 / inflate | unresponsive page | 33.43 ops/sec ±2.25% |
| IE 11 / inflate | 43.52 ops/sec ±3.99% | 41.90 ops/sec ±1.38% |
| Firefox / deflate | 0.99 ops/sec ±46.45% | 8.99 ops/sec ±3.64% |
| Chrome / deflate | 2.78 ops/sec ±3.83% | 10.21 ops/sec ±1.04% |
| IE 9 / deflate | 1.11 ops/sec ±1.80% | 2.72 ops/sec ±0.51% |
| IE 11 / deflate | 1.59 ops/sec ±2.04% | 6.37 ops/sec ±2.29% |

I also tested with a 136MB xml file compressed into a 2.4M file, in Firefox
and Chrome. That's completely crazy but that comes from an actual bug report
on JSZip (testing this in a VM with only 1GB of RAM was not a good idea).

| browser / test | zlib.js | pako |
|---|---|---|
| Firefox / inflate | Firefox gives up | 1.29 ops/sec ±6.08% |
| Chrome / inflate | page crashes | 1.16 ops/sec ±2.31% |
| Firefox / deflate | unresponsive script | 0.33 ops/sec ±0.35% |
| Chrome / deflate | page crashes | error |

The memory pressure has a clear impact here : outside the VM (with more RAM)
the tests with pako are slightly faster and all complete.

These numbers are impressive, good work !

This looks good, even if the 40% size increase is a bit worrying.

Next step for me, check more files with pako (at least, the files that caused
bugs with the previous/current implementation).

@puzrin
Contributor Author

puzrin commented Mar 15, 2014

@dduponchel I'm really glad that you have a good first impression of our work. A couple of comments:

size

That's a first "step". We had no goal of creating a package of minimal size. The goal was to have fun investigating V8 JIT speed and to "finally" make a correct and maintainable zlib port. As far as I know, JSZip uses something like raw deflate and raw inflate, while pako has not yet been split into such small functions.

I think the size situation can be partially improved (by cutting gzip support and zlib header support), but I don't like premature optimization. If someone says that a 5KB difference is a real pain that prevents using pako, I'll try to reduce the size. Personally, I'd prefer to avoid this, because it can affect maintainability and syncing with future zlib versions (and it's just not interesting, LOL).

speed

Note that JSZip's current deflate and pako have different defaults. zlib.js level 6 is NOT the same as original zlib level 6 (pako's results are binary equal for all variations). Try varying the level option to make pako's output the same size as what you have now with zlib.js, and you will see that the speed difference is even bigger.
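
One way to run that comparison is to sweep the level option and pick the level whose output size is closest to a reference size. A rough sketch, assuming a deflateRaw(input, { level }) call shape; closestLevel is a hypothetical helper, and Node's built-in zlib stands in for the compressor so the snippet runs without pako or zlib.js installed:

```javascript
const zlib = require("zlib");

// Hypothetical helper: returns the level (1..9) whose raw-deflate output
// length is closest to targetSize, with the size reached at that level.
function closestLevel(input, targetSize, deflateRaw) {
  let best = null;
  for (let level = 1; level <= 9; level++) {
    const size = deflateRaw(input, { level: level }).length;
    const gap = Math.abs(size - targetSize);
    if (best === null || gap < best.gap) {
      best = { level: level, size: size, gap: gap };
    }
  }
  return best;
}

// Stand-in compressor (assumption): swap in the implementation under test.
const deflateRaw = (input, options) => zlib.deflateRawSync(input, options);

// Sample input only; a real comparison should use the benchmark files.
const sample = Buffer.from("the quick brown fox jumps over the lazy dog ".repeat(500));

// Here the reference size comes from the same compressor at level 4; in the
// scenario above it would be the length of the zlib.js output.
const referenceSize = deflateRaw(sample, { level: 4 }).length;
const match = closestLevel(sample, referenceSize, deflateRaw);
console.log(match.level, match.size);
```

Once the matching level is known, the benchmark can be rerun at that level so that both implementations produce output of comparable size.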

processing huge files

Honestly, that should not be done with one big chunk. Browsers have Blob (BlobBuilder) support for such tasks. By default pako pushes chunks into an array and joins the buffers at the end. You can override the onData and onEnd handlers to send the result into a Blob object. On input you should also use chunks, but that's already supported. If you wish to do so, I'd recommend looking at the wrapper code, https://github.com/nodeca/pako/blob/master/lib/deflate.js ; it's very simple.
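
The handler pattern described above can be sketched without pako itself. The class below is a hypothetical pass-through stand-in, not pako's code: it does no compression and only shows how default onData/onEnd handlers can collect and join chunks, and how overriding them routes output elsewhere.

```javascript
// Hypothetical stand-in for a chunked codec. Bytes pass through unchanged;
// only the push / onData / onEnd wiring is the point of this sketch.
class ChunkedPassThrough {
  constructor() {
    this.chunks = [];
    this.result = null;
  }

  // Feed one input chunk; isLast marks the end of the input.
  push(chunk, isLast) {
    this.onData(chunk);
    if (isLast) this.onEnd();
  }

  // Default handler: keep every output chunk in memory.
  onData(chunk) {
    this.chunks.push(chunk);
  }

  // Default handler: join all collected chunks into one typed array.
  onEnd() {
    const total = this.chunks.reduce((sum, c) => sum + c.length, 0);
    const joined = new Uint8Array(total);
    let offset = 0;
    for (const c of this.chunks) {
      joined.set(c, offset);
      offset += c.length;
    }
    this.result = joined;
    this.chunks = [];
  }
}

// Default behaviour: everything ends up in .result.
const buffered = new ChunkedPassThrough();
buffered.push(Uint8Array.of(1, 2, 3), false);
buffered.push(Uint8Array.of(4, 5), true);

// Overridden handlers: forward each chunk to a custom sink instead of
// buffering. In a browser the sink could be a list of Blob parts.
const streamed = new ChunkedPassThrough();
const sink = [];
streamed.onData = (chunk) => sink.push(chunk.length);
streamed.onEnd = () => sink.push("done");
streamed.push(Uint8Array.of(1, 2, 3), false);
streamed.push(Uint8Array.of(4, 5), true);

console.log(Array.from(buffered.result), sink);
```

With the handlers overridden, the object never holds more than one chunk at a time, which is the property that matters for very large inputs.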


I have no serious "browser programming" experience and am stronger in server-side code design. So you may find that pako's API is not good enough for browsers. If you know how to improve the API, let me know; I'm open to any suggestions.

@dduponchel
Collaborator

Thanks for your reply :)

I don't have any statistics but if I use the bug tracker as reference, a lot of
our users use JSZip in their browsers (and some, sadly, IE < 10). I've seen your
benchmarks on nodejs, that's why I've tested pako in different browsers.

Regarding the size, it matters on browsers. jQuery weighs 35KB (minified /
gzipped) and is sometimes called "too big" or "bloated" (ok, most of the time
for mobile application). I'm ok with a 5KB increase, but @Stuk may have a
different point of view.

Regarding the speed, I was comparing what we have now (with zlib.js) and what
we could get with the default compression level of zlib. And you're right, my
"benchmarks" are flawed :) I've checked the output lengths and zlib.js default
compression is roughly the same as level 4.

Regarding the huge files, I know that's not a good idea. Currently the JSZip
API is synchronous and sometimes people try to do insane things with it. My
goal here was to see how pako reacts in these "why on earth would you do that ?!"
situations. An asynchronous API for JSZip (to help with these situations) is in
my low-priority TODO list so it's cool to know that pako supports it !

I haven't used much of pako's API : I just added pako.deflateRaw / pako.inflateRaw
and everything worked. Your API seems fine ! Without any DOM manipulation or
low level call, I don't think that a good nodejs API could be bad on the
browser...

@puzrin
Contributor Author

puzrin commented Mar 15, 2014

Well, let's wait and see what @Stuk says. Thanks for spending time on the browser tests.

I've just released pako 0.1.0, and it's now available on npm and bower. No need to fork master anymore.

@Stuk
Owner

Stuk commented Mar 16, 2014

Thanks for the great work @puzrin, and those benchmarks look great @dduponchel.

I think the only place where 5KB makes any real difference nowadays is on mobile (do we have any idea how commonly JSZip is used on mobile devices?) and really, if someone needs to reduce KB then removing or optimizing images is an easier place to make savings. In short, I think the gains in speed here outweigh the slight increase in size.

To me it looks like it would be fantastic to integrate pako!

@dduponchel
Collaborator

After the fix in nodeca/pako#14, I've launched the following test :

  • deflate with pako (default options)
  • compare with the zlib output (default options)
  • inflate with pako
  • compare with the original

I've tested my whole filesystem (about 400 000 files) without any error or difference.
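
The four steps above can be sketched as a per-file check. This is an illustration, not the script that was actually run: checkFile is a hypothetical name, both implementations are assumed to expose deflate(input) and inflate(input) returning bytes, and Node's built-in zlib plays both roles here so the snippet runs without pako installed.

```javascript
const zlib = require("zlib");

// Hypothetical checker. `candidate` is the implementation under test,
// `reference` the trusted one. Returns null on success or a short
// description of the first mismatch.
function checkFile(data, candidate, reference) {
  // Steps 1 and 2: deflate with the candidate, compare with the reference.
  const packed = Buffer.from(candidate.deflate(data));
  const expected = Buffer.from(reference.deflate(data));
  if (!packed.equals(expected)) {
    return "deflate output differs from reference";
  }
  // Steps 3 and 4: inflate with the candidate, compare with the original.
  const restored = Buffer.from(candidate.inflate(packed));
  if (!restored.equals(data)) {
    return "inflate output differs from original";
  }
  return null;
}

// Reference: Node's zlib with default options.
const reference = {
  deflate: (input) => zlib.deflateSync(input),
  inflate: (input) => zlib.inflateSync(input),
};

// Stand-in candidate (assumption). To test pako, pass an object wrapping
// pako.deflate and pako.inflate instead.
const candidate = reference;

const data = Buffer.from("some file content, ".repeat(100));
console.log(checkFile(data, candidate, reference));
```

Running this over a directory tree would mean reading each file into a buffer and logging any non-null result.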
