miniz fixes #227

SinisterRectus · 2020-02-27T19:45:10Z

I finally got a chance to use the miniz zlib api (#163 and #165), but I ran into some issues.

Byte streams were not fully processed in inflator:inflate/deflator:deflate when they were larger than the output buffer. This was missed by tests because the they never reached the buffer limit. With inspiration from the lua-zlib bindings, I added a loop to handle the overflow, and changed the test to parse a larger stream. I also removed the "finish" flush mode from the inflator test, which lua-zlib also seems to omit. We may have to see whether we're properly implementing that.
miniz.compress was using the wrong stack index for the compression level selection. I factored this out into a helper function and changed the index from 1 to 2 and added a test.
miniz.uncompress attempted to use an output buffer of the same size as the input buffer. I changed this to allow an optional setting, with input * 10 as the default and added a test.

src/lminiz.c

zhaozg · 2020-02-28T01:37:31Z

I left some comment, I recommend to use mz_*Bound api to handle output length.
https://github.com/richgel999/miniz/blob/e71344bdd7e8e6c18fb67721db7a634ffa202f9e/miniz.h#L348-L349
https://github.com/richgel999/miniz/blob/e71344bdd7e8e6c18fb67721db7a634ffa202f9e/miniz.h#L356-L357

SinisterRectus · 2020-02-28T02:51:09Z

I'm not sure what the correct solution is for choosing the correct inflate buffer size. In lua-zlib, they create a one-time-use inflator that grows the buffer in the same way that inflator:inflate does. It doesn't make sense to me to implement this twice, especially if miniz provides a shortcut.

SinisterRectus · 2020-02-28T03:13:16Z

If we do allow an input value, we need upper and lower bounds.

local data = miniz.uncompress(compressed, 2^32) -- overflow

src/lminiz.c

zhaozg · 2020-02-29T03:41:32Z

https://github.com/richgel999/miniz/blob/3a00ea8b73c57e3e4c636bd7851f230b6968ac26/miniz.c#L309-L314

EDIT:

make reverse of mz_deflateBound

mz_ulong mz_inflateBound(mz_streamp pStream, mz_ulong source_len)
{
  mz_ulong da = source_len * 100 / 110;
  mz_ulong db = (31 * 1024 * source_len  -  31 * 1024 * 5 )/(31 * 1024+5);
  return MZ_MAX(128 + da, 128+db);
}

SinisterRectus · 2020-02-29T16:38:45Z

I don't think that will work. According to "Maximum Expansion Factor" in https://zlib.net/zlib_tech.html, deflateBound is an estimate of how much space would be needed for compressed data plus any overhead. For data that is difficult to compress, the output can easily be larger than the input.

In the absolute worst case of a single-byte input stream, the overhead therefore amounts to 1100% (eleven bytes of overhead, one byte of actual data).

Indeed, miniz's implementation always produces a number that is larger than the input size, even though the output is expected to be smaller due to compression. (I'm not sure why miniz calls this conservative.) If we invert that function, the output buffer would will always be smaller than the input buffer, which would fail during typical use.

local function deflateBound(x)
  local a = 128 + (x * 110) / 100
  local b = 128 + x + ((x / (31 * 1024)) + 1) * 5
  return math.floor(math.max(a, b))
end

for i = 0, 16 do
  local in_len = 2^i
  local out_len = deflateBound(in_len)
  print(i, in_len, out_len, out_len / in_len)
end

0       1       134     134
1       2       135     67.5
2       4       137     34.25
3       8       141     17.625
4       16      149     9.3125
5       32      165     5.15625
6       64      198     3.09375
7       128     268     2.09375
8       256     409     1.59765625
9       512     691     1.349609375
10      1024    1254    1.224609375
11      2048    2380    1.162109375
12      4096    4633    1.131103515625
13      8192    9139    1.1156005859375
14      16384   18150   1.1077880859375
15      32768   36172   1.1038818359375
16      65536   72217   1.1019439697266

The "Maximum Compression Factor" explains what we are looking for. There is a theoretical compression ratio as high as 1000:1 with typical ratios of 2:1 to 5:1.

So, I think that an expanding buffer is a good idea. Python does this.

bufsize is the initial size of the buffer used to hold decompressed data. If more space is required, the buffer size will be increased as needed, so you don’t have to get this value exactly right; tuning it will only save a few calls to malloc().

We just need to rely on something other than MZ_BUF_ERROR to know when to stop expanding. Maybe impose an upper limit somewhere between 2^16 and 2^32.

SinisterRectus · 2020-02-29T21:06:00Z

See the recent commit to lmz_uncompress.

miniz fixes

c9e2435

squeek502 requested a review from zhaozg February 27, 2020 23:03

zhaozg reviewed Feb 28, 2020

View reviewed changes

src/lminiz.c Outdated Show resolved Hide resolved

rphillips reviewed Feb 28, 2020

View reviewed changes

src/lminiz.c Show resolved Hide resolved

Use a dynamic buffer size in miniz uncompress

53247a3

zhaozg merged commit a413ad0 into master Mar 1, 2020

SinisterRectus deleted the miniz-fix branch July 10, 2024 13:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

miniz fixes #227

miniz fixes #227

SinisterRectus commented Feb 27, 2020

zhaozg commented Feb 28, 2020 •

edited

Loading

SinisterRectus commented Feb 28, 2020

SinisterRectus commented Feb 28, 2020 •

edited

Loading

zhaozg commented Feb 29, 2020 •

edited

Loading

SinisterRectus commented Feb 29, 2020 •

edited

Loading

SinisterRectus commented Feb 29, 2020

miniz fixes #227

miniz fixes #227

Conversation

SinisterRectus commented Feb 27, 2020

zhaozg commented Feb 28, 2020 • edited Loading

SinisterRectus commented Feb 28, 2020

SinisterRectus commented Feb 28, 2020 • edited Loading

zhaozg commented Feb 29, 2020 • edited Loading

SinisterRectus commented Feb 29, 2020 • edited Loading

SinisterRectus commented Feb 29, 2020

zhaozg commented Feb 28, 2020 •

edited

Loading

SinisterRectus commented Feb 28, 2020 •

edited

Loading

zhaozg commented Feb 29, 2020 •

edited

Loading

SinisterRectus commented Feb 29, 2020 •

edited

Loading