Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

miniz fixes #227

Merged
merged 2 commits into from
Mar 1, 2020
Merged

miniz fixes #227

merged 2 commits into from
Mar 1, 2020

Conversation

SinisterRectus
Copy link
Member

I finally got a chance to use the miniz zlib api (#163 and #165), but I ran into some issues.

  1. Byte streams were not fully processed in inflator:inflate/deflator:deflate when they were larger than the output buffer. This was missed by tests because the they never reached the buffer limit. With inspiration from the lua-zlib bindings, I added a loop to handle the overflow, and changed the test to parse a larger stream. I also removed the "finish" flush mode from the inflator test, which lua-zlib also seems to omit. We may have to see whether we're properly implementing that.

  2. miniz.compress was using the wrong stack index for the compression level selection. I factored this out into a helper function and changed the index from 1 to 2 and added a test.

  3. miniz.uncompress attempted to use an output buffer of the same size as the input buffer. I changed this to allow an optional setting, with input * 10 as the default and added a test.

src/lminiz.c Outdated Show resolved Hide resolved
@zhaozg
Copy link
Member

zhaozg commented Feb 28, 2020

@SinisterRectus
Copy link
Member Author

I'm not sure what the correct solution is for choosing the correct inflate buffer size. In lua-zlib, they create a one-time-use inflator that grows the buffer in the same way that inflator:inflate does. It doesn't make sense to me to implement this twice, especially if miniz provides a shortcut.

@SinisterRectus
Copy link
Member Author

SinisterRectus commented Feb 28, 2020

If we do allow an input value, we need upper and lower bounds.

local data = miniz.uncompress(compressed, 2^32) -- overflow

@zhaozg
Copy link
Member

zhaozg commented Feb 29, 2020

https://github.com/richgel999/miniz/blob/3a00ea8b73c57e3e4c636bd7851f230b6968ac26/miniz.c#L309-L314

EDIT:

make reverse of mz_deflateBound

mz_ulong mz_inflateBound(mz_streamp pStream, mz_ulong source_len)
{
  mz_ulong da = source_len * 100 / 110;
  mz_ulong db = (31 * 1024 * source_len  -  31 * 1024 * 5 )/(31 * 1024+5);
  return MZ_MAX(128 + da, 128+db);
}

@SinisterRectus
Copy link
Member Author

SinisterRectus commented Feb 29, 2020

I don't think that will work. According to "Maximum Expansion Factor" in https://zlib.net/zlib_tech.html, deflateBound is an estimate of how much space would be needed for compressed data plus any overhead. For data that is difficult to compress, the output can easily be larger than the input.

In the absolute worst case of a single-byte input stream, the overhead therefore amounts to 1100% (eleven bytes of overhead, one byte of actual data).

Indeed, miniz's implementation always produces a number that is larger than the input size, even though the output is expected to be smaller due to compression. (I'm not sure why miniz calls this conservative.) If we invert that function, the output buffer would will always be smaller than the input buffer, which would fail during typical use.

local function deflateBound(x)
  local a = 128 + (x * 110) / 100
  local b = 128 + x + ((x / (31 * 1024)) + 1) * 5
  return math.floor(math.max(a, b))
end

for i = 0, 16 do
  local in_len = 2^i
  local out_len = deflateBound(in_len)
  print(i, in_len, out_len, out_len / in_len)
end
0       1       134     134
1       2       135     67.5
2       4       137     34.25
3       8       141     17.625
4       16      149     9.3125
5       32      165     5.15625
6       64      198     3.09375
7       128     268     2.09375
8       256     409     1.59765625
9       512     691     1.349609375
10      1024    1254    1.224609375
11      2048    2380    1.162109375
12      4096    4633    1.131103515625
13      8192    9139    1.1156005859375
14      16384   18150   1.1077880859375
15      32768   36172   1.1038818359375
16      65536   72217   1.1019439697266

The "Maximum Compression Factor" explains what we are looking for. There is a theoretical compression ratio as high as 1000:1 with typical ratios of 2:1 to 5:1.

So, I think that an expanding buffer is a good idea. Python does this.

bufsize is the initial size of the buffer used to hold decompressed data. If more space is required, the buffer size will be increased as needed, so you don’t have to get this value exactly right; tuning it will only save a few calls to malloc().

We just need to rely on something other than MZ_BUF_ERROR to know when to stop expanding. Maybe impose an upper limit somewhere between 2^16 and 2^32.

@SinisterRectus
Copy link
Member Author

See the recent commit to lmz_uncompress.

@zhaozg zhaozg merged commit a413ad0 into master Mar 1, 2020
@SinisterRectus SinisterRectus deleted the miniz-fix branch July 10, 2024 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants