zlib: Fix gzip member header/input buffer boundary issue #5883

addaleax · 2016-03-24T02:32:45Z

Pull Request check-list

Does make -j8 test (UNIX) or vcbuild test nosign (Windows) pass with
this change (including linting)?
Is the commit message formatted according to CONTRIBUTING.md?
If this change fixes a bug (or a performance problem), is a regression
test (or a benchmark) included?

Affected core subsystem(s)

zlib

Description of change

Make sure that, even if an inflate() call only sees the first few bytes of a following gzip member, all members are decompressed and part of the full output.

This change also modifies behaviour for trailing garbage: If there is trailing garbage which happens to start with the gzip magic bytes, it is no longer discarded but rather throws an error, since we cannot reliably tell random garbage from a valid gzip member anyway and have to try and decompress it.
(Null byte padding is not affected, since it has been pointed out at various occasions that such padding is normal and discarded by gzip(1), too.)

The problem resolved by this is mentioned here, but definitely not as likely to occur in the wild, since it does not affect the decompression of gzip files containing only a single member.

/cc @kthelgason

bnoordhuis · 2016-03-24T09:00:11Z

Mostly LGTM but I think this is a semver-major change? Can you remove the (now unused) GZIP_ defines from src/node_zlib.cc?

addaleax · 2016-03-24T09:35:11Z

@bnoordhuis GZIP_MIN_HEADER_SIZE can still be removed, though, you’re right.

And regarding semver: Not sure – If so, then #5120 was semver-major in the first place. The current (both master and 5.9.x) behaviour is to throw an error upon encountering trailing garbage only if that starts with 1f 8b, which can be pretty unreliable I imagine (especially if those garbage bytes are actually random).

So, yes, possibly reverting back to the pre-5.9 behaviour of silently discarding trailing garbage in all cases is a good idea for 5.x, and implementing that should certainly be feasible. But I completely agree with your comment here, throwing an error should be the right behaviour.

bnoordhuis · 2016-03-24T09:42:55Z

Gray area, I suppose. Maybe one or two people from @nodejs/ctc can weigh in?

Change itself LGTM. CI: https://ci.nodejs.org/job/node-test-pull-request/2052/

addaleax · 2016-03-24T10:10:30Z

For future reference and to the best of my knowledge:

.gz file content	v5.8	v5.9 / master	after this	gunzip(1)
2 members	first member	both members	both members	both members
1 member + garbage	member	member	throws	member³
1 member + `00` + garbage¹	member	member	member	member³
1 member + `1f 8b` + garbage²	member	throws	throws	member³

¹ never actually inspected by any Node.js version, assumed to be null byte padding
² at least 10 bytes
³ plus a warning on stderr and non-zero exit code

And for the record, I don’t think trailing garbage is much of a real-world problem, but I’m not sure how relevant that fact is.

kthelgason · 2016-03-24T13:16:57Z

src/node_zlib.cc

@@ -43,7 +43,6 @@ enum node_zlib_mode {

 #define GZIP_HEADER_ID1 0x1f
 #define GZIP_HEADER_ID2 0x8b


These defines are unused now right?

Yep, but #5884 would start using them again, so I’d leave them in unless it gets decided that that should not get landed.

kthelgason · 2016-03-24T13:25:49Z

It's hard to debate the relevance of trailing garbage without any idea of prevalence in the wild. I'm on the fence regarding the behaviour of the 1 member + garbage case, I can see the argument for both sides.

Regarding semver-major, #5120 pretty much only changed in a breaking way the behaviour of the last case in your table, ~~and that was unintended, so it looked like a minor version bump to me at least~~. This PR is now being explicit about throwing an error where previously there was none, so I think a better case can be made for this landing in a major version.

EDIT: I was thinking of the case here #5863. @addaleax you're absolutely right regarding semver-major on this change.

addaleax · 2016-03-24T13:38:39Z

Yep, I can definitely see why one would count this as semver-major, and personally I don’t think I have a strong opionion on that. But keep in mind that this PR still mostly contains a bugfix: For the case that a new member’s header wraps around the input buffer boundary. So even if this does not land in some 5.x version, I’d still argue that a modified patch (which keeps the current trailing-garbage behaviour or restores the old one) should.

kthelgason · 2016-03-24T13:48:10Z

Yes, I agree 100%.

There is still the issue of the boundary between members falling exactly on the input buffer boundary, so we haven't really gotten rid of the problem, only reduced the likelihood of it's occurrence (significantly).

EDIT: turns out this is taken care of.

addaleax · 2016-03-24T13:51:05Z

test/parallel/test-zlib-from-concatenated-gzip.js

@@ -38,3 +41,26 @@ fs.createReadStream(pmmFileGz)
    assert.deepStrictEqual(Buffer.concat(pmmResultBuffers), pmmExpected,
      'result should match original random garbage');
  }));
+
+// test that the next gzip member can wrap around the input buffer boundary
+[0, 1, 2, 3, 4, defEncoded.length].forEach((offset) => {


@kthelgason I think testing for 0 should take care of the case where data members fall exactly on input boundaries:

The first unzip.write() call writes abcEncoded + 0 bytes from defEncoded, and the second unzip.end() call writes all of defEncoded. Or did you have something else in mind?

of course, you're absolutely right.

jasnell · 2016-03-24T14:42:52Z

I agree that it's technically a semver-major but this definitely looks like a grey area and I agree with the comments that an error is the right thing to do. I think I'd be ok with this landing as a semver-minor.

jasnell · 2016-03-24T14:43:00Z

The change LGTM

addaleax · 2016-03-25T01:04:35Z

Btw, this is what re-allowing trailing garbage would look like – wouldn’t be a big deal.

Make sure that, even if an `inflate()` call only sees the first few bytes of a following gzip member, all members are decompressed and part of the full output. This change also modifies behaviour for trailing garbage: If there is trailing garbage which happens to start with the gzip magic bytes, it is no longer discarded but rather throws an error, since we cannot reliably tell random garbage from a valid gzip member anyway and have to try and decompress it. (Null byte padding is not affected, since it has been pointed out at various occasions that such padding is normal and discarded by `gzip(1)`, too.) Adds tests for the special case that the first `inflate()` call receives only the first few bytes of a second gzip member but not the whole header (or even just the magic bytes).

addaleax · 2016-03-29T15:50:16Z

Rebased & squashed this into one commit.

bnoordhuis · 2016-03-31T11:25:24Z

Thanks Anna, landed in 54a5287.

Make sure that, even if an `inflate()` call only sees the first few bytes of a following gzip member, all members are decompressed and part of the full output. This change also modifies behaviour for trailing garbage: If there is trailing garbage which happens to start with the gzip magic bytes, it is no longer discarded but rather throws an error, since we cannot reliably tell random garbage from a valid gzip member anyway and have to try and decompress it. (Null byte padding is not affected, since it has been pointed out at various occasions that such padding is normal and discarded by `gzip(1)`, too.) Adds tests for the special case that the first `inflate()` call receives only the first few bytes of a second gzip member but not the whole header (or even just the magic bytes). PR-URL: #5883 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com>

addaleax · 2016-03-31T11:33:09Z

@bnoordhuis Thanks! I would actually be more comfortable with having this marked as semver-major and backporting this bugfix in a way that keeps the current (v5.x) behaviour for trailing garbage. Would that be okay? (And if so, would creating a PR against the v5.x branch be the correct way to do that?)

bnoordhuis · 2016-03-31T11:43:09Z

Okay, I updated the labels. PRing against v5.x sounds good; v5.x-staging is the branch you want.

addaleax · 2016-03-31T11:50:42Z

Sorry to ask, but there is no v5.x-staging branch here?

bnoordhuis · 2016-03-31T12:15:38Z

Oh, that's right; I remember being baffled by that a while ago. It should be fine to target v5.x in that case.

@jasnell @thealphanerd What's the reason there isn't a v5.x-staging branch like there is for v0.10/v0.12 and v4?

MylesBorins · 2016-03-31T17:36:41Z

@bnoordhuis afaik the staging convention has been primarily used for LTS... but tbqh it would be useful for v5 as well and likely allow for quite a bit less pain when cutting releases.

Not 100% where an issue should be made for this though, ctc?

bnoordhuis · 2016-03-31T18:36:34Z

I don't think an issue is necessary. If other @nodejs/release people feel it's a good idea, just go ahead and do it. Maybe file an informational issue afterwards as a memo for contributors.

Make sure that, even if an `inflate()` call only sees the first few bytes of a following gzip member, all members are decompressed and part of the full output. Adds tests for the special case that the first `inflate()` call receives only the first few bytes of a second gzip member but not the whole header (or even just the magic bytes). This is a backport of nodejs#5883 and contains additional changes to make sure that the behaviour on encountering trailing garbage remains the same (namely to silently discard it if one full member has already been decompressed). #PR-URL: nodejs#5883 #Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> #Reviewed-By: James M Snell <jasnell@gmail.com>

Make sure that, even if an `inflate()` call only sees the first few bytes of a following gzip member, all members are decompressed and part of the full output. Adds tests for the special case that the first `inflate()` call receives only the first few bytes of a second gzip member but not the whole header (or even just the magic bytes). This is a backport of #5883 and contains additional changes to make sure that the behaviour on encountering trailing garbage remains the same (namely to silently discard it if one full member has already been decompressed). PR-URL: #5973 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com>

addaleax changed the title ~~Fix gzip member header/input buffer boundary issue~~ zlib: Fix gzip member header/input buffer boundary issue Mar 24, 2016

jasnell added the zlib Issues and PRs related to the zlib subsystem. label Mar 24, 2016

addaleax force-pushed the fix-gzip-member-buffer-boundary branch from 32c7d28 to c8755e7 Compare March 24, 2016 09:41

addaleax mentioned this pull request Mar 24, 2016

zlib: Allow zlib.gunzip features with zlib.unzip #5884

Closed

3 tasks

kthelgason reviewed Mar 24, 2016
View reviewed changes

addaleax reviewed Mar 24, 2016
View reviewed changes

addaleax force-pushed the fix-gzip-member-buffer-boundary branch from c8755e7 to e6af8b9 Compare March 29, 2016 15:49

bnoordhuis closed this Mar 31, 2016

bnoordhuis added the semver-minor PRs that contain new features and should be released in the next minor version. label Mar 31, 2016

addaleax deleted the fix-gzip-member-buffer-boundary branch March 31, 2016 11:33

bnoordhuis added semver-major PRs that contain breaking changes and should be released in the next major version. and removed semver-minor PRs that contain new features and should be released in the next minor version. labels Mar 31, 2016

addaleax mentioned this pull request Mar 31, 2016

zlib: fix gzip member head/buffer boundary issue (backport of #5883) #5973

Closed

3 tasks

bnoordhuis mentioned this pull request Jun 30, 2016

ungzipping buffer with trailing garbage throws error #7502

Closed

springmeyer mentioned this pull request Sep 13, 2016

Node v6 breakage with gunzip mapbox/mapbox-file-sniff#46

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

zlib: Fix gzip member header/input buffer boundary issue #5883

zlib: Fix gzip member header/input buffer boundary issue #5883

addaleax commented Mar 24, 2016

bnoordhuis commented Mar 24, 2016

addaleax commented Mar 24, 2016

bnoordhuis commented Mar 24, 2016

addaleax commented Mar 24, 2016

kthelgason Mar 24, 2016

addaleax Mar 24, 2016

kthelgason commented Mar 24, 2016

addaleax commented Mar 24, 2016

kthelgason commented Mar 24, 2016

addaleax Mar 24, 2016

kthelgason Mar 24, 2016

jasnell commented Mar 24, 2016

jasnell commented Mar 24, 2016

addaleax commented Mar 25, 2016

addaleax commented Mar 29, 2016

bnoordhuis commented Mar 31, 2016

addaleax commented Mar 31, 2016

bnoordhuis commented Mar 31, 2016

addaleax commented Mar 31, 2016

bnoordhuis commented Mar 31, 2016

MylesBorins commented Mar 31, 2016

bnoordhuis commented Mar 31, 2016

		@@ -43,7 +43,6 @@ enum node_zlib_mode {

		#define GZIP_HEADER_ID1 0x1f
		#define GZIP_HEADER_ID2 0x8b

zlib: Fix gzip member header/input buffer boundary issue #5883

zlib: Fix gzip member header/input buffer boundary issue #5883

Conversation

addaleax commented Mar 24, 2016

Pull Request check-list

Affected core subsystem(s)

Description of change

bnoordhuis commented Mar 24, 2016

addaleax commented Mar 24, 2016

bnoordhuis commented Mar 24, 2016

addaleax commented Mar 24, 2016

kthelgason Mar 24, 2016

Choose a reason for hiding this comment

addaleax Mar 24, 2016

Choose a reason for hiding this comment

kthelgason commented Mar 24, 2016

addaleax commented Mar 24, 2016

kthelgason commented Mar 24, 2016

addaleax Mar 24, 2016

Choose a reason for hiding this comment

kthelgason Mar 24, 2016

Choose a reason for hiding this comment

jasnell commented Mar 24, 2016

jasnell commented Mar 24, 2016

addaleax commented Mar 25, 2016

addaleax commented Mar 29, 2016

bnoordhuis commented Mar 31, 2016

addaleax commented Mar 31, 2016

bnoordhuis commented Mar 31, 2016

addaleax commented Mar 31, 2016

bnoordhuis commented Mar 31, 2016

MylesBorins commented Mar 31, 2016

bnoordhuis commented Mar 31, 2016