Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expose Z_STREAM_END #26624

Closed
wants to merge 1 commit into from
Closed

expose Z_STREAM_END #26624

wants to merge 1 commit into from

Conversation

kingces95
Copy link

@kingces95 kingces95 commented Mar 13, 2019

Zlib streams self terminate and need a way to report that case.

I've use the same convention as is used to report the number of decompressed bytes (bytesWritten). The number of decompressed bytes cannot be reported via a first class stream abstraction. Similarly, stream self termination cannot be reported via a first class stream abstraction. So I figured it made sense to use the same mechanism to report both out-of-stream-abstraction conditions. Also, kinda makes sense: The state of the write of a chunk of data to inflate includes not only how much was consumed during inflation but also whether the EOF was encountered.

This feature is necessary for streaming decompression of zip files (say, during download). A zip file is partitioned into headers followed by deflated streams. The headers do not necessarily have to contain the length of the stream. So the zlib inflate stream needs to report the deflated stream's EOF up to the unzip algorithm. That way the unzip algorithm knows when to stop writing bytes to the inflate stream (and then do the accounting to know how many bytes were tossed by the inflate stream and backup to that position in the download stream to look for the next header).

Will do checklist pending discussion on #26603.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. zlib Issues and PRs related to the zlib subsystem. labels Mar 13, 2019
@kingces95
Copy link
Author

Here is my toy demonstration of what I need to do streaming zip file decompression. I compresses a stream of 100 characters. I then consume that deflated buffer (write them into a deflateRaw stream) in chunks to simulate downloading them piecemeal. Then I also write 6 bytes of junk into the deflateRaw stream to simulate zlib tossing bytes written after the deflate stream has terminated. I detect the deflate stream has terminated in 'data' and do the accounting necessary to backup the downloading stream (if zlib can do that accounting for the user, bonus!).

var zlib = require('zlib');
var assert = require('assert');
var codes = zlib.codes;

const input = Buffer.from('0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789');

var log = console.log.bind(console);

zlib.deflateRaw(input, (err, deflatedBuffer) => {
  assert(!err);

  var bytesRead = 0;
  var buffers = [];

  var stream = zlib.createInflateRaw();
  var position = 0;

  stream.on('error', e => {
    log('error', e);
    stream.end();
  })

  stream.on('data', function(chunk) {
    buffers.push(chunk);
    bytesRead += chunk.length;
    log('decompressed:', 'written', stream.bytesWritten, 'read', bytesRead, 'result', stream.result);
    if (stream.result == codes.Z_STREAM_END) {
      log('discarded:', position - stream.bytesWritten);
      stream.end();
    }
  })

  stream.on('finish', function() {
    var result = Buffer.concat(buffers, bytesRead);
    this.close();
    log('result:', result.toString());
    log('chunks:', buffers.length);
    log('bytesWritten:', stream.bytesWritten);
    log('numberRead:', bytesRead);
  })

  log('download: started')
  stream.write(deflatedBuffer.slice(0, 5)); position += 5;
  stream.write(deflatedBuffer.slice(5, 10)); position += 5;
  stream.write(deflatedBuffer.slice(10)); position += deflatedBuffer.length - 10;
  stream.write(Buffer.from([1,2,3,4,5,6])); position += 6;

  log('download: delayed')
  setTimeout(() => {
    log('download: resumed')
    log(stream.write(Buffer.from([1,2,3,4,5,6]))); // discarded
    log('download: finished')
    stream.end();
  }, 1000);
});

Result:

download: started
download: delayed
decompressed: written 5 read 4 result 0
decompressed: written 10 read 9 result 0
decompressed: written 15 read 100 result 1
discarded: 6
download: resumed
error
NodeError: write after end
false
download: finished
result: 0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789

// This applies to streams where we don't check data past the end of
// what was consumed; that is, everything except Gunzip/Unzip.
self.push(null);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this is a breaking change, the test added in #26363 should fail (once the linter issue has been fixed here).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I didn't realize this had already gone out the door. Nor did I realize that I should be trapping for the end event and not the finish event!

Thanks for the code review!

@kingces95
Copy link
Author

Done in #26363

@kingces95 kingces95 closed this Mar 13, 2019
@kingces95 kingces95 deleted the gh26603 branch March 13, 2019 10:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. zlib Issues and PRs related to the zlib subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants