
fix: compressor didn't chunkify big payload #3

Merged
merged 8 commits into master from g11tech/chunkcompress on Apr 26, 2022

Conversation

@g11tech commented Apr 14, 2022

As per the snappy frame encoding format, an uncompressed chunk can't be larger than UNCOMPRESSED_CHUNK_SIZE = 65536 bytes.
This PR updates the compressor accordingly, adds a test case covering the new behavior, and bumps the version.
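
For context, a minimal sketch of the chunking rule this PR enforces; UNCOMPRESSED_CHUNK_SIZE comes from the PR, while the generator helper below is purely illustrative:

const UNCOMPRESSED_CHUNK_SIZE = 65536 // max uncompressed bytes per snappy frame

// Yield consecutive slices of at most UNCOMPRESSED_CHUNK_SIZE bytes.
function * chunkify (payload) {
  for (let startFrom = 0; startFrom < payload.length; startFrom += UNCOMPRESSED_CHUNK_SIZE) {
    yield payload.slice(startFrom, startFrom + UNCOMPRESSED_CHUNK_SIZE)
  }
}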

@g11tech requested a review from a team as a code owner on April 14, 2022 17:55
@@ -52,18 +60,25 @@ CompressStream.prototype._uncompressed = function (chunk) {

CompressStream.prototype._transform = function (chunk, enc, callback) {
  var self = this
  async function compressChunks() {


Why async function?

@g11tech (Author) commented Apr 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _transform API is to call callback once the transformation of the chunk completes, so I didn't want to hold up _transform.
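
For reference, this is the Node.js Transform contract being described, as a minimal standalone sketch (the pass-through stream is illustrative):

const { Transform } = require('stream')

const passthrough = new Transform({
  transform (chunk, enc, callback) {
    this.push(chunk) // emit output; push may be called several times per input chunk
    callback()       // signal this chunk is fully handled so the next one is delivered
  }
})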


But there is no await in this body, and compressChunks() is called without any handling, so errors will become unhandled rejections. Can you just make it

  function compressChunks() {

A member commented:

I guess the callback can error and cause an unhandled rejection.

Maybe the implementation can look like:

new Promise(() => {
  ...
  callback(...);
  ...
}).catch((e) => logUnexpectedError(e));

@g11tech (Author) commented:

Done, just console logged it, since nothing fancier is available in this lib.
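
A minimal sketch of the shape described in this thread, assuming a _compressed(chunk, compressed) helper that frames and pushes the compressed bytes (that helper name is an assumption; the Promise wrapper and console logging follow the discussion above):

CompressStream.prototype._transform = function (chunk, enc, callback) {
  var self = this
  // Run everything inside a Promise so anything thrown, including by
  // callback itself, lands in .catch() instead of an unhandled rejection.
  new Promise(function (resolve) {
    for (let startFrom = 0; startFrom < chunk.length; startFrom += UNCOMPRESSED_CHUNK_SIZE) {
      // Buffer.slice clamps the end index, so the last slice may be shorter.
      const bytesChunk = chunk.slice(startFrom, startFrom + UNCOMPRESSED_CHUNK_SIZE)
      self._compressed(bytesChunk, snappy.compressSync(bytesChunk)) // assumed helper
    }
    callback()
    resolve()
  }).catch(function (e) {
    console.error(e) // nothing fancier than the console is available in this lib
  })
}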

for (let startFrom = 0; startFrom < chunk.length; startFrom += UNCOMPRESSED_CHUNK_SIZE) {
  const endAt = startFrom + Math.min(chunk.length - startFrom, UNCOMPRESSED_CHUNK_SIZE);
  const bytesChunk = chunk.slice(startFrom, endAt);
  const compressed = snappy.compressSync(bytesChunk)


Why switch to the sync version? If we want to do that, we should do it in another PR and study the implications in depth.

@g11tech (Author) commented Apr 15, 2022


@dapplion the async version, where I chunk the bytes from outside (i.e. pre-chunk the data and call compressStream.write(chunk)) using the original flow, is about 2x slower than this approach, where the chunking happens inside _transform; in other words, the compressSync-based solution comes out ahead:

[benchmark screenshot; orig-snappy is the original, with data chunked at the compressStream.write level]

If I use the async version inside _transform with an await new Promise((resolve) => ...) wrapper to make sure the asynchronously compressed chunks are written in the correct serial order, it's 5-10% slower because of the overhead; if I don't care about the serial order in which chunks are written (which yields an incorrect stream), the async chunking version inside _transform is about 5% faster.

I think for now this is our best bet 🙂


Okay, sounds good to me. Can you commit a benchmark, though? It would inform us of the cost of compressing and uncompressing objects of 10 kB, 100 kB, and 1000 kB.
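
A rough sketch of the kind of benchmark being requested, assuming the package exposes createCompressStream as upstream snappy-stream does (the entry point and payload contents are assumptions):

const { createCompressStream } = require('snappy-stream') // assumed entry point

// Time compression of increasingly large payloads through the stream.
for (const size of [10e3, 100e3, 1000e3]) {
  const payload = Buffer.alloc(size, 0xab)
  const compress = createCompressStream()
  const start = process.hrtime.bigint()
  compress.resume() // drain the readable side so the transform keeps flowing
  compress.on('end', () => {
    const ms = Number(process.hrtime.bigint() - start) / 1e6
    console.log(`${size} bytes compressed in ${ms.toFixed(2)} ms`)
  })
  compress.end(payload)
}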

@mpetrunic (Member) commented Apr 15, 2022

I suggest merging #4 first and reverting the version bump here ^^

@mpetrunic changed the title from "Update the compressor to chunkify a big payload" to "fix: compressor didn't chunkify big payload" on Apr 15, 2022
@g11tech mentioned this pull request on Apr 22, 2022
*
*/

CompressStream.prototype._transform = function(chunk, enc, callback) {
  var self = this
A member commented:

I guess we don't need self anymore now

@wemeetagain (Member) left a comment:

LGTM

@wemeetagain merged commit a658af0 into master on Apr 26, 2022
@wemeetagain deleted the g11tech/chunkcompress branch on Apr 26, 2022 09:46