
batched TransformStream transform()? #574

Open
isonmad opened this issue Oct 27, 2016 · 1 comment

@isonmad
Contributor

isonmad commented Oct 27, 2016

#551 (comment) proposing some kind of lowWaterMark got me thinking. Would it be worth it to change transformer.transform to be passed an array of chunks, instead of a single chunk?

By default the array would contain only one chunk, but options would enable:

  • specifying a maximum total size of the batched chunks (the maximum desired value of writableStrategy.size(chunks[0])+writableStrategy.size(chunks[1])+... )
  • specifying a minimum total size of the batched chunks, such that transform() calls are delayed until sufficient chunks have been enqueued in the writable end (this would, for obvious reasons, be required to be lower than highWaterMark)

Would it be useful to put that kind of logic into TransformStream itself, instead of having the transformer implement its own, third, internal queue with its own batching logic?
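To make the proposal concrete, here is a rough sketch of the batching logic a min/max total-size option might imply. Nothing here is part of the Streams standard: the class, its option names (`size`, `minSize`, `maxSize`), and its shape are all hypothetical, intended only to show how chunks could accumulate until a minimum total size is reached while never exceeding a maximum.

```javascript
// Hypothetical sketch of the batching behavior described above.
// Chunks accumulate until their total size reaches `minSize`; a
// batch handed to transform() never exceeds `maxSize`.
class ChunkBatcher {
  constructor({ size = (c) => c.length, minSize = 1, maxSize = Infinity } = {}) {
    this.size = size;       // analogous to writableStrategy.size()
    this.minSize = minSize;
    this.maxSize = maxSize;
    this.pending = [];
    this.pendingSize = 0;
  }

  // Add a chunk; returns an array of batches that are now ready.
  // Each batch is an array of chunks, as a batched transform() would receive.
  write(chunk) {
    const batches = [];
    this.pending.push(chunk);
    this.pendingSize += this.size(chunk);
    while (this.pendingSize >= this.minSize) {
      const batch = [];
      let batchSize = 0;
      while (this.pending.length > 0 &&
             batchSize + this.size(this.pending[0]) <= this.maxSize) {
        const next = this.pending.shift();
        batchSize += this.size(next);
        batch.push(next);
      }
      if (batch.length === 0) break; // a single chunk exceeds maxSize
      this.pendingSize -= batchSize;
      batches.push(batch);
    }
    return batches;
  }

  // On stream close, flush whatever remains even if below minSize.
  flush() {
    const rest = this.pending;
    this.pending = [];
    this.pendingSize = 0;
    return rest.length > 0 ? [rest] : [];
  }
}
```

A real integration would live inside TransformStream's write handling rather than as a separate class; this only illustrates the queueing rules the two bullet points describe.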

@ricea
Collaborator

ricea commented Oct 28, 2016

In the case of fixed-block-size transforms, like encryption, I don't think it helps that much. Even if we provide a minimum total size, the transformer still has to deal with the case where it receives data that is not an exact multiple of the block size. So what it will end up doing is storing an array of chunks from the last call to transform() that haven't been used yet (or keeping a copy of the data in its own buffer). Once you've got that data structure, it's trivial to add new chunks to it when you don't yet have enough data to process a block. So the minimum total size ends up being not that helpful.
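The carry-over buffer described above can be sketched as follows. This is a minimal illustration, not part of the Streams standard: `BlockChunker` and `emitBlock` are made-up names, with `emitBlock` standing in for `controller.enqueue()`, and the block size is 4 bytes for brevity.

```javascript
// A fixed-block-size transformer that keeps the bytes left over from
// previous transform() calls in its own buffer, as described above.
class BlockChunker {
  constructor(blockSize, emitBlock) {
    this.blockSize = blockSize;
    this.emitBlock = emitBlock;       // stands in for controller.enqueue()
    this.carry = new Uint8Array(0);   // unprocessed remainder
  }

  transform(chunk) {
    // Join the leftover bytes with the newly arrived chunk.
    const data = new Uint8Array(this.carry.length + chunk.length);
    data.set(this.carry, 0);
    data.set(chunk, this.carry.length);

    // Emit every complete block; keep the tail for the next call.
    let offset = 0;
    for (; offset + this.blockSize <= data.length; offset += this.blockSize) {
      this.emitBlock(data.subarray(offset, offset + this.blockSize));
    }
    this.carry = data.slice(offset);
  }

  flush() {
    // A real cipher would pad the final partial block here.
    if (this.carry.length > 0) this.emitBlock(this.carry);
    this.carry = new Uint8Array(0);
  }
}
```

As the comment argues: once this `carry` buffer exists, appending chunks that are too small to fill a block is trivial, so a minimum-total-size option in TransformStream would not remove any of this code.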

A way to say "feed me exactly 1024 bytes at a time, no more, no less" would solve this problem but seems a bit exotic. What I mean by "exotic" is that it solves a specific problem for one particular use-case but is useless for everyone else.

I think it would be natural to look at this again when #27 ("the writev problem") is resolved. If we have batched writes, then the case for batched transforms becomes a lot more compelling.
