
streams: add cork option to pipe #2020

Closed
wants to merge 1 commit into from

Conversation

@calvinmetcalf (Contributor)

Adds an option to .pipe to cork it before each write and
then uncork it next tick, based on discussion at
nodejs/readable-stream#145

@chrisdickinson (Contributor)

Curious: why not just call cork on the destination stream to start with? Then the highwatermark handling will take care of subsequent writev flushing?

@trevnorris (Contributor)

cork knows about the high water mark? There's something I must be misunderstanding.

@mscdex mscdex added the stream Issues and PRs related to the stream subsystem. label Jun 19, 2015
@@ -333,6 +333,8 @@ readable.isPaused() // === false
* `destination` {[Writable][] Stream} The destination for writing data
* `options` {Object} Pipe options
* `end` {Boolean} End the writer when the reader ends. Default = `true`
* `cork` {Boolean} Before each write cork the stream and then uncork it on the next tick
Review comment (Contributor):

there should probably be a comma after Before each write

@calvinmetcalf (Contributor, Author)

@chrisdickinson because then you have to wait until you hit the highWaterMark before you actually write anything; see the comment by @indutny in the readable-stream issue.

debug('corking');
corked = true;
dest.cork();
process.nextTick(function () {
Review comment (Contributor):

I wonder if pulling this anonymous function out of maybeCork() would help performance-wise?

@chrisdickinson (Contributor)

OK, I was wrong – cork and uncork have no idea about the highWaterMark. That said, I'm not sure I'm fully on board with this change (though I may be misunderstanding it).

So, to check my assumptions about how streams work (and in particular, how net.Streams work):

  1. First tick of the event loop.
    1. We write once to the net.Socket. It immediately tries to flush to the underlying TCPWrap.
    2. We write again to the net.Socket, N times. These all buffer, while the first write finishes.
  2. First write completes.
    1. We flush all buffered writes to the TCPWrap using _writev.
    2. All writes that happen while this large buffer flushes are themselves buffered, until HWM.

Right now streams will be flushing chunks without using cork, uncork. The only write that is guaranteed to escape that is the first write. The size of the outgoing packet is determined by the amount of data that the source stream can generate during packet flushes.

@chrisdickinson (Contributor)

Things that could cause my assumptions to be incorrect:

  1. nodelay TCPWrap writes could be synchronous (I don't know if they are for sure, but some initial research seems to point to "they're async.")
  2. the importance of batching writes for large packets is higher than I assume (that is to say, we want to collect more than "amount of data the source can generate during a flush" bytes before flushing.)

These both seem like TCP-specific concerns – it might be better to solve them at the net.Socket level than at the stream.Writable level.

@calvinmetcalf (Contributor, Author)

the importance of batching writes for large packets is higher than I assume (that is to say, we want to collect more than "amount of data the source can generate during a flush" bytes before flushing.)

This is the more general benefit: it could apply to any stream that wants to strike a balance between per-write overhead and latency.

@trevnorris (Contributor)

Just for reference, this is basically the same thing the http module does today.

@trevnorris (Contributor)

Another key component in this is the interaction with uv_try_write. You want to immediately write out as much as the kernel can handle, then queue the remainder. It's not uncommon that all writes can be completed immediately. This affects all uv_stream_t instances.

@calvinmetcalf (Contributor, Author)

@mscdex updated based on your suggestions

@indutny (Member)

indutny commented Jun 19, 2015

@trevnorris still it is faster to do just one writev call than multiple ones.

@trevnorris (Contributor)

@indutny internally doesn't it automatically write out as much as possible using uv_try_write before setting up the WriteReq?

@indutny (Member)

indutny commented Jun 19, 2015

Yes, but it will pass multiple buffers as a single input with writev.

@trevnorris (Contributor)

Sure, but uv_try_write only takes one at a time. That's what I was trying to get at above. I thought it might be a performance advantage to write immediately until uv_try_write fails, then queue up the remainder for writev, simply because between the uv_try_write calls the kernel may have flushed some of the data and could accept more by the time writev ran.

But the timing could also be so minimal that it doesn't really matter.

@indutny (Member)

indutny commented Jun 19, 2015

Not really, it takes multiple:

UV_EXTERN int uv_try_write(uv_stream_t* handle,
                           const uv_buf_t bufs[],
                           unsigned int nbufs);

@indutny (Member)

indutny commented Jun 19, 2015

Though, your comments are quite correct. I am not suggesting this should be the default behavior in any way. But for my use case it would be beneficial to introduce this option; otherwise I will need to concatenate the buffers manually in memory.

@trevnorris (Contributor)

Doh. Memory failure. Thanks for correcting me.

My comment was more just an observation I realized while looking over this PR. Definitely not something I think should be introduced in this PR. :-)

@chrisdickinson (Contributor)

@calvinmetcalf:

This is the more general benefit that could apply to other streams that want to strike a balance between per write overhead and latency

We're talking about bringing back lowWatermark?

@indutny:

Based on this comment, it seems like you should be seeing at least one writev of size N>1. Is the problem that:

  • you are not seeing any writev of size N>1 happen, or
  • the writev calls you are seeing are all of N<desired-size, or
  • the initial writev call is of size N==1?

@indutny (Member)

indutny commented Jun 21, 2015

@chrisdickinson they can't happen because I am piping to the socket, not writing to it myself. So every write results in a separate write() syscall, unless some of them fail to complete immediately and result in buffering (which is rare in most of the setups I am using).

@ronkorving (Contributor)

I have a use case where I'm piping from a _transform to a writable, and I would benefit from the same solution. For me, however, nextTick would be overkill (not sure what the cost of a nextTick is, to be honest), as _transform already uses a callback to denote the end of a batch of writes. Perhaps the transform use case could be optimized?

@@ -515,9 +518,26 @@ Readable.prototype.pipe = function(dest, pipeOpts) {
    ondrain();
  }

  function maybeCork() {
    if (!autoCork || corked) {
      debug('already corked');
Review comment (Contributor):

Not technically correct in the case of autoCork being false.

Reply (Contributor, Author):

True, it should be more like 'no need to cork'.

@ronkorving (Contributor)

I just submitted #2167 which I think might really benefit from this.

@@ -467,6 +467,9 @@ Readable.prototype.pipe = function(dest, pipeOpts) {
dest !== process.stdout &&
dest !== process.stderr;

var autoCork = pipeOpts && pipeOpts.cork && (typeof dest.cork === 'function');
Review comment (Contributor):

Is there a time when dest.cork isn't a function? Won't it error anyways if it isn't a writable stream?

Reply (Member):

Old streams?

Reply (Contributor):

Ah, that makes sense! Thanks!

@brendanashworth brendanashworth added the semver-minor PRs that contain new features and should be released in the next minor version. label Sep 1, 2015
@jasnell (Member)

jasnell commented Nov 16, 2015

@calvinmetcalf ... ping ... is this still something you'd like to pursue?

@jasnell jasnell added the stalled Issues and PRs that are stalled. label Nov 16, 2015
@calvinmetcalf (Contributor, Author)

sure I can rebase

@calvinmetcalf (Contributor, Author)

Closing this, as I'm not so sure we need it.
