
stream: avoid drain for sync streams #32887

Closed
wants to merge 2 commits

Conversation

@ronag (Member) commented Apr 16, 2020

Previously, a sync writable receiving chunks
larger than the high-water mark would unnecessarily
ping-pong needDrain.

300% improvement for sync streams when chunks are bigger than HWM.

 streams/writable-manywrites.js len=1024 callback='no' writev='no' sync='no' n=2000000                     1.84 %       ±3.27%  ±4.39%  ±5.79%
 streams/writable-manywrites.js len=1024 callback='no' writev='no' sync='yes' n=2000000                    2.00 %       ±2.02%  ±2.70%  ±3.54%
 streams/writable-manywrites.js len=1024 callback='no' writev='yes' sync='no' n=2000000                    1.68 %       ±2.12%  ±2.83%  ±3.68%
 streams/writable-manywrites.js len=1024 callback='no' writev='yes' sync='yes' n=2000000                   1.31 %       ±2.44%  ±3.25%  ±4.23%
 streams/writable-manywrites.js len=1024 callback='yes' writev='no' sync='no' n=2000000                    0.19 %       ±1.77%  ±2.36%  ±3.08%
 streams/writable-manywrites.js len=1024 callback='yes' writev='no' sync='yes' n=2000000            *      2.60 %       ±2.46%  ±3.28%  ±4.28%
 streams/writable-manywrites.js len=1024 callback='yes' writev='yes' sync='no' n=2000000                   1.32 %       ±2.37%  ±3.16%  ±4.14%
 streams/writable-manywrites.js len=1024 callback='yes' writev='yes' sync='yes' n=2000000                 -1.15 %       ±2.59%  ±3.46%  ±4.53%
 streams/writable-manywrites.js len=32768 callback='no' writev='no' sync='no' n=2000000                    0.42 %       ±2.27%  ±3.04%  ±4.00%
 streams/writable-manywrites.js len=32768 callback='no' writev='no' sync='yes' n=2000000          ***    308.72 %       ±7.84% ±10.53% ±13.91%
 streams/writable-manywrites.js len=32768 callback='no' writev='yes' sync='no' n=2000000                   0.22 %       ±1.78%  ±2.37%  ±3.10%
 streams/writable-manywrites.js len=32768 callback='no' writev='yes' sync='yes' n=2000000         ***    274.22 %       ±6.83%  ±9.20% ±12.20%
 streams/writable-manywrites.js len=32768 callback='yes' writev='no' sync='no' n=2000000                   1.56 %       ±2.40%  ±3.23%  ±4.27%
 streams/writable-manywrites.js len=32768 callback='yes' writev='no' sync='yes' n=2000000         ***    323.42 %       ±5.95%  ±8.00% ±10.60%
 streams/writable-manywrites.js len=32768 callback='yes' writev='yes' sync='no' n=2000000                 -0.42 %       ±1.90%  ±2.55%  ±3.35%
 streams/writable-manywrites.js len=32768 callback='yes' writev='yes' sync='yes' n=2000000        ***    287.75 %       ±7.60% ±10.21% ±13.50%
Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@ronag ronag force-pushed the stream-sync-drain branch from 85dc602 to 6062917 Compare April 16, 2020 19:58
@nodejs-github-bot nodejs-github-bot added the stream Issues and PRs related to the stream subsystem. label Apr 16, 2020
@ronag ronag force-pushed the stream-sync-drain branch from 6062917 to abc1d0b Compare April 16, 2020 19:59
@ronag (Member, Author) commented Apr 16, 2020

Not sure if this should be a semver-major.

@ronag ronag requested a review from mscdex April 16, 2020 21:18

@ronag ronag requested a review from lpinca April 17, 2020 07:16
@lpinca (Member) commented Apr 17, 2020

This looks great if it does not cause issues in user-land (it will almost certainly break ws tests).

@ronag (Member, Author) commented Apr 17, 2020

> This looks great if it does not cause issues in user-land (it will almost certainly break ws tests).

I'll run a CITGM.


@ronag ronag requested a review from mcollina April 18, 2020 16:28
@ronag (Member, Author) commented Apr 18, 2020

CITGM looks good apart from ws. Sorry @lpinca. I'm fine with skipping this PR if it's a big effort on your part.

@lpinca (Member) commented Apr 18, 2020

It should be just a matter of fixing those two failing tests (not sure how); core functionality should not be broken.

@lpinca (Member) commented Apr 18, 2020

Perhaps wait until #32780 lands so I don't have to fix them twice.

@ronag ronag added blocked PRs that are blocked by other issues or PRs. and removed blocked PRs that are blocked by other issues or PRs. labels Apr 18, 2020
@ronag (Member, Author) commented Apr 22, 2020

@nodejs/streams

@mcollina (Member) left a comment

LGTM


// We must ensure that previous needDrain will not be reset to false.
if (!ret)
state.needDrain = true;
mcollina (Member): I would add a few comments about this change here.

ronag (Member, Author): I'm not sure what kind of comment?
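Mechanically, the concern in the quoted hunk can be sketched like this (illustrative pseudocode only, not the actual lib/internal/streams/writable.js code): needDrain must only ever be set when a write reports backpressure, never unconditionally reassigned, or a later successful write would cancel a 'drain' an earlier caller is still waiting for.

```javascript
// Illustrative sketch only, not the actual internal writable code: why
// needDrain must be "sticky". state.needDrain is cleared when 'drain' is
// emitted, so a write must never reassign it unconditionally.
function markNeedDrain(state, ret) {
  // Wrong:  state.needDrain = !ret;
  // A later write that returns true would reset a pending needDrain to
  // false, and the earlier caller waiting for 'drain' would hang forever.
  if (!ret)
    state.needDrain = true; // only ever set; cleared when 'drain' fires
  return ret;
}
```

With this shape, a stream that already owes a 'drain' keeps owing it regardless of how subsequent writes turn out.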

@mcollina (Member) commented:

I tagged this as "dont-land" on everything but v14.x.

lpinca added a commit to websockets/ws that referenced this pull request Apr 25, 2020
Do not rely on the `'drain'` event for synchronous writes.

Refs: nodejs/node#32887
@lpinca (Member) commented Apr 25, 2020

I've fixed the failing tests: websockets/ws@18d773d.


ronag added a commit that referenced this pull request Apr 25, 2020
Previously, a sync writable receiving chunks
larger than the high-water mark would unnecessarily
ping-pong needDrain.

PR-URL: #32887
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@ronag (Member, Author) commented Apr 25, 2020

Landed in 003fb53

@ronag ronag closed this Apr 25, 2020
BethGriggs pushed a commit that referenced this pull request Apr 27, 2020
@BethGriggs BethGriggs mentioned this pull request Apr 27, 2020
BethGriggs pushed a commit that referenced this pull request Apr 28, 2020
@mscdex (Contributor) commented Aug 9, 2020

@ronag This PR (found after bisecting on v14.x) seems to be causing issues for ssh2/ssh2-streams when large amounts of data are being passed through a stream. The same regression still exists on master as of this writing. Can we at least revert this on v14.x, especially if there is no guidance on how to deal with the breakage in a backwards-compatible manner?

@mcollina (Member) commented:

@mscdex +1 on reverting this. I would love to have a regression test added (even in ssh2, and we can add it to CITGM) so that we do not regress again.

@ronag (Member, Author) commented Aug 10, 2020

I'm OK with a revert plus a comment in the code. Just a note here: reverting this might have a performance impact, e.g. when piping from a file stream (which has a larger high-water mark) into a sync transform stream (which has a smaller high-water mark), e.g. for hashing.

It's a bit strange to me that this would cause breakage. Is ssh2 doing something funky with streams?

@p-j commented Aug 10, 2020

Hi,
In case it helps: as part of troubleshooting a related issue, @theophilusx made this script, which hangs forever if the passed data is large enough, as discussed here.

@mcollina (Member) commented:

None of those examples are self-contained :/; they need an active SSH server to run.

@mscdex (Contributor) commented Aug 11, 2020

I've stripped down the relevant ssh2 internals into a completely standalone test that exhibits the problem. The code here doesn't exactly represent what happens internally in most cases (mostly the write happening before piping; that's done here to simulate backpressure and buffered data), but it should sufficiently demonstrate the same issue. It's also possible the test code could be simplified further if someone wanted to have a go at it.

Source code
'use strict';

const { Duplex, Transform } = require('stream');
const { connect, createServer } = require('net');
const { inherits } = require('util');

function Channel(protoStream, socket) {
  const streamOpts = {
    highWaterMark: 2 * 1024 * 1024,
    allowHalfOpen: false
  };

  Duplex.call(this, streamOpts);

  socket.on('drain', () => {
    console.log('ondrain()', 'waitClientDrain', this._waitSocketDrain);
    if (this._waitSocketDrain) {
      this._waitSocketDrain = false;
      if (this._chunk)
        this._write(this._chunk, null, this._chunkcb);
      else if (this._chunkcb)
        this._chunkcb();
    }
  });

  this._protoStream = protoStream;

  // outgoing data
  this._waitSocketDrain = false;

  this._chunk = undefined;
  this._chunkcb = undefined;
}
inherits(Channel, Duplex);

Channel.prototype._read = function(n) {};

Channel.prototype._write = function(data, encoding, cb) {
  const protoStream = this._protoStream;
  const packetSize = 64 * 1024;
  const len = data.length;
  let p = 0;

  while (len - p > 0) {
    let sliceLen = len - p;
    if (sliceLen > packetSize)
      sliceLen = packetSize;

    const slice = data.slice(p, p + sliceLen);
    const ret = protoStream.sendData(slice);
    console.log(`Channel._write() ret = ${ret} after writing ${slice.length} byte(s)`);

    p += sliceLen;

    if (!ret) {
      this._waitSocketDrain = true;
      this._chunk = undefined;
      this._chunkcb = cb;
      break;
    }
  }

  console.log(`Channel._write() outside of loop; p=${p} len=${len} waitSocketDrain=${this._waitSocketDrain}`);

  if (len - p > 0) {
    if (p > 0) {
      // partial
      let buf = Buffer.allocUnsafe(len - p);
      data.copy(buf, 0, p);
      this._chunk = buf;
    } else {
      this._chunk = data;
    }
    this._chunkcb = cb;
    return;
  }

  if (!this._waitSocketDrain)
    cb();
};

function ProtocolStream() {
  Transform.call(this, { highWaterMark: 32 * 1024 });
}
inherits(ProtocolStream, Transform);

ProtocolStream.prototype.sendData = function(data) {
  return this.push(data);
};

createServer(function(socket) {
  this.close();
}).listen(0, '127.0.0.1', function() {
  const { address, port } = this.address();
  connect(port, address, function() {
    console.log('Client connected');
    const protoStream = new ProtocolStream();
    const channel = new Channel(protoStream, this);
    channel.on('finish', () => {
      console.log('============ Channel finish');
      this.destroy();
    });
    channel.write(Buffer.alloc(128 * 1024));
    protoStream.pipe(this).pipe(protoStream);
    channel.end(Buffer.alloc(128 * 1024));
  });
});

Executing the test on node v10.19.0 results in the following output:

$ node-v10.19.0 stream-issue.js
Client connected
Channel._write() ret = false after writing 65536 byte(s)
Channel._write() outside of loop; p=65536 len=131072 waitSocketDrain=true
ondrain() waitClientDrain true
Channel._write() ret = false after writing 65536 byte(s)
Channel._write() outside of loop; p=65536 len=65536 waitSocketDrain=true
ondrain() waitClientDrain true
Channel._write() ret = false after writing 65536 byte(s)
Channel._write() outside of loop; p=65536 len=131072 waitSocketDrain=true
ondrain() waitClientDrain true
Channel._write() ret = false after writing 65536 byte(s)
Channel._write() outside of loop; p=65536 len=65536 waitSocketDrain=true
ondrain() waitClientDrain true
============ Channel finish
$

Executing the test on node v14.1.0 or later results in the following output:

$ node-v14.7.0 stream-issue.js
Client connected
Channel._write() ret = false after writing 65536 byte(s)
Channel._write() outside of loop; p=65536 len=131072 waitSocketDrain=true

(... and the process never exits)

@ronag (Member, Author) commented Aug 11, 2020

Looks like it ends up making an incorrect assumption about whether or not 'drain' will be emitted. Maybe it's _write being called from the drain handler, which can set _waitSocketDrain to true without the stream actually needing a drain.

I think this is weird/incorrect/buggy usage and Node streams are behaving correctly. However, I guess it is a breaking change, so if no one feels strongly about the performance loss I think we should just revert it.

@mcollina (Member) commented:

@mscdex @ronag how hard would it be to fix this on the ssh2 side? What puzzles me is that all of ssh2's tests were passing, so it would have been extremely hard to spot this.

Given that we were on the fence on this being semver-major we can revert on v14 and keep it on master/v15.

@theophilusx commented Aug 12, 2020 via email

@schmod commented Oct 6, 2020

Not to do the "+1" thing here, but we've seen this regression in the wild with the (quite popular) ssh2 library.

Is there any chance of getting this reverted?

@wa-Nadoo commented Nov 22, 2020

@schmod, the changes were reverted in #35941. @ronag, maybe this PR should be reopened?

10 participants