Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

intermittent unexpected socket closure on Mac OS X #2610

Closed
Trott opened this issue Aug 29, 2015 · 11 comments
Closed

intermittent unexpected socket closure on Mac OS X #2610

Trott opened this issue Aug 29, 2015 · 11 comments
Labels
confirmed-bug Issues with confirmed bugs. macos Issues and PRs related to the macOS platform / OSX. net Issues and PRs related to the net subsystem.

Comments

@Trott
Copy link
Member

Trott commented Aug 29, 2015

Ref: #1100
Ref: nodejs/node-v0.x-archive#16805

The following code will throw on OS X. It might take a while but it will happen. I usually see the error before runcount hits 2000.

'use strict';
const fork = require('child_process').fork;
const net = require('net');
const count = 12;
const port = 12346;

if (process.argv[2] === 'child') {
  let sockets = [];

  process.on('message', function(m, socket) {
    function sendClosed(id) {
      process.send({ id: id, status: 'closed'});
    };

    if (m.cmd === 'new') {
      sockets.push(socket);
    }

    if (m.cmd === 'close') {
      if (sockets[m.id].destroyed) {
        throw new Error('socket already destroyed');
      }
      sockets[m.id].once('close', sendClosed.bind(null, m.id));
      sockets[m.id].destroy();
    }

    if (m.cmd === 'clear') {
      sockets = [];
    }
  });

} else {
  const child = fork(process.argv[1], ['child']);

  const server = net.createServer();
  let sockets = [];

  server.on('connection', function(socket) {
    child.send({ cmd: 'new' }, socket);
    sockets.push(socket);

    if (sockets.length === count) {
      closeSockets(0);
    }
  });

  function createSockets() {
    let j = count, client;
    while (j--) {
      client = net.connect(port, '127.0.0.1');
    }
  }

  server.on('listening', function() {
    createSockets();
  });

  let runCount = 0;
  function closeSockets(i) {
    if (i === count) {
      console.log('run count: ' + ++runCount);
      child.send({ cmd: 'clear' });
      sockets = [];
      return createSockets();
    }

    child.once('message', function(m) {
      server.getConnections(function(err, num) {
        closeSockets(i + 1);
      });
    });
    child.send({ id: i, cmd: 'close' });
  };

  server.listen(port, '127.0.0.1');
}
@Trott Trott added the net Issues and PRs related to the net subsystem. label Aug 29, 2015
@brendanashworth brendanashworth added the macos Issues and PRs related to the macOS platform / OSX. label Aug 29, 2015
Trott added a commit to Trott/io.js that referenced this issue Sep 3, 2015
This retains the key elements of test-child-process-fork-getconnections
(forks a child process, sends a bunch of sockets, uses getConnections()
to enumerate them) but contains some code to work around an apparent
intermittent bug that occurs on OS X where a socket seems to close
itself unexpectedly.

nodejs#2610 was opened for the bug that
was causing the problem in the first place.

PR-URL: nodejs#2609
Fixes: nodejs#1100
Trott added a commit that referenced this issue Sep 3, 2015
This retains the key elements of test-child-process-fork-getconnections
(forks a child process, sends a bunch of sockets, uses getConnections()
to enumerate them) but contains some code to work around an apparent
intermittent bug that occurs on OS X where a socket seems to close
itself unexpectedly.

#2610 was opened for the bug that
was causing the problem in the first place.

PR-URL: #2609
Fixes: #1100
Reviewed-By: jbergstroem - Johan Bergström <bugs@bergstroem.nu>
Reviewed-By: Brendan Ashworth <brendan.ashworth@me.com>
Trott added a commit that referenced this issue Sep 3, 2015
This retains the key elements of test-child-process-fork-getconnections
(forks a child process, sends a bunch of sockets, uses getConnections()
to enumerate them) but contains some code to work around an apparent
intermittent bug that occurs on OS X where a socket seems to close
itself unexpectedly.

#2610 was opened for the bug that
was causing the problem in the first place.

PR-URL: #2609
Fixes: #1100
Reviewed-By: jbergstroem - Johan Bergström <bugs@bergstroem.nu>
Reviewed-By: Brendan Ashworth <brendan.ashworth@me.com>
Fishrock123 pushed a commit to Fishrock123/node that referenced this issue Sep 3, 2015
This retains the key elements of test-child-process-fork-getconnections
(forks a child process, sends a bunch of sockets, uses getConnections()
to enumerate them) but contains some code to work around an apparent
intermittent bug that occurs on OS X where a socket seems to close
itself unexpectedly.

nodejs#2610 was opened for the bug that
was causing the problem in the first place.

PR-URL: nodejs#2609
Fixes: nodejs#1100
Reviewed-By: jbergstroem - Johan Bergström <bugs@bergstroem.nu>
Reviewed-By: Brendan Ashworth <brendan.ashworth@me.com>
@brendanashworth brendanashworth added the confirmed-bug Issues with confirmed bugs. label Sep 4, 2015
@brendanashworth
Copy link
Contributor

@Trott any chance we could stick a loop inside the test? That way the error is easier to reproduce?

@Trott
Copy link
Member Author

Trott commented Sep 6, 2015

@brendanashworth The code pasted above will run in an infinite loop until the bug is triggered. (Note the call to createSockets() in closeSockets().) If you want to limit the number of times it runs, put in a check for the value of runCount.

Ideally, it would be refactored in a way that would trip the bug without running over and over and over. But doing that would require knowing what is causing the bug...

@Trott
Copy link
Member Author

Trott commented Sep 6, 2015

Although running it now, it's not throwing with the throw() in the test code, it's now throwing with ETIMEDOUT so ¯_(ツ)_/¯

Trott added a commit that referenced this issue Sep 6, 2015
This retains the key elements of test-child-process-fork-getconnections
(forks a child process, sends a bunch of sockets, uses getConnections()
to enumerate them) but contains some code to work around an apparent
intermittent bug that occurs on OS X where a socket seems to close
itself unexpectedly.

#2610 was opened for the bug that
was causing the problem in the first place.

PR-URL: #2609
Fixes: #1100
Reviewed-By: jbergstroem - Johan Bergström <bugs@bergstroem.nu>
Reviewed-By: Brendan Ashworth <brendan.ashworth@me.com>
Trott added a commit that referenced this issue Sep 6, 2015
This retains the key elements of test-child-process-fork-getconnections
(forks a child process, sends a bunch of sockets, uses getConnections()
to enumerate them) but contains some code to work around an apparent
intermittent bug that occurs on OS X where a socket seems to close
itself unexpectedly.

#2610 was opened for the bug that
was causing the problem in the first place.

PR-URL: #2609
Fixes: #1100
Reviewed-By: jbergstroem - Johan Bergström <bugs@bergstroem.nu>
Reviewed-By: Brendan Ashworth <brendan.ashworth@me.com>
Trott added a commit that referenced this issue Sep 6, 2015
This retains the key elements of test-child-process-fork-getconnections
(forks a child process, sends a bunch of sockets, uses getConnections()
to enumerate them) but contains some code to work around an apparent
intermittent bug that occurs on OS X where a socket seems to close
itself unexpectedly.

#2610 was opened for the bug that
was causing the problem in the first place.

PR-URL: #2609
Fixes: #1100
Reviewed-By: jbergstroem - Johan Bergström <bugs@bergstroem.nu>
Reviewed-By: Brendan Ashworth <brendan.ashworth@me.com>
Trott added a commit that referenced this issue Sep 6, 2015
This retains the key elements of test-child-process-fork-getconnections
(forks a child process, sends a bunch of sockets, uses getConnections()
to enumerate them) but contains some code to work around an apparent
intermittent bug that occurs on OS X where a socket seems to close
itself unexpectedly.

#2610 was opened for the bug that
was causing the problem in the first place.

PR-URL: #2609
Fixes: #1100
Reviewed-By: jbergstroem - Johan Bergström <bugs@bergstroem.nu>
Reviewed-By: Brendan Ashworth <brendan.ashworth@me.com>
@tnolet
Copy link

tnolet commented Sep 10, 2015

For anyone stumped with this: Just use this little web server and run it. Then start a simple apache bench. You will see the ab either fail or be really slow. I can reproduce this after running the test 2 or 3 times. This was with version 0.12.4. Downgrading to 0.10.40 fixed it.

# server.js
var http = require('http');

const PORT = 8080;

function handleRequest(request, response) {
    response.end('reply');
}

var server = http.createServer(handleRequest);

server.listen(PORT, function() {
    console.log("Server listening on: http://localhost:%s", PORT);
})

run it:

node server.js
ab -c 10 -n 1000 127.0.0.1:8080/

@evanlucas
Copy link
Contributor

@Trott did this end up getting fixed? I know you had done a lot of work on it

@Trott
Copy link
Member Author

Trott commented Feb 2, 2016

@evanlucas I haven't looked at this in months, but I did just run master branch against the test code in the original issue report. Still fails.

$ ./node test.js 
run count: 1
run count: 2
run count: 3
run count: 4
run count: 5
run count: 6
run count: 7
run count: 8
run count: 9
run count: 10
run count: 11
run count: 12
run count: 13
run count: 14
run count: 15
run count: 16
run count: 17
run count: 18
run count: 19
run count: 20
run count: 21
run count: 22
run count: 23
run count: 24
run count: 25
run count: 26
run count: 27
run count: 28
run count: 29
run count: 30
run count: 31
run count: 32
run count: 33
run count: 34
run count: 35
run count: 36
run count: 37
run count: 38
run count: 39
run count: 40
run count: 41
run count: 42
run count: 43
run count: 44
run count: 45
run count: 46
run count: 47
run count: 48
run count: 49
run count: 50
run count: 51
run count: 52
run count: 53
run count: 54
run count: 55
run count: 56
run count: 57
run count: 58
run count: 59
run count: 60
run count: 61
run count: 62
run count: 63
run count: 64
run count: 65
run count: 66
run count: 67
run count: 68
run count: 69
run count: 70
run count: 71
run count: 72
run count: 73
run count: 74
run count: 75
run count: 76
run count: 77
run count: 78
run count: 79
run count: 80
run count: 81
run count: 82
run count: 83
run count: 84
run count: 85
run count: 86
run count: 87
run count: 88
run count: 89
run count: 90
run count: 91
run count: 92
run count: 93
run count: 94
run count: 95
run count: 96
run count: 97
run count: 98
run count: 99
run count: 100
run count: 101
run count: 102
run count: 103
run count: 104
run count: 105
run count: 106
run count: 107
run count: 108
run count: 109
run count: 110
run count: 111
run count: 112
run count: 113
run count: 114
run count: 115
run count: 116
run count: 117
run count: 118
run count: 119
run count: 120
run count: 121
run count: 122
run count: 123
run count: 124
events.js:155
      throw er; // Unhandled 'error' event
      ^

Error: connect ECONNRESET 127.0.0.1:12346
    at Object.exports._errnoException (util.js:859:11)
    at exports._exceptionWithHostPort (util.js:882:20)
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1055:14)
$ 

@MylesBorins
Copy link
Contributor

@Trott this is the exact same weirdness I was experiencing with the tests in #4819
The workaround is to watch for errors on stdin of the child process and handle them to avoid the throw.

https://github.com/nodejs/node/pull/4819/files#diff-e70e0e9b596132e4aa3b4bd4ced14515R60

@Trott
Copy link
Member Author

Trott commented Feb 2, 2016

@thealphanerd That's right! I'm pretty sure that workaround is in a number of our tests.

Just to be clear so that no one sees that and closes this issue: This issue exists to document the actual bug. Users ideally shouldn't have to use a workaround that is basically "if you get an error, ignore it".

@Trott
Copy link
Member Author

Trott commented Feb 2, 2016

Oh wait, I think I misunderstood you @thealphanerd. I may need to look at
that more closely...
On Tue, Feb 2, 2016 at 12:48 PM Myles Borins notifications@github.com
wrote:

@Trott https://github.com/Trott this is the exact same weirdness I was
experiencing with the tests in #4819
#4819
The workaround is to watch for errors on stdin of the child process and
handle them to avoid the throw.

https://github.com/nodejs/node/pull/4819/files#diff-e70e0e9b596132e4aa3b4bd4ced14515R60


Reply to this email directly or view it on GitHub
#2610 (comment).

@Trott
Copy link
Member Author

Trott commented Feb 2, 2016

@thealphanerd I think these are different bugs, or at least the workaround for #4819 does not seem to work here. I can't find a way to catch an error on the child process's stdin. The error here is thrown by the net.connect() call.

Although I do wonder if this specific issue might be a "not a bug" sort of issue. I just added some code to track the number of calls to net.connect(). The script just crashed for me after 3 seconds. In that time, 2976 calls to net.connect() happened. Even on the localhost interface, I'm inclined to give things a pass if opening 1000 connections a second on a four-year-old laptop causes an ECONNRESET after a few seconds.

Thoughts?

@Trott
Copy link
Member Author

Trott commented Jul 5, 2017

This now stalls out for me consistently after 1362 runs, after which I get Error: connect ETIMEDOUT 127.0.0.1:12346. It sure looks like an OS-throttling issue or something like that rather than a Node.js issue. Closing. Feel free to comment or (if permitted by GitHub) re-open if you think it should stay open.

@Trott Trott closed this as completed Jul 5, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed-bug Issues with confirmed bugs. macos Issues and PRs related to the macOS platform / OSX. net Issues and PRs related to the net subsystem.
Projects
None yet
Development

No branches or pull requests

5 participants