cluster: ignore queryServer msgs on disconnection #4465

santigimeno · 2015-12-29T14:54:24Z

It avoids the creation of unnecessary handles. This issue is causing
intermittent failures in test-cluster-disconnect-race on FreeBSD
and OS X:

assert.js:89
  throw new assert.AssertionError({
  ^
AssertionError: Resource leak detected.
    at removeWorker (cluster.js:321:7)
    at ChildProcess.<anonymous> (cluster.js:356:9)
    at ChildProcess.g (events.js:264:16)
    at emitTwo (events.js:88:13)
    at ChildProcess.emit (events.js:173:7)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:200:12)

The problem is that the worker2.disconnect is being called on the
master before the queryServer is handled, causing the worker to
be deleted, then the Server handle is created afterwards. Later on,
when removeWorker is called from the exit handler, there are no
workers left, but one handle, thus the AssertionError.

Modify test-cluster-disconnect-race to check there are no leaks on
exit.
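
The ordering described above can be replayed with a toy model. This is only a sketch of the bookkeeping as the description states it, not the code in lib/cluster.js: `createToyMaster`, `ignoreLateQueries`, `runRace` and the `'tcp:0'` key are hypothetical names made up for illustration, and the leak is reported as a value instead of being asserted.

```javascript
'use strict';

// Toy model of the master's bookkeeping, NOT the real lib/cluster.js.
// All names here (createToyMaster, ignoreLateQueries, ...) are hypothetical.
function createToyMaster({ ignoreLateQueries }) {
  const workers = new Map(); // worker id -> worker record
  const handles = new Map(); // server key -> id of the worker that asked for it

  return {
    fork(id) {
      const worker = { id, disconnecting: false };
      workers.set(id, worker);
      return worker;
    },

    // Master-side disconnect: drop the worker and any handles it owns.
    disconnect(worker) {
      worker.disconnecting = true;
      for (const [key, ownerId] of handles) {
        if (ownerId === worker.id) handles.delete(key);
      }
      workers.delete(worker.id);
    },

    // A queryServer message from the worker reaches the master.
    queryServer(worker, key) {
      // The guard: ignore messages from a worker that is already disconnecting.
      if (ignoreLateQueries && worker.disconnecting) return;
      handles.set(key, worker.id);
    },

    // Exit handler: report the state instead of asserting.
    removeWorker(worker) {
      workers.delete(worker.id);
      return {
        workersLeft: workers.size,
        handlesLeft: handles.size,
        leaked: workers.size === 0 && handles.size > 0,
      };
    },
  };
}

// Replay the problematic ordering: disconnect, then a late queryServer, then exit.
function runRace(ignoreLateQueries) {
  const master = createToyMaster({ ignoreLateQueries });
  const worker = master.fork(1);
  master.disconnect(worker);
  master.queryServer(worker, 'tcp:0');
  return master.removeWorker(worker);
}

console.log(runRace(false).leaked); // true: no workers left, one handle
console.log(runRace(true).leaked);  // false: the late message was ignored
```

Without the guard the model ends with zero workers and one handle, which is the state that trips the "Resource leak detected" assertion; with the guard the late message creates nothing.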

mscdex · 2015-12-29T15:56:05Z

/cc @Trott

Trott · 2015-12-29T19:23:39Z

CI: https://ci.nodejs.org/job/node-test-commit/1569/

Trott · 2015-12-29T19:31:18Z

The altered test doesn't fail with current master. Is there an easy way to devise a test that will fail reliably without this change?

Trott · 2015-12-29T20:46:31Z

This is not necessarily a bad thing, but this change greatly increases the frequency of EPIPE flakiness in test-cluster-shared-leak.js on Windows. There's a PR in to swallow those errors anyway so maybe it's not a problem at all. But, FYI...

Stress test against master (should fail here and there): https://ci.nodejs.org/job/node-stress-single-test/257/nodes=win2012r2/console

Stress test with this change (fails a lot more): https://ci.nodejs.org/job/node-stress-single-test/260/nodes=win2012r2/console

(By the way, that PR could use a LGTM. Hint, hint!)

santigimeno · 2015-12-29T23:27:01Z

@Trott I've updated the test so it always fails on my OS X computers without this change. Thanks for the review!

santigimeno · 2015-12-30T20:10:37Z

In the end I did not modify test-cluster-disconnect-race. Instead, I created a separate test based on it (but creating lots of workers) that fails consistently on Debian Jessie 64 and OS X.

Trott · 2015-12-31T02:42:32Z

test/sequential/test-cluster-disconnect-leak.js

+  }));
+
+  const cpus = os.cpus().length;
+  const tries = cpus * 16;


Nit: Maybe move these declarations up above the worker1.on('message', ...) because someone reading the test will probably expect tries to be defined before it is used on line 22?
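
For context on why the original ordering still runs: a `const` that appears later in the file can be referenced inside a callback, as long as the callback only fires after the declaration has been evaluated. The sketch below is a standalone illustration, not the test file itself; the emitter, the `'message'` event and the value `4 * 16` are stand-ins chosen for the example.

```javascript
'use strict';
const { EventEmitter } = require('events');

const emitter = new EventEmitter();
const seen = [];

// The listener mentions `tries` before its declaration appears in the file,
// but it only reads the binding when the event fires.
emitter.on('message', () => seen.push(tries));

// Firing the event before the declaration has run hits the temporal dead zone.
let earlyReadThrew = false;
try {
  emitter.emit('message');
} catch (err) {
  earlyReadThrew = err instanceof ReferenceError;
}

const tries = 4 * 16; // stand-in for `cpus * 16`

// After the declaration has been evaluated, the same listener works.
emitter.emit('message');

console.log(earlyReadThrew, seen); // true [ 64 ]
```

So the nit is about readability rather than correctness: declaring the value first removes the need to reason about when the callback runs.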

Trott · 2015-12-31T02:43:46Z

LGTM. One minor nit that you can ignore if you want.

CI: https://ci.nodejs.org/job/node-test-commit/1587/

Nice work!

Trott · 2015-12-31T03:00:08Z

The test is timing out on SmartOS (which is based on Solaris):

@jbergstroem Might this have a cause that lies elsewhere? I feel like there's been an uptick in SmartOS issues lately, but I might be imagining it...

jbergstroem · 2015-12-31T03:07:12Z

@Trott the "only" changes made recently that would affect all tests are yesterday's landing of the change to where sockets are written and today's change to where we write the temporary directory ($HOME/node-tmp).

Trott · 2015-12-31T03:27:00Z

So the test is probably legitimately timing out (probably hanging) on SmartOS and only SmartOS. That's kind of a bummer. But that's why we test...

jasnell · 2016-01-04T17:04:35Z

LGTM

Trott · 2016-01-05T04:31:42Z

This still needs a tweak so the test doesn't time out on SmartOS on CI. (Just putting this note here so no one merges this without realizing that or something.)

It avoids the creation of unnecessary handles. This issue is causing intermittent failures in `test-cluster-disconnect-race` on `FreeBSD` and `OS X`:

```
assert.js:89
  throw new assert.AssertionError({
  ^
AssertionError: Resource leak detected.
    at removeWorker (cluster.js:321:7)
    at ChildProcess.<anonymous> (cluster.js:356:9)
    at ChildProcess.g (events.js:264:16)
    at emitTwo (events.js:88:13)
    at ChildProcess.emit (events.js:173:7)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:200:12)
```

The problem is that the `worker2.disconnect` is being called on the master before the `queryServer` is handled, causing the worker to be deleted, then the Server handle is created afterwards. Later on, when `removeWorker` is called from the `exit` handler, there are no workers left, but one handle, thus the `AssertionError`.

Add a new `test/sequential/test-cluster-disconnect-leak` based on `test-cluster-disconnect-race` that creates lots of workers and fails consistently without this patch.

santigimeno · 2016-01-07T22:43:47Z

@Trott I've updated the PR to limit the number of workers. Let's see if it now passes on all platforms. Thanks

jasnell · 2016-01-07T23:51:06Z

New CI: https://ci.nodejs.org/job/node-test-pull-request/1163/

Trott · 2016-01-08T04:49:43Z

CI is green!

jbergstroem · 2016-01-08T05:09:41Z

✅ LGTM

It avoids the creation of unnecessary handles. This issue is causing intermittent failures in `test-cluster-disconnect-race` on `FreeBSD` and `OS X`.

The problem is that the `worker2.disconnect` is being called on the master before the `queryServer` is handled, causing the worker to be deleted, then the Server handle is created afterwards. Later on, when `removeWorker` is called from the `exit` handler, there are no workers left, but one handle, thus the `AssertionError`.

Add a new `test/sequential/test-cluster-disconnect-leak` based on `test-cluster-disconnect-race` that creates lots of workers and fails consistently without this patch.

PR-URL: nodejs#4465
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Trott · 2016-01-08T05:44:45Z

Landed in f9f1dd9

mscdex added the cluster label Dec 29, 2015

santigimeno force-pushed the fix_dd branch from c39b25c to 269bf36 on December 29, 2015 23:25

santigimeno force-pushed the fix_dd branch from 269bf36 to af179dc on December 30, 2015 19:59

Trott reviewed Dec 31, 2015

Trott mentioned this pull request Jan 2, 2016

test-cluster-disconnect.js AssertionError: 1 == 2 on busy arm host #3383

Closed

santigimeno force-pushed the fix_dd branch from af179dc to 9a475bc on January 2, 2016 12:07

jasnell added the lts-watch-v4.x label Jan 4, 2016

santigimeno force-pushed the fix_dd branch from 9a475bc to 6b5fd99 on January 7, 2016 22:42

Trott closed this Jan 8, 2016

MylesBorins mentioned this pull request Jan 11, 2016

V5.4.1 propose #4626

Merged

Trott mentioned this pull request Jan 14, 2016

LTS: backport no-unused-vars to v4.x-staging #4688

Merged

MylesBorins added land-on-v4.x and removed lts-watch-v4.x labels Jan 28, 2016

MylesBorins mentioned this pull request Feb 11, 2016

V4.3.1 proposal #5200

Merged
