-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate test-child-process-fork-closed-channel-segfault #20836
Comments
For the first one, |
@gireeshpunathil The test checks that @nodejs/testing @nodejs/platform-windows @nodejs/child_process |
(Interestingly, the string was |
And another one (again both tests failed - I wonder if the first test has influence on the second one). |
One issue with |
Better output for test-child-process-fork-net (stdout and stderr properly interleaved): not ok 52 parallel/test-child-process-fork-net
---
duration_ms: 0.324
severity: fail
exitcode: 1
stack: |-
PARENT: server listening
CHILD: server listening
CLIENT: connected
PARENT: got connection
CLIENT: connected
CHILD: got connection
CLIENT: closed
CHILD: got connection
CHILD: got connection
CLIENT: closed
CLIENT: connected
CLIENT: connected
CLIENT: closed
CLIENT: closed
PARENT: server closed
testSocket, listening
CHILD: got socket
CLIENT: got data
CLIENT: closed
events.js:167
throw er; // Unhandled 'error' event
^
Error: write EPIPE
at ChildProcess.target._send (internal/child_process.js:741:20)
at ChildProcess.target.send (internal/child_process.js:625:19)
at SocketListSend._request (internal/socket_list.js:20:16)
at SocketListSend.close (internal/socket_list.js:40:10)
at Server.close (net.js:1624:24)
at Socket.<anonymous> (c:\workspace\node-test-binary-windows\test\parallel\test-child-process-fork-net.js:178:16)
at Socket.emit (events.js:182:13)
at TCP._handle.close [as _onclose] (net.js:596:12)
Emitted 'error' event at:
at process.nextTick (internal/child_process.js:745:39)
at process._tickCallback (internal/process/next_tick.js:61:11)
... |
I tried to run Windows stress tests for this to see how often it fails but after compiling for a long time, the builds hit a three-minute timeout and don't run the tests. See https://ci.nodejs.org/job/node-stress-single-test/1858/nodes=win2016-1p-vs2017/console and https://ci.nodejs.org/job/node-stress-single-test/1857/nodes=win2016-1p-vs2017/console. 19:31:05 Creating library c:\workspace\node-stress-single-test\nodes\win2016-1p-vs2017\Release\mksnapshot.lib and object c:\workspace\node-stress-single-test\nodes\win2016-1p-vs2017\Release\mksnapshot.exp
19:31:05 Generating code
19:34:05 Build timed out (after 3 minutes). Marking the build as aborted.
19:34:05 Build was aborted
19:34:05 Notifying upstream projects of job completion
19:34:05 Finished: ABORTED @nodejs/build |
On my Win 10 box |
@apapirovski sure, I'll take a look |
Just my 2c but this test has been flaky since its introduction, only very infrequently replicates the condition on the originally bugged version (Node 4.0.0) and is unlikely to occur exactly like this again. As such, I think we should scrap the test and move the one important check ( If we try to fix it yet again, it'll be a 6th PR to make this test less flaky. There are currently 5 separate error conditions that we ignore in this. We might be adding a sixth. IMO it's more trouble than it's worth. |
Failures showing up on windows for 10.2.0 release |
Failure on node-daily-master task: Host: test-azure_msft-win2016-x64-5 not ok 51 parallel/test-child-process-fork-closed-channel-segfault
---
duration_ms: 0.299
severity: fail
exitcode: 1
stack: |-
c:\workspace\node-test-binary-windows\test\parallel\test-child-process-fork-closed-channel-segfault.js:70
throw err;
^
Error: write EMFILE
at ChildProcess.target._send (internal/child_process.js:741:20)
at ChildProcess.target.send (internal/child_process.js:625:19)
at Worker.send (internal/cluster/worker.js:40:28)
at Socket.<anonymous> (c:\workspace\node-test-binary-windows\test\parallel\test-child-process-fork-closed-channel-segfault.js:36:16)
at Object.onceWrapper (events.js:273:13)
at Socket.emit (events.js:182:13)
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1147:10)
... |
ok - I guess know the reason for failure (though cannot test now, about to leave for the day.) #3635 (comment) (refefenced in the test case) was applicable to this line - connection establishment to a server whose state is arbitrary, is prone to 3 types of errors, and hence we checked and skipped them in the client socket's error handler. This was almost 3 years back! Now it turns out to be that this line too is prone to the same problem, under similar circumstances - and that makes sense, as the server can close any time between client connecting and worker sending messages. In summary, we need an error callback for worker object too, that has the same code as in the client socket callback. |
Is |
@mmarchini - looking the failing stack, I believe it is the same issue. So you can apply the same patch as in #20973 over there too, if you are going for a PR. However, I have found issues with it, and please wait while we fix this. |
@gireeshpunathil do you mind if I open a PR to fix |
In test-child-process-fork-closed-channel-segfault.js, race condition is observed between the server getting closed and the worker sending a message. Accommodate the potential errors. Earlier, the same race was observed between the client and server and was addressed through ignoring the relevant errors through error handler. The same mechanism is re-used for worker too. The only difference is that the filter is applied at the callback instead of at the worker's error listener. Refs: #3635 (comment) Fixes: #20836 PR-URL: #20973 Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Jon Moss <me@jonathanmoss.me> Reviewed-By: Matheus Marchini <matheus@sthima.com> Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com> Reviewed-By: Bartosz Sosnowski <bartosz@janeasystems.com>
https://ci.nodejs.org/job/node-test-binary-windows/17414/COMPILED_BY=vs2017,RUNNER=win2016,RUN_SUBSET=3/console
Likely related:
The text was updated successfully, but these errors were encountered: