-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate flaky test-child-process-exec-timeout on Linux #20414
Comments
Again today: https://ci.nodejs.org/job/node-test-commit-linuxone/1000/nodes=rhel72-s390x/testReport/ assert.js:77
throw new AssertionError(obj);
^
AssertionError [ERR_ASSERTION]: Input A expected to strictly equal input B:
+ expected - actual
- 'SIGSEGV'
+ 'SIGTERM'
at cp.exec.common.mustCall (/data/iojs/build/workspace/node-test-commit-linuxone/nodes/rhel72-s390x/test/parallel/test-child-process-exec-timeout.js:34:12)
at /data/iojs/build/workspace/node-test-commit-linuxone/nodes/rhel72-s390x/test/common/index.js:474:15
at ChildProcess.exithandler (child_process.js:289:5)
at ChildProcess.emit (events.js:182:13)
at maybeClose (internal/child_process.js:957:16)
at Process.ChildProcess._handle.onexit (internal/child_process.js:246:5) @nodejs/testing @nodejs/build @nodejs/child_process |
reviewing the type of failures and the test case, my opinion is that the test case logic is vulnerable for race conditions and arbitrary behaviors that are platform / load dependant. cp.exec(cmd, {timeout: 1}...
assert.strictEqual(err.signal, ... The
Which of this is going to happen will depend on:
cp.exec(cmd, {timeout: 2 ** 30} ...
assert.strictEqual(stdout.trim(), 'child stdout'); As the child process exits immediately after writing to the console, and as the console writing is non-blocking, there is no guarentee that the data will be flushed to the stream, whatever tiny the data element is, even if the parent waits for a long time. ref: #19218 In summary, I believe there is nothing LinuxONE specific, and the reason rooted in the test case itself. for 1 I propose the child to have a timer that is at least more than 1000 and the parent to maintain a timeout that is between 1000 and the child's timeout. for 2 I propose to end the standard streams in the child before exiting in expectation of draining the buffered data. |
Another one today: https://ci.nodejs.org/job/node-test-commit-linux/18540/nodes=fedora-last-latest-x64/consoleText ot ok 193 parallel/test-child-process-exec-timeout
---
duration_ms: 0.336
severity: fail
exitcode: 1
stack: |-
assert.js:77
throw new AssertionError(obj);
^
AssertionError [ERR_ASSERTION]: Input A expected to strictly equal input B:
+ expected - actual
- 'SIGSEGV'
+ 'SIGTERM'
at cp.exec.common.mustCall (/home/iojs/build/workspace/node-test-commit-linux/nodes/fedora-last-latest-x64/test/parallel/test-child-process-exec-timeout.js:34:12)
at /home/iojs/build/workspace/node-test-commit-linux/nodes/fedora-last-latest-x64/test/common/index.js:474:15
at ChildProcess.exithandler (child_process.js:289:5)
at ChildProcess.emit (events.js:182:13)
at maybeClose (internal/child_process.js:957:16)
at Process.ChildProcess._handle.onexit (internal/child_process.js:246:5)
... Will see if I can implement fixes for the issues identified by @gireeshpunathil unless he or someone else already has something in the works or even submitted. |
Is the fact that we can get SIGSEGV instead of SIGTERM from this code a bug or a fact of life? (If a bug, is it in this code or in Node.js/libuv?) 'use strict';
if (process.argv[2] === 'child') {
setTimeout(() => {}, 1000);
return;
}
const cmd = `"${process.execPath}" "${__filename}" child`;
const assert = require('assert');
const cp = require('child_process');
cp.exec(cmd, { timeout: 1 }, (err, stdout, stderr) => {
assert.strictEqual(err.signal, 'SIGTERM');
});
That may mitigate things, but I don't think it will fix it entirely because it's still dependent on timing in two independent processes. If we're going to go that route, we should definitely move this test to |
Update: I was able to trivially cause the problem using the 15 lines of code in the previous comment on test-rackspace-fedora26-x64-1: root@test-rackspace-fedora26-x64-1 ~]# /home/iojs/build/workspace/node-test-commit-linux/nodes/fedora-last-latest-x64/node /var/tmp/test.js & /home/iojs/build/workspace/node-test-commit-linux/nodes/fedora-last-latest-x64/node /var/tmp/test.js &
[1] 5212
[2] 5213
[root@test-rackspace-fedora26-x64-1 ~]# assert.js:77
throw new AssertionError(obj);
^
AssertionError [ERR_ASSERTION]: Input A expected to strictly equal input B:
+ expected - actual
- 'SIGSEGV'
+ 'SIGTERM'
at cp.exec (/var/tmp/test.js:14:9)
at ChildProcess.exithandler (child_process.js:289:5)
at ChildProcess.emit (events.js:182:13)
at maybeClose (internal/child_process.js:957:16)
at Process.ChildProcess._handle.onexit (internal/child_process.js:246:5) Bonus: The process hung there too. I had to end it with cntl-C. |
(The above reproduction doesn't happen every time. Like, not even close to every time. Although I certainly suspect that running 4 or 8 or 16 simultaneous instances would increase the reproducibility.) |
Update: Reproduces pretty reliably at 64 simultaneous instances. |
@nodejs/libuv |
link this with #11052 as that has new findings. |
As #11052 has revealed an unsafe singal stack issue, it explains one type of failure in this test: expected |
https://ci.nodejs.org/job/node-test-commit-linuxone/857/nodes=rhel72-s390x/consoleFull
The text was updated successfully, but these errors were encountered: