-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: delay child exit in AIX for pseudo-tty tests #11715
Conversation
test/pseudo-tty/no_dropped_stdio.js
Outdated
if (common.isAix) { | ||
setTimeout(function() { | ||
process.exit(0); | ||
}, 200); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it would completely break the purpose of the test… if the problem really is the child exiting too quickly, maybe just set up a setTimeout
around the whole code here for AIX.
Also, the comment sounds like this is a bug in Python on AIX, not in our tests/Node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@addalex - the background for this change ishere
There exists a bug in python in AIX, but the current issue and its mitigation is outside of that bug.
I am not sure I understand why this would break the purpose of the test - can you please explain?
The sequence of child-parent interaction is thus:
- Parent turns child's std streams into pseudo terminals, and spawns the child.
- child writes data into its streams and exits. [ data of differing length, and sequence, in different tests]
- Parent sets up a read loop and continuously reads, until it gets EIO (on child termination)
- The read data is then compared with the expected data, with the help of a companion file.
The issue is: there is a race condition between child's exit and parent's read logic - if child exits too early, the parent will end up loosing all the data - in theory this is possible in all platforms, but AIX seem to be the only victim, and hence this workaround.
It is not a pre-req that the child starts only when the parent is ready. If the child stays alive, so does its fds, and hence parent can continue to read from it.
Please let me know if I am missing anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand why this would break the purpose of the test - can you please explain?
What happens in the test is that the Node process writes a chunk of data that is slightly too large to fit into the OS buffer for the TTY (specifically for the OS X value here); for regular streams, this would mean that the data can only partly be written synchronously, and some of the data will need to be written asynchronously.
For TTYs we avoid this by turning them into blocking streams, so all data is written synchronously, and this is what the test here is checking.
That’s why the process.exit(0)
has to happen in exactly the same tick; otherwise libuv can do asynchronous I/O before the process exits.
The issue is: there is a race condition between child's exit and parent's read logic
Yeah, it’s odd… I don’t know much about AIX but intermingling the I/O logic and the process control logic seems weird to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@addaleax - thanks, I understand the history of the test, and agree that the nextTick will defeat the purpose of the test. I have amended the code for no_dropped_stdio.js as per your suggestion. Looking at changes in other tests, I believe they are just fine.
cc/ @nodejs/platform-aix |
c27c6c5
to
e1daabd
Compare
|
||
// In AIX, the child exits even before the python parent | ||
// can setup the readloop. Provide a reasonable delay. | ||
if (common.isAix) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for these, would it be better to delay the entire test when running on AIX?
e.g. something like...
function doTest() {
/** do the test stuff here **/
}
if (common.isAix)
setTimeout(doTest, 200);
else
doTest();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jasnell - thanks, I made some changes to this effect, essentially introducing the delay before I/O is performed, instead of after. Please note that from a process control logic perspective, the order of delay is immeterial, while from an I/O logic perspective, the child writes are kept in tact (performed in same uv tick), in order not to affect the intent of the test. I hope you are fine with this?
12238ff
to
e48dcd8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. As per discussion I'd prefer to avoid a timeout but given that does not seem feasible this looks ok.
CI: https://ci.nodejs.org/job/node-test-commit/8414/ @addaleax , does this LGTY? cc/ @nodejs/testing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gireeshpunathil I get warnings when I try to apply this patch, could you remove the extra blank lines at the end of the files?
Applying: test: delay child exit in AIX for pseudo-tty tests
.git/rebase-apply/patch:33: new blank line at EOF.
+
.git/rebase-apply/patch:86: new blank line at EOF.
+
warning: 2 lines add whitespace errors.
setTimeout(function() { | ||
process.emit('SIGWINCH'); | ||
}, common.isAix ? 200 : 0); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blank line 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gibfahn - addressed, thanks.
test/pseudo-tty/no_dropped_stdio.js
Outdated
process.stdout.write(out); | ||
process.exit(0); | ||
}, common.isAix ? 200 : 0); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blank line 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gibfahn - addressed, thanks
I guess? If working around a Python bug is what we have to do then this seems like a reasonable approach. |
The tests in pseudo-tty takes the form of child node writing some data and exiting, while parent python consume them through pseudo tty implementations, and validate the result. While there is no synchronization between child and parent, this works for most platforms, except AIX, where the child exits even before the parent could setup the read loop, under race conditions Fixing the race condition is ideally done through sending ACK messages to and forth, but involves massive changes and have side effect. The workaround is to address them in AIX alone, by adding a reasonable delay.
e48dcd8
to
418cb2b
Compare
Will land later today if there are no objections |
Another CI for good luck: https://ci.nodejs.org/job/node-test-commit/8513/ |
@gireeshpunathil I was adding the Currently this PR fixes:
EDIT: Spoke to Gireesh, #9728 is a separate issue. |
Landed in d631af5 |
The tests in pseudo-tty takes the form of child node writing some data and exiting, while parent python consume them through pseudo tty implementations, and validate the result. While there is no synchronization between child and parent, this works for most platforms, except AIX, where the child exits even before the parent could setup the read loop, under race conditions Fixing the race condition is ideally done through sending ACK messages to and forth, but involves massive changes and have side effect. The workaround is to address them in AIX alone, by adding a reasonable delay. PR-URL: #11715 Fixes: #7973 Fixes: #9765 Fixes: #11541 Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
The tests in pseudo-tty takes the form of child node writing some data and exiting, while parent python consume them through pseudo tty implementations, and validate the result. While there is no synchronization between child and parent, this works for most platforms, except AIX, where the child exits even before the parent could setup the read loop, under race conditions Fixing the race condition is ideally done through sending ACK messages to and forth, but involves massive changes and have side effect. The workaround is to address them in AIX alone, by adding a reasonable delay. PR-URL: nodejs#11715 Fixes: nodejs#7973 Fixes: nodejs#9765 Fixes: nodejs#11541 Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
The tests in pseudo-tty takes the form of child node writing some data and exiting, while parent python consume them through pseudo tty implementations, and validate the result. While there is no synchronization between child and parent, this works for most platforms, except AIX, where the child exits even before the parent could setup the read loop, under race conditions Fixing the race condition is ideally done through sending ACK messages to and forth, but involves massive changes and have side effect. The workaround is to address them in AIX alone, by adding a reasonable delay. PR-URL: nodejs#11715 Fixes: nodejs#7973 Fixes: nodejs#9765 Fixes: nodejs#11541 Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
Should this be backported |
I think so, @gireeshpunathil can you raise a backport to |
The tests in pseudo-tty takes the form of child node writing some data
and exiting, while parent python consume them through pseudo tty
implementations, and validate the result.
While there is no synchronization between child and parent, this works
for most platforms, except AIX, where the child exits even before the
parent could setup the read loop, under race conditions
Fixing the race condition is ideally done through sending ACK messages
to and forth, but involves massive changes and have side effect. The
workaround is to address them in AIX alone, by adding a reasonable delay.
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passesAffected core subsystem(s)