Second child process doesn't receive data when using the stdout of first child process as stdin of second #9413
You are handing off the first child's stdout to the second child, but node has already started reading from it. It might be possible to work around this if you create the first child's stdout pipe manually in paused mode, but that's probably difficult to do without groping around in node internals. Perhaps spawn() needs to be taught a pauseOnCreate option, similar to …
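For illustration, usage of such an option might look like this (hypothetical: `pauseOnCreate` is only proposed above and does not exist in the child_process API):

```js
const { spawn } = require('child_process');

// HYPOTHETICAL option: node would not start reading from task0's stdout
// pipe until something explicitly resumes it.
const task0 = spawn('yes', ['hello'], { pauseOnCreate: true });

// Because the pipe never entered flowing mode in the parent, no data has
// been consumed yet when the fd is handed to the second child.
const task1 = spawn('tee', ['/tmp/out.txt'], {
  stdio: [task0.stdout, 'inherit', 'inherit'],
});
```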
/cc @targos fyi this seems to be the right answer to nodejs/help#324, too
The problem has to do with the first child's stdout already being read by node. EDIT: Oops, didn't see the comments before this. I was just about to say what @bnoordhuis said. It works if the stream hasn't started flowing yet.
Is there a workaround that doesn't involve …? @bnoordhuis How do you manually create a pipe in non-flowing mode?
Not at the moment, no. Not without monkeying around with node's internals.
Is there a way to use …?
How would you go about monkey-patching node's internals? Temporarily replacing …? I'm using NodeJS for an export runner that launches pipelines of scripts to process lots of data and upload it at the end, so I would really like to find a way of doing this without the runner having very high CPU usage.
Thanks @addaleax, it seems to be the same issue indeed! What I don't understand here is that it happens even when the two calls to `spawn()` happen one right after the other.
The file descriptor of the first child's stdout is already open in the parent, and node starts reading from it as soon as the child is spawned.
I haven’t looked at it in detail, so ignore me if this doesn’t make sense, but is there any reason why the second child can't be spawned first?
Just tried both and it works when the receiving child is spawned first.
A colleague of mine had a great idea that fixed this: launch the receiving task before launching the sending task!
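A minimal sketch of that workaround (`yes`/`tee` are placeholder commands, not from the original):

```js
const { spawn } = require('child_process');

// Spawn the receiving task first; its stdin is a pipe owned by the parent.
const task1 = spawn('tee', ['/tmp/out.txt'], {
  stdio: ['pipe', 'inherit', 'inherit'],
});

// Then spawn the sending task, writing directly into task1's stdin.
// No data can flow before the receiver exists, so nothing is lost.
const task0 = spawn('yes', ['hello'], {
  stdio: ['ignore', task1.stdin, 'inherit'],
});
```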
@bnoordhuis Is there a way to create a process with a predefined …?
@aalexgabi - is this still an outstanding issue for you?
@gireeshpunathil Yes. A lot of CPU time is wasted because I had to pass all data through the NodeJS process instead of letting the two sub-processes pipe it directly. Changing the algorithm so that the receiving task is launched before the sending task, as pointed out in my last comment, was too complicated for a workaround in the end.
cross link #18016 - both cases are root-caused by the same issue.
Looks like it was suggested earlier, with no evidence whether it was attempted - can you please try with:
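The snippet was stripped here; judging from the rest of the thread it was presumably the `.pipe()` approach:

```js
const { spawn } = require('child_process');

const task0 = spawn('yes', ['hello']);
const task1 = spawn('tee', ['/tmp/out.txt'], {
  stdio: ['pipe', 'inherit', 'inherit'],
});

// Relay the data through the parent instead of sharing the fd directly.
task0.stdout.pipe(task1.stdin);
```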
It is reliable and simple.
@gireeshpunathil The problem does not arise when using `task0.stdout.pipe(task1.stdin)`, but as mentioned above it makes the runner use a lot of CPU.
@aalexgabi - if your main motivation is to bypass node being a middle man between the two processes, … I was under the impression that you want node to gain fine-grained control over both of the processes, and want them not to be dependent on each other - in which case the data comes through Node anyway, and the additional burden of piping does not cause too much load.

Thinking further on it (between yesterday and today) I believe the issue is neither in your code nor in Node's code. There is an unspecified design at work: when we pass the first child's stdout as the second child's stdin, the parent retains its own copy of the underlying file descriptor. If we force-close that original copy, … If we leave it as it is, the behavior you found will occur - data will branch out into two destinations. Under default conditions, I guess the data should be made available in both - following this example:
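A sketch of that bifurcation (a reconstruction; the original example was stripped from this page, and `yes`/`tee` are stand-ins for the real tasks):

```js
const { spawn } = require('child_process');

const task0 = spawn('yes', ['hello']);

// Hand task0's stdout fd to task1 as its stdin. The parent still holds
// its own copy of the pipe's read end.
const task1 = spawn('tee', ['/tmp/out.txt'], {
  stdio: [task0.stdout, 'inherit', 'inherit'],
});

// Chunks the parent reads here never reach task1: the data branches out
// between the two destinations.
task0.stdout.on('data', (chunk) => {
  console.log('parent consumed %d bytes', chunk.length);
});
```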
@gireeshpunathil My application is a task runner which is agnostic to the tasks it executes. I have a RabbitMQ queue called …
All the commands the runner executes are just Linux programs. Most of them are atomic functions (generate file, transform file, merge files, split files etc.). Only the task type assembles different tasks into one pipeline. I cannot write a script for each combination because that would defeat the purpose of having a generic runner in the first place. The idea is to have a system where you can mix and match system and custom commands by writing configuration, not code, except when it's necessary.

Some of these tasks process gigabytes of data and can run for up to 12 hours. Passing all the data through the Node VM serves no purpose because the runner does not touch the data but only orchestrates the execution of tasks.

There is a need for granular control of tasks in case of a failure. In the event of one task failing, we need to kill all other tasks, store the exit code of each task in the database, cancel a pipeline on user request etc. Some tasks demand lots of resources, so for example we may want to respawn only one child and not the entire pipeline in case of a failure.

I'm sorry but I don't think I understand the second part of your comment. You are talking about cloning … When you want to execute a pipeline like …

My understanding may not be accurate, but this is how I imagine it and how I expect NodeJS to handle it. I may not be correct, but it seems to me that for some reason the stdout of the first process is either a black hole like …

Note that I don't need to be able to handle any …
@aalexgabi - thanks for the detailed explanation.
Of course, this explanation does not provide any relief to your scenario; it just explains the current design. While I am not sure about your reasoning around how …, our Node program does not have that info (the full picture of the pipeline up front). So every child process spawn is independent, and spawns new pipes with one end at the child and the other at the parent, i.e., node. It becomes manual work to close out / re-pipe the open ends of the pipes under the current design. I will look into the code / engage others to see whether any improvements are possible here, without causing side effects to the otherwise stable code.
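A sketch of what that manual work might look like (my assumption, not code from the thread; it can still race if node consumed chunks before the second spawn):

```js
const { spawn } = require('child_process');

const task0 = spawn('yes', ['hello']);
const task1 = spawn('tee', ['/tmp/out.txt'], {
  stdio: [task0.stdout, 'inherit', 'inherit'],
});

// By now the fd has been duplicated into task1, so dropping the parent's
// copy stops the parent from competing for data. Any chunks read before
// this point are still lost - hence only a partial workaround.
task0.stdout.destroy();
```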
When t0 and t1 are spawned with t0's output stream [1, 2] piped into t1's input, a new pipe is created which uses a copy of t0's fd. This leaves the original copy in the Node parent, unattended. The net result is that when t0 produces data, it gets bifurcated into both copies.

Fix: detect that the passed handle is of 'wrap' type and close it after the native spawn invocation, by which time the piping would have completed.

Fixes: nodejs#9413
Fixes: nodejs#18016
PR-URL: nodejs#21209
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Also posted here: http://stackoverflow.com/questions/40306385/missing-lines-when-using-the-stdout-of-a-child-process-as-stdin-of-another
When using the stdout of one child process as stdin for another, it seems that sometimes data is not passed to the next child:
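The original script was stripped from this page; here is a hedged reconstruction of the reported pattern, assuming pipelines equivalent to `yes | tee /tmp/body-pipeline-N`:

```js
const { spawn } = require('child_process');

// Launch several pipelines, each equivalent to: yes | tee /tmp/body-pipeline-N
for (let i = 0; i < 10; i++) {
  const task0 = spawn('yes', ['test']);

  // task0's stdout is passed directly as task1's stdin so the data
  // bypasses the parent - yet some output files end up empty.
  const task1 = spawn('tee', [`/tmp/body-pipeline-${i}`], {
    stdio: [task0.stdout, 'ignore', 'inherit'],
  });

  // Stop the infinite `yes` after a second.
  setTimeout(() => {
    task0.kill();
    task1.kill();
  }, 1000);
}
```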
Some files are empty:
`ls -lhS /tmp/body-pipeline-*`
FYI: `task0.stdout.pipe(task1.stdin)` solves the issue, but the script uses 50% CPU (compared to 0% when passing the stdout of task0 as the stdin of task1) for the equivalent of `yes | tee /tmp/body-pipeline-x`.