console: use synchronous write when the process is piped #25638
Conversation
I’m worried about this too. It’s the reason why piped stdio is async in the first place.
globalConsole[kBindStreamsLazy]({
  get stdout() { return makeSync(process.stdout); },
  get stderr() { return makeSync(process.stderr); }
});
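For context, a minimal sketch of what a makeSync-style wrapper could do (hypothetical; the real helper in this PR may differ, and the _write override and fd usage here are assumptions):

const { writeSync } = require('fs');

// Hypothetical sketch: route a stdio stream's writes through fs.writeSync on
// its underlying fd, so console output blocks until the data has been flushed.
function makeSync(stream) {
  stream._writev = null;
  stream._write = function(chunk, encoding, cb) {
    const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk, encoding);
    writeSync(this.fd, buf, 0, buf.length);
    cb();
  };
  return stream;
}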
Shouldn't all console instances behave the same? I would move this into the kBindStreamsEager and kBindStreamsLazy functions in the console constructor file.
Likely. We might also move this whole sync logic into the console, and add the capability there to manage the fd directly.
I’ll wait and see if we can settle on this approach before making any changes.
try {
  const n = writeSync(this.fd, chunk, 0, chunk.length);
  if (n !== chunk.length) {
    chunk = chunk.slice(0, n);
Are the last bytes written first? AFAIK this should be chunk = chunk.slice(n)? Maybe just verify that in the test as well?
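For illustration, the retry loop with the fix suggested above would look roughly like this (a sketch, not the exact patch):

// slice(n) drops the bytes that were already written and keeps the remaining
// tail, so repeated writeSync() calls eventually flush the whole chunk.
let remaining = chunk;
while (remaining.length > 0) {
  const n = writeSync(this.fd, remaining, 0, remaining.length);
  remaining = remaining.slice(n);
}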
@mcollina PTAL. This seems wrong to me.
Thinking about it, starting to have indefinitely blocking writes seems problematic to me. I think I have time this week to look more closely into a solution based on libuv/libuv#390, if that’s okay.
I don't understand. The assumption of
Definitely, this has been an open issue for so long! BTW, I have this solution implemented inside pino, if you would like to tinker with it in the ecosystem.
The whole goal is to slow down the producer of that data, i.e. the JS VM. That has the same effect as any CPU-bound activity on the event loop.
@mcollina Is the ultimate goal here really to slow the producer down, or to resolve the stdio issues (interleaved stdio + bad interaction with
My goal is to solve the memory issues with the unbounded (no backpressure) producer that is console.log, and all the data loss that entails. I do not see any other way than making console.log synchronous, and I think that’s the expectation of our users as well. Note that this PR also introduces a bunch of fixes for SyncWriteStream that we should land anyway. Moreover, this would enable us to remove the synchronous access to stdout and stderr, significantly speeding up the processing of that output in a streaming fashion (that might be something far more
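A minimal way to reproduce the unbounded-buffering problem being described (assuming stdout is piped to a slow reader; the file name and sizes are arbitrary):

// Run as e.g.: node producer.js | sleep 1000
// With async piped stdio, console.log() never applies backpressure, so if the
// reader of the pipe is slow, all of this data stays queued in process memory.
for (let i = 0; i < 1e6; i++) {
  console.log('x'.repeat(1024)); // roughly 1 GB buffered if nothing drains it
}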
@mcollina As far as I can tell, that fix adds a 100% CPU usage loop that busy-polls writes into a pipe, so it’s something that shouldn’t be needed for FDs referring to regular files (which is the only usage for
I think EAGAIN could potentially happen also on FDs that refer to regular files (but it’s far less likely).
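For reference, the busy-poll being discussed amounts to something like the following (a JS sketch of the idea; the actual libuv change is C code, and this helper is hypothetical):

const { writeSync } = require('fs');

// fs.writeSync() on a non-blocking pipe fd can fail with EAGAIN while the pipe
// is full; retrying the same write in a tight loop burns CPU until the reader
// drains the pipe. Regular files rarely, if ever, hit this path.
function writeSyncBusyPoll(fd, buf) {
  let written = 0;
  while (written < buf.length) {
    try {
      written += writeSync(fd, buf, written, buf.length - written);
    } catch (err) {
      if (err.code !== 'EAGAIN') throw err;
    }
  }
}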
Hmm. This backtracks on a lot of what we'd settled on. Guess I need to dig up stuff...
FWIW: I already tried to do this. (some years ago)
There's some context in the later comments within the PR: #1771
Notably, this bit from @bnoordhuis:
A bit of background: some years ago, I think it was in v0.7, it was decided to make stdout and stderr blocking. Turns out it doesn't work so well for pipes; ttys and files are usually very fast (local ones anyway) but pipes tend to fill up rapidly.
A number of people complained about it so we made stdio-to-pipe non-blocking again (except on Windows, where it's not supported.) I forgot the exact bug reports but the theme was that stdio was too slow; on OS X, the kernel pipe buffer is only about 4 kB, so it's easy to max out.
As already mentioned here (though I’m not sure it has been considered deeply enough, since this isn’t currently tagged as semver-major): doing this can cause a blocking dependency upstream of a child.
Related, I've been planning to make a PR removing the "blocking to pipes" thing on Windows. It's unnecessary (it was literally only done to match unix years ago, although unix has since changed) and is now inconsistent.
(Example patch of removing said behavior.)
diff --git a/lib/net.js b/lib/net.js
index 9eb7448c59..6c70cae264 100644
--- a/lib/net.js
+++ b/lib/net.js
@@ -283,23 +283,6 @@ function Socket(options) {
throw errnoException(err, 'open');
this[async_id_symbol] = this._handle.getAsyncId();
-
- if ((fd === 1 || fd === 2) &&
- (this._handle instanceof Pipe) &&
- process.platform === 'win32') {
- // Make stdout and stderr blocking on Windows
- err = this._handle.setBlocking(true);
- if (err)
- throw errnoException(err, 'setBlocking');
-
- this._writev = null;
- this._write = makeSyncWrite(fd);
- // makeSyncWrite adjusts this value like the original handle would, so
- // we need to let it do that by turning it into a writable, own property.
- Object.defineProperty(this._handle, 'bytesWritten', {
- value: 0, writable: true
- });
- }
}
// shut down the socket when we're finished with it.
Also 🙈, I was a bit clueless back in #1771, but the context and points still stand.
Ah, also managed to (again) dig up the original issue about this class of issues: nodejs/node-v0.x-archive#3584
There are a significant number of issues that point out that this is worth discussing again. The behavior of process.stdout in different scenarios is so different that it is very hard to rely on. Note that I’m referring to console and stdout as two different things, while a lot of those original discussions talked about stdout and stderr. IMHO we have different requirements for the two, and they should be treated differently.
Given that this change affects console.log on a child process alone, and given that the benefit of making this change outweighs any drawback identified so far, I am in favor of this change, with a semver-major label and a doc update to the API docs to that effect.
So, to be clear, post this PR the state of things would look as follows:
This would make one case vary depending on the implicit details of an in-application API... idk how I feel about that but I guess I'll think about it.
@Fishrock123 that is correct.
@Fishrock123 great overview. To be clear, the difference from before this PR is:
Shouldn't it also be sync to ttys on Windows? That looks like the last async place for console.log.
As far as I recall, this is literally not possible. Windows TTYs don't have any way of telling you if a message was actually "received" or "processed". You send it something and hope for the best.
I'm conflicted about this. It fixes the above-mentioned issues, but I suppose this could introduce unwanted sync delays for servers that do not do much logging in some cases. Back in #6379, I was thinking of using a buffer instead, i.e. make console calls sync, but introduce a limited-size buffer, so that:
@Fishrock123 thoughts?
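One possible reading of that bounded-buffer idea, sketched out (the limit, names, and fallback behaviour here are assumptions, not taken from #6379):

const { writeSync } = require('fs');
const LIMIT = 64 * 1024; // assumed cap on how much to write synchronously

// Hypothetical sketch: small writes are flushed synchronously (no data loss,
// no unbounded queue), while anything beyond the limit falls back to the
// regular async stream write so a huge payload cannot block the event loop.
// (Ordering between the sync and async paths is not handled in this sketch.)
function boundedLog(line) {
  const buf = Buffer.from(line + '\n');
  if (buf.length <= LIMIT) {
    writeSync(1, buf, 0, buf.length); // fd 1 is stdout
  } else {
    writeSync(1, buf, 0, LIMIT);
    process.stdout.write(buf.slice(LIMIT));
  }
}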
What should we do here? @mcollina @Fishrock123
@ChALkeR - correct me if I am wrong, but the problematic scenario (large data coming in for writing blocks the app) can still happen in your sequence no. 3? In other words, if the data is sufficiently large (larger than the proposed internal buffer can hold), we are probably looking at the same thing? The main difference seems to be that with a buffer in between, we have a trade-off to make: the larger the buffer, the better we manage the writing, but at the expense of added memory overhead?
@ChALkeR this PR would also solve the problem of lost output in case of a crash. The buffer approach you are describing is likely to lose output in case of crashes. If we add this, we should provide an API to create a stdout/stderr that are async in nature (or maybe make stdout/stderr async by default).
@Fishrock123 @nodejs/tsc wdyt? Should we do this for v12, v13? Or should I just close this?
ping @Fishrock123 and @nodejs/tsc again.
Sorry, but yes, I think this should be closed.
We are not in agreement. How do you plan to fix #24992? Should we leave that as a wont-fix?
@mcollina I think you are not in favor of the solutions that I could come up with (as I read #24992 (comment)) … I’d prefer wontfix over something that will break use cases, yes, and I still think we should try to move away from sync stdio in the long term (still a ton of work). My personal preference would be setting a limit to the total length of all data passed to a single
I’m lost on why this is breaking use cases; what are those? It seems this fixes a lot of old bugs, at least on unix systems.
I did some testing with this PR's patch to see the extent of the latency for different data chunk sizes in console.log:

cat foo.js

const h = require('http')
const child_process = require('child_process')

if (process.argv[2] === 'child') {
  const x = 'x'.repeat(+process.argv[3])
  console.log(`response: ${x}`)
  process.exit(0)
} else {
  const cp = child_process.spawn(process.execPath, [__filename, 'child', process.argv[2]])
  let start = process.hrtime();
  let count = 0
  cp.stdout.on('data', (d) => {
    count += d.length
  })
  cp.stdout.on('end', () => {
    const diff = process.hrtime(start);
    console.error((diff[0] * 1e9 + diff[1]) / (1000 * 10000))
    process.exit(0)
  })
}

I took a Mac because it has the smallest pipe buffers:
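(For reference, the script above takes the child's payload size in bytes as its argument, e.g. node foo.js 1000000.)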
Observation: for reasonably large values there is no visible impact. Writing 10MB or above (where the slowness starts) is probably not a meaningful scenario? So I am not seeing blocking stdout being problematic for pipes.
@mcollina @gireeshpunathil The issue isn’t latency, it’s that this blocks the process when the output pipe is full. @Fishrock123’s request-changes comment explains the issue and provides some historical context. (We already tried this, it didn’t work out well.)
Ok, then we should document this and mark all related issues as won’t fix.
I have read so many issues and PRs concerning this issue now, but I can’t really get a clear picture of it; often it’s somewhat diffuse for me. @addaleax, if I’m not mistaken, your concern is that this might block the event loop for a significant time. What about allowing the write to block up to a specific time? That way we’d likely overcome most truncation while making sure that the process is not blocked for too long. We could say the process may only be blocked for, e.g., up to one second.
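A rough sketch of that time-budget idea (hypothetical; the one-second budget, the function name, and the async fallback are assumptions, not part of this patch):

const { writeSync } = require('fs');
const BLOCK_BUDGET_MS = 1000; // e.g. block for at most one second

// Write synchronously while under the time budget; if the deadline passes
// before everything is flushed, hand the rest to the async stream instead of
// blocking the event loop indefinitely.
function writeWithDeadline(stream, buf) {
  const deadline = Date.now() + BLOCK_BUDGET_MS;
  let written = 0;
  while (written < buf.length && Date.now() < deadline) {
    try {
      written += writeSync(stream.fd, buf, written, buf.length - written);
    } catch (err) {
      if (err.code !== 'EAGAIN') throw err;
    }
  }
  if (written < buf.length)
    stream.write(buf.slice(written)); // async fallback for the remainder
}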
@addaleax - I briefly looked at an
@BridgeAR How would that be implemented without significant extra complexity? It likely also wouldn’t solve the problem that this PR is originally seeking to solve, namely writing a very large amount of data (> 1GB) to stdio. @mcollina If you are still set on not introducing an upper limit for
I think I'm fine with an upper limit to
Closing as there is no consensus on this change.
Fixes: #24992
Checklist
make -j4 test (UNIX) or vcbuild test (Windows) passes