fs: synchronize close with other I/O for streams #30837

addaleax · 2019-12-07T13:27:39Z

Part of the flakiness in the `parallel/test-readline-async-iterators-destroy` test comes from fs streams starting `_read()` and `_destroy()` without waiting for each other to finish, which can cause the `fs.read()` call to fail with `EBADF` if the timing is bad.

Fix this by synchronizing write and read operations with `close()`.
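A minimal sketch of the failure mode described above, assuming a scratch file in the OS temp directory. It replays the bad interleaving by hand with synchronous calls so the outcome is deterministic. In the real bug, `_read()` and `_destroy()` race asynchronously. If the fd number has already been reused by the time the read runs, the read could even hit a different file.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Scratch file in a fresh temp directory (an assumption of this sketch).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'ebadf-demo-'));
const file = path.join(dir, 'data.txt');
fs.writeFileSync(file, 'hello\n');

const fd = fs.openSync(file, 'r');

// Replay the bad interleaving by hand: _destroy() closes the fd first...
fs.closeSync(fd);

// ...and the read that _read() had already decided to issue runs afterwards.
let readError = null;
try {
  fs.readSync(fd, Buffer.alloc(16), 0, 16, null);
} catch (err) {
  readError = err;
}

console.log(readError && readError.code); // EBADF

// Clean up the scratch file and directory.
fs.unlinkSync(file);
fs.rmdirSync(dir);
```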

Refs: #30660

/cc @ronag

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
commit message follows commit guidelines


ronag

~~LGTM makes sense as long as it's considered semver-minor.~~

~~This would become unnecessary and reverted through #29656.~~


ronag · 2019-12-07T13:49:14Z

lib/internal/fs/streams.js

```diff
  fs.write(this.fd, data, 0, data.length, this.pos, (er, bytes) => {
+    // Return early if this stream has been destroyed. The close() call inside
+    // _destroy() may cause errors when writing and we don't want to emit those.
+    if (this.destroyed) return cb();
```

We had a long conversation about swallowing errors after `destroy()` in #29197. The consensus was that errors should not be swallowed until after `'close'`.

addaleax · 2019-12-07T14:10:58Z

@ronag Thinking about this more, the `EBADF` indicates that we should probably try to avoid this race condition altogether and only call `fs.close()` after any pending `fs.read()` or `fs.write()` operation has finished … I’ll update the PR along those lines.
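The approach described here can be sketched roughly as follows: never close the fd while a read or write is still in flight. This is not the code from the PR. `FakeFd`, `SketchStream` and `raceReadAndDestroy()` are hypothetical, and the two symbols only reuse the names `kIsPerformingIO` and `kIoDone` that appear in the reviewed diff. The fake fd makes the race deterministic by completing reads on the next event-loop turn.

```javascript
'use strict';
const { EventEmitter } = require('events');

// Stand-ins reusing the names from the diff; the real symbols are internal.
const kIsPerformingIO = Symbol('kIsPerformingIO');
const kIoDone = Symbol('kIoDone');

// Hypothetical fake fd: read() completes on the next event-loop turn and
// fails with EBADF if close() ran in between.
class FakeFd {
  constructor() { this.open = true; }
  read(cb) {
    setImmediate(() => {
      if (!this.open) {
        const err = new Error('EBADF: bad file descriptor, read');
        err.code = 'EBADF';
        return cb(err);
      }
      cb(null, 'chunk');
    });
  }
  close(cb) {
    this.open = false;
    setImmediate(cb);
  }
}

// Hypothetical minimal stream; `synchronize` toggles the fix on or off.
class SketchStream extends EventEmitter {
  constructor(fd, synchronize) {
    super();
    this.fd = fd;
    this.synchronize = synchronize;
    this.destroyed = false;
    this[kIsPerformingIO] = false;
  }

  read(cb) {
    this[kIsPerformingIO] = true;
    this.fd.read((err, data) => {
      this[kIsPerformingIO] = false;
      // Unlike the real code, always report the result so callers can see it.
      cb(err, data);
      // Tell destroy() that it is now safe to close the fd.
      if (this.destroyed) this.emit(kIoDone, err);
    });
  }

  destroy(cb) {
    this.destroyed = true;
    const close = () => this.fd.close(cb);
    if (this.synchronize && this[kIsPerformingIO]) {
      this.once(kIoDone, close); // wait for the in-flight read to finish
    } else {
      close(); // close right away, racing any pending read
    }
  }
}

// Start a read, destroy immediately, and resolve with whatever the read saw.
function raceReadAndDestroy(synchronize) {
  return new Promise((resolve) => {
    const stream = new SketchStream(new FakeFd(), synchronize);
    let readError;
    stream.read((err) => { readError = err; });
    stream.destroy(() => resolve(readError));
  });
}
```

With `synchronize` off, the read started before `destroy()` reports `EBADF`. With it on, `close()` waits for the `kIoDone` signal and the read completes cleanly. The trade-off, raised later in the thread, is that `close()` is delayed for as long as the pending I/O takes.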

addaleax · 2019-12-07T14:27:42Z

@ronag PTAL


lpinca

SGTM with @ronag's suggestions.

nodejs-github-bot · 2019-12-07T14:52:13Z

CI: https://ci.nodejs.org/job/node-test-pull-request/27454/


ronag

LGTM

ronag · 2019-12-07T15:57:04Z

Should we apply the same fixes to net.Socket?

ronag · 2019-12-07T16:02:33Z

lib/internal/fs/streams.js

```diff
@@ -339,7 +356,17 @@ WriteStream.prototype._write = function(data, encoding, cb) {
    });
  }

+  if (this.destroyed) return cb();
```

I think this might need to be a `cb(new ERR_STREAM_DESTROYED('write'))`.

Why? There hasn’t been any error, has there?

The write hasn't completed. If we don't pass an error here, the caller would think the write has completed even though it hasn't.

See https://github.com/nodejs/node/blob/master/lib/_stream_writable.js#L821

I’m worried that introducing an error when there previously was none (or at least not usually) would be semver-major, and I’d prefer to keep this PR as close to just being a fix for the bug as possible.

This only matters if `write()` was called with a callback (which is very unusual), and in that case it's actually a bug if the callback doesn't receive an error.

Though if you're worried about it, I guess we can leave it as is. It's an unusual edge case after all. Could we at least have a separate semver-major PR for the "correct" behaviour?

Since this is a correctness issue I don't think it needs to be semver-major?

Maybe it doesn’t need to be, but I still feel like these are two very different things…

This PR addresses a race condition that can occur randomly and surprisingly. Fixing it removes unexpected errors.

Adding an error to the callback makes behaviour more consistent, but it would add unexpected errors.

I’d rather not mix the two, and I think we’ve treated other situations where we add errors for consistency as semver-major in the past (and I don’t really see any reason not to do that here, too).

I don't think that follows, does it? Right now you'll receive an EBADF error event. While the EBADF error is, er, in error, it at least tells you that the write didn't go through.

I suppose it could also end up writing to a different file if the fd was reopened in the meantime, which of course is (edit: a lot) worse than what this PR does.

I guess my reasoning is mostly that this is only relevant when the stream has already been destroyed at this point, and so it should be expected that writes may not finish?

If you feel strongly, I’ll apply @ronag’s suggestion, but I’m still a bit worried about breakage.

The way I see it is that you already receive unpredictable EBADF errors now. A predictable error is better, and certainly better than silently dropping data on the floor.

Another way of looking at it: how likely is this change to break existing, functionally correct code? I expect the answer is 'close to zero' - any code that breaks was probably already broken, just not reliably so.

Does that sound reasonable?

I’ve pushed a commit with the suggestion … still feeling a bit worried about it, but we’ll see if this is problematic.
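The two options debated above are sketched below against the public `stream.Writable` API rather than the fs internals. Node's `ERR_STREAM_DESTROYED` error class is internal, so the hypothetical `streamDestroyedError()` helper only mimics its `code`. `SlowWritable` and `writeThenDestroy()` are also illustrative names.

```javascript
'use strict';
const { Writable } = require('stream');

// Hypothetical helper: the real ERR_STREAM_DESTROYED class is not exported,
// so this only mimics its `code` (and roughly its message).
function streamDestroyedError(op) {
  const err = new Error(`Cannot call ${op} after a stream was destroyed`);
  err.code = 'ERR_STREAM_DESTROYED';
  return err;
}

// Hypothetical writable whose _write() finishes asynchronously, like fs.write().
class SlowWritable extends Writable {
  constructor(reportError) {
    super();
    this.reportError = reportError;
    this.written = [];
  }

  _write(chunk, encoding, cb) {
    setImmediate(() => {
      if (this.destroyed) {
        // cb() would tell the caller the write succeeded; cb(err) says it was dropped.
        return this.reportError ? cb(streamDestroyedError('write')) : cb();
      }
      this.written.push(chunk.toString());
      cb();
    });
  }
}

// Issue one write, destroy the stream before it completes, and report back.
function writeThenDestroy(reportError) {
  return new Promise((resolve) => {
    const ws = new SlowWritable(reportError);
    ws.write('hello', (err) => resolve({ err: err || null, written: ws.written }));
    ws.destroy();
  });
}
```

Both variants drop the data. They differ only in whether the caller's write callback is told about it. With a plain `cb()`, the callback sees no error, which is the "silently ignoring the write" concern raised in the review.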


bnoordhuis · 2019-12-08T04:35:50Z

lib/internal/fs/streams.js

```diff
@@ -339,7 +356,17 @@ WriteStream.prototype._write = function(data, encoding, cb) {
    });
  }

+  if (this.destroyed) return cb();
```

I share @ronag's concern though: it's tantamount to silently ignoring the write request from the user.


bnoordhuis · 2019-12-08T04:38:53Z

lib/internal/fs/streams.js

```diff
  fs.read(this.fd, pool, pool.used, toRead, this.pos, (er, bytesRead) => {
+    this[kIsPerformingIO] = false;
+    // Tell ._destroy() that it's safe to close the fd now.
+    if (this.destroyed) return this.emit(kIoDone, er);
```

This is observable when emit() is monkey-patched, which isn't entirely uncommon. Not a reason per se not to introduce this pattern (it's pretty elegant) but I thought I'd point it out anyway.
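The sketch below illustrates the point above. It assumes that user code, such as a tracing or logging helper, wraps `emit()` on the stream instance. The `kIoDone` symbol here is a stand-in for the internal one. Any internal signal routed through `emit()` becomes visible to such a wrapper.

```javascript
'use strict';
const { EventEmitter } = require('events');

// Hypothetical stand-in for the internal kIoDone symbol.
const kIoDone = Symbol('kIoDone');

const stream = new EventEmitter();
const seen = [];

// User code (e.g. a logging or tracing helper) wrapping emit() on an instance.
const originalEmit = stream.emit;
stream.emit = function(event, ...args) {
  seen.push(event);
  return originalEmit.call(this, event, ...args);
};

// An internal "I/O finished" signal sent through emit() passes through the
// wrapper, right alongside ordinary public events.
stream.once(kIoDone, () => {});
stream.emit(kIoDone, null);
stream.emit('close');

console.log(seen); // [ Symbol(kIoDone), 'close' ]
```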

ronag · 2019-12-08T13:21:01Z

Just a thought: what if, for whatever reason, the I/O doesn’t complete? Do we need a timeout? Or does libuv handle that?

addaleax · 2019-12-08T13:31:46Z

@ronag In that case, this PR delays the close() call along with it. I don’t think that’s a bad thing, though.

ronag · 2019-12-09T12:52:15Z

I would like to ask for @mcollina's take on this before merging. See #30864 (comment).

nodejs-github-bot · 2019-12-09T13:49:02Z

CI: https://ci.nodejs.org/job/node-test-pull-request/27521/

Part of the flakiness in the parallel/test-readline-async-iterators-destroy test comes from fs streams starting `_read()` and `_destroy()` without waiting for the other to finish, which can lead to the `fs.read()` call resulting in `EBADF` if timing is bad. Fix this by synchronizing write and read operations with `close()`. Refs: nodejs#30660

nodejs-github-bot · 2019-12-09T13:50:54Z

CI: https://ci.nodejs.org/job/node-test-pull-request/27522/

addaleax · 2019-12-09T13:51:08Z

@ronag I guess we can do that, but at the same time I think we should fix this bug.

ronag · 2019-12-09T16:37:13Z

> @ronag I guess we can do that, but at the same time I think we should fix this bug.

Given @mcollina's answer in the linked comment, I'm not sure he would agree with this PR. Another valid (?) solution would be to just add `on('error')` handlers in the failing test.
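The sketch below is a rough version of the scenario the flaky test exercises, with the explicit `'error'` listener suggested above added as a workaround. It reads a file line by line with `readline`, stops after the first line, and then destroys the file stream while a read may still be pending. The temp file and the `firstLineThenDestroy()` name are assumptions of this sketch. On a Node.js release that includes this fix, the listener is expected to see no `EBADF`.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');
const readline = require('readline');
const { once } = require('events');

async function firstLineThenDestroy() {
  // Scratch file with a few lines (an assumption of this sketch).
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'rl-destroy-'));
  const file = path.join(dir, 'lines.txt');
  fs.writeFileSync(file, 'first\nsecond\nthird\n');

  const input = fs.createReadStream(file);
  const errors = [];
  // The workaround under discussion: an explicit 'error' listener.
  input.on('error', (err) => errors.push(err));

  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  let firstLine;
  for await (const line of rl) {
    firstLine = line;
    break; // stop early, as the test does
  }
  rl.close(); // safe to call even if leaving the loop already closed it

  // Tear down the file stream while a read may still be in flight.
  const closed = once(input, 'close');
  input.destroy();
  await closed;

  fs.unlinkSync(file);
  fs.rmdirSync(dir);
  return { firstLine, errors };
}
```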

addaleax · 2019-12-10T01:25:07Z

@ronag Let’s wait for him to comment. In my opinion EBADF just because of using async iterators over a file stream is very clearly a bug.

nodejs-github-bot · 2019-12-10T01:35:56Z

CI: https://ci.nodejs.org/job/node-test-pull-request/27544/

ronag · 2019-12-10T07:56:38Z

I mostly agree. My personal concern is whether this fix should apply everywhere (net, http, http2, quic etc.) or whether fs is somehow an edge case.

> just because of using async iterators over a file stream is very clearly a bug.

This is a bit strange to me though. Shouldn't the iterator be released once we leave the `for` block? How can a released async iterator cause an exception at `destroy()`? @mcollina @benjamingr what are the expected semantics of async iterators here? https://github.com/nodejs/node/blob/master/test/parallel/test-readline-async-iterators-destroy.js I would expect the iterator to be released and to release any error listeners.

The readable stream will of course still 'error' (which might or might not make sense) and an error listener should be registered regardless?

mcollina · 2019-12-10T09:53:47Z

I'm a bit conflicted by this change.
On one hand, it seems clear that we should restrict the call to `_destroy()` to when no I/O operation is happening. On the other hand, `destroy()` should be a "safe" operation, and waiting for asynchronous completion seems a bit off.

I think we should consider making the callback of `destroy(err, cb)` documented and part of the official API.

What do you think?

mcollina

LGTM

ronag · 2019-12-10T10:32:59Z

> On one hand, it seems clear that we should restrict the call to `_destroy()` to when no I/O operation is happening. On the other hand, `destroy()` should be a "safe" operation, and waiting for asynchronous completion seems a bit off.

Yes, that's a bit contradictory. I guess it is "safe" to wait for I/O if we assume that it will always complete (or fail) within a reasonable time; otherwise we might end up with a stuck stream and no means to abort it, e.g. a socket trying to write to a server which is bugged/crashed/corrupt. It might need to be decided on a case-by-case basis. In fs I would guess it's very unlikely that I/O would not complete within a reasonable time (if you exclude FUSE). It also depends on whether e.g. libuv or the OS has some sort of timeout or error handling for this.

> I think we should consider making the callback of `destroy(err, cb)` documented and part of the official API.

I think it should be part of the official API. Not sure how that helps us here though. Also, before making it public we should probably ensure the cb is invoked asynchronously (which is not the case today).

mcollina · 2019-12-10T10:58:14Z

> Yes, that's a bit contradictory. I guess it is "safe" to wait for I/O if we assume that it will always complete (or fail) within a reasonable time; otherwise we might end up with a stuck stream and no means to abort it, e.g. a socket trying to write to a server which is bugged/crashed/corrupt. It might need to be decided on a case-by-case basis. In fs I would guess it's very unlikely that I/O would not complete within a reasonable time (if you exclude FUSE). It also depends on whether e.g. libuv or the OS has some sort of timeout or error handling for this.

I agree.

> I think it should be part of the official API. Not sure how that helps us here though. Also, before making it public we should probably ensure the cb is invoked asynchronously (which is not the case today).

I think documenting that is enough.

addaleax · 2019-12-10T13:51:40Z

> I'm a bit conflicted by this change.
> On one hand, it seems clear that we should restrict the call to `_destroy()` to when no I/O operation is happening.

I’m not sure if that’s always the case, but here it’s definitely problematic. Getting `EBADF` is bad enough, but I think it’s even possible that data is read from or written to the wrong file if the race condition timing works out really badly.

> On the other hand, `destroy()` should be a "safe" operation, and waiting for asynchronous completion seems a bit off.

> I think we should consider making the callback of `destroy(err, cb)` documented and part of the official API.

I’d be okay with that, yes 👍

Part of the flakiness in the parallel/test-readline-async-iterators-destroy test comes from fs streams starting `_read()` and `_destroy()` without waiting for the other to finish, which can lead to the `fs.read()` call resulting in `EBADF` if timing is bad. Fix this by synchronizing write and read operations with `close()`. Refs: #30660 PR-URL: #30837 Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com>

addaleax · 2019-12-10T15:04:37Z

Landed in 8a5c7f6


nodejs-github-bot added the fs Issues and PRs related to the fs subsystem / file system. label Dec 7, 2019

addaleax commented Dec 7, 2019

lib/internal/fs/streams.js

addaleax mentioned this pull request Dec 7, 2019

investigate flaky parallel/test-readline-async-iterators-destroy in CI #30660

Closed

ronag approved these changes Dec 7, 2019

ronag reviewed Dec 7, 2019

lib/internal/fs/streams.js (outdated)

ronag requested changes Dec 7, 2019

addaleax changed the title ~~fs: ignore fs operations and errors for destroyed streams~~ fs: synchronize close with other I/O for streams Dec 7, 2019

addaleax force-pushed the fs-readline-test branch from 5eba809 to 2a723ac on December 7, 2019 14:25

ronag requested changes Dec 7, 2019

lpinca approved these changes Dec 7, 2019

ronag reviewed Dec 7, 2019

lib/internal/fs/streams.js (outdated)

ronag approved these changes Dec 7, 2019

ronag mentioned this pull request Dec 7, 2019

net: noop destroyed socket #30839

Closed

ronag reviewed Dec 7, 2019

ronag added a commit to nxtedition/node that referenced this pull request Dec 7, 2019

net: noop destroyed socket

1efb877

_write and _read can be called from 'connect' after Socket.destroy() has been called. This should be a noop. Refs: nodejs#30837

bnoordhuis reviewed Dec 8, 2019

bnoordhuis approved these changes Dec 8, 2019

addaleax force-pushed the fs-readline-test branch from 1c9c40f to b1f7bf0 on December 9, 2019 13:50

fixup! fs: synchronize close with other I/O for streams

a1453c5

mcollina approved these changes Dec 10, 2019

Trott approved these changes Dec 10, 2019

addaleax closed this Dec 10, 2019

addaleax deleted the fs-readline-test branch December 10, 2019 15:06

ronag mentioned this pull request Dec 11, 2019

stream: normalized destroy() #30906

Closed

MylesBorins mentioned this pull request Dec 13, 2019

v13.4.0 proposal #30937

Merged

targos mentioned this pull request Jan 15, 2020

v12.15.0 release proposal #31368

Closed

RafaelKr mentioned this pull request Jan 25, 2020

fix: close fds when returning due to different file size rook2pawn/node-filecompare#8

Open

MylesBorins mentioned this pull request Feb 8, 2020

v12.16.0 proposal #31691

Merged
