Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

New failing test: 'finish' firing before _flush is done in async case. #7612

Closed

Conversation

markstos
Copy link

The 'finish' event is documented to fire with flushing is done: When "all
data has been flushed to the underlying system".

However, it appears be fired too early in the case of Transform streams
which make an async call. process.nextTick is used here but I ran into this
in a real-world case of making a call to a Mongo database.

This new failing test illustrates the apparent bug. When the code is
_flush is synchronous, there's no problem.

It fails with both 0.10.x and the latest from the 0.11 branch as well.

The 'finish' event is documented to fire with flushing is done:
When "all data has been flushed to the underlying system".

However, it appears be fired too early in the case of Transform streams
which make an async call. process.nextTick is used here but I ran into
this in a real-world case of making a call to a Mongo database.

This new failing test illustrates the apparent bug. When the code is
_flush is synchronous, there's no problem.

It fails with both 0.10.x and the latest from the 0.11 branch as well.
@vkurchatkin
Copy link

@markstos I don't think it's a bug. Why would you even listen to finish on Transform. Listen to end

@indutny
Copy link
Member

indutny commented May 13, 2014

finish is the event of the writable side of your Transform stream, emitting it means that all data that you passed to Transform stream was consumed, it has nothing to do with the readable side of the stream.

@indutny
Copy link
Member

indutny commented May 13, 2014

Sorry, not a bug.

@markstos
Copy link
Author

In the case I found, the documentation disagrees with the code. One of them is wrong and should be updated.

I provided test coverage for a case that had no coverage-- a async action in Transform._flush. Why reject test coverage for part of your system that is uncovered?

@vkurchatkin I'm listening on the finish event because it is documented to fire exactly when I want. The end event is documented to be when writing the data into the stream is done. The finish event is documented to be "all data has been flushed to the underlying system". That sounds exactly like it fires after flush is done, doesn't it?

If you are going to keep the current behavior, I recommend at least merging and modifying the test coverage to match it. (I've re-pushed to that branch with a commit format that's updated to follow the contributing guidelines). The docs should then also be updated to note:

  • You shouldn't do anything asynchronous in _flush, as it might not be finish before the finish even is fired to note that everything is has been flushed to the underlying system.
  • The finish docs should updated to note that the event is fired when everything has been flushed to the underlying system, except any asynchronous actions that may have happened in a _flush method.

It looks to me more like a bug in the code than the docs, but if you'd if you'd prefer that I create a doc patch instead to clarify the current behavior, I could do that instead.

@markstos
Copy link
Author

For other people who run into this issue in the meantime, here's a workaround I've found:

In my _flush method, I emit the 'close' event after calling the callback, and then listen for the close event instead of the finish event.

@chadxz
Copy link

chadxz commented May 13, 2014

I had this same problem when building a transform stream that did some buffered http requests. Are we just "doing it wrong"?

I chose to use a transform stream because of the _flush method, which i felt i needed to use to do buffered writes of the data. It was the last stream in the pipeline i was building, so i wanted to listen for the "finish" event to know when the stream had completed processing all of its data. But because the http request is async, the finish event always fired prior to the last http request completing.

Any tips for how to do this the right way, if the reported behavior is not a bug?

@vkurchatkin
Copy link

@markstos

The finish event is documented to be "all data has been flushed to the underlying system". That sounds exactly like it fires after flush is done, doesn't it?

That may sound like that, but this obviously can't give any guarantees about _flush, because _flush is not a part of Writable API and finish event is. _flush operates on the readable side of the stream, it is used to flush OUT, not flush IN.

I can't see any use case for listening finish on Transform. The even don't have "underlying system" per se.

@chadxz when Transform processes all data, end event is emitted

@calvinmetcalf
Copy link

yeah the documentation might need some clarification between end and finish, I've been confused in the past myself but in a nut shell.

  • 'finish' you can't write to it any more and all incoming data has been processed, this is after the callback in _transform is called for all the data, but could be before _flush.
  • 'end' you can read from it any more and all data has been emitted, this is after _flush.

@markstos
Copy link
Author

@vkurchatkin Thanks for the follow-up.

I think the introduction to the _flush docs could use some updating to clarify that it is _flushing the readable side of the steam when used in Transform. This seems especially important as work is progressing in another issue to add _flush to Writable streams. To me, _flush sounds like "flush to disk". I wouldn't think of it as "flush incoming data being read". The current intro for flush reads:

Call this function (optionally with an error argument) when you are done flushing any remaining data.

Perhaps it would be better written as:

Call this function (optionally with an error argument) to finish reading any remaining data.

A use-case for using finish on a Transform stream is to check that all data has pushed on to the next stream. If there is really no use for it on Transform streams, I would rather see it removed than to work in unexpected way.

@calvinmetcalf Since you seem to understand the stream implementation beyond what the docs offer, could you improve the docs for finish and end along the lines you suggest?

For what's worth, using the end event my Transform stream test case didn't work either:

    var stream = new require('stream').Transform();

    // Nothing interesting here
    stream._transform = function (chunk, enc, next) {
      next();
    };

    // _flush makes an async call.
    stream._flush = function (callback) {
      console.log("Expecting 1st: Entering _flush ");

      return process.nextTick(function () {
        console.log("Expecting 2nd: Done with _flush");
        callback();
      });
    };

    stream.on('end',function () {
      console.log("Expecting 3rd: Entering finish callback-- should be flushed now.");
    });

    stream.end("chunk");

In the example above, the output from the end listener is never emitted. (In case of end, it doesn't matter if _flush is async or not).

@calvinmetcalf
Copy link

you never read the stream, add an stream.on('data',function () {}); to kick it off

@vkurchatkin
Copy link

A use-case for using finish on a Transform stream is to check that all data has pushed on to the next stream.

in this case you need to listen to finish on next stream or end on current stream

anyway, if #7613 will be merged (and I hope it will) this problem will go away.

@markstos
Copy link
Author

Thanks @calvinmetcalf, that produces the expected result.

@markstos
Copy link
Author

@vkurchatkin The problem would go away because Transform would no longer have a _flush method of it's own, but would inherit the one from the updated Writable stream? It looks like the timing of finish with that patch will be after _flush.

I went to try applying the patch to a local 0.11 tree so I could test it, but I couldn't find _stream_readable.js under my .nvm directory.

I look forward to change from #7613.

@vkurchatkin
Copy link

@markstos you need to build form source to try out the patch. Also right now it's incorrect

@yinrong
Copy link

yinrong commented Mar 17, 2016

+1 can't understand after reading the doc

@markstos
Copy link
Author

@yinrong there is ongoing related discussion on a proposed PR here: nodejs/node#2314

gibfahn pushed a commit to ibmruntimes/node that referenced this pull request Feb 22, 2017
The cherry-pick of nodejs#7612 to v4.x (4369055) added in nodejs#9298 wasn't quite
correct as it depends on a runtime function %SymbolDescriptiveString
that doesn't exist on v4.x. We can use %SymbolDescription instead.

Ref: nodejs/node#7612
Ref: nodejs/node#9298

PR-URL: nodejs/node#10732
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
Reviewed-By: Myles Borins <myles.borins@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
gibfahn pushed a commit to ibmruntimes/node that referenced this pull request Feb 22, 2017
The cherry-pick of nodejs#7612 to v4.x (4369055) added in nodejs#9298 wasn't quite
correct as it depends on a runtime function %SymbolDescriptiveString
that doesn't exist on v4.x. We can use %SymbolDescription instead.

Ref: nodejs/node#7612
Ref: nodejs/node#9298

PR-URL: nodejs/node#10732
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
Reviewed-By: Myles Borins <myles.borins@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants