From e646720b5633de37d397ddc5c660d3e1bff1f39c Mon Sep 17 00:00:00 2001
From: Jessica Quynh Tran
Date: Sun, 22 Jan 2017 23:15:44 -0500
Subject: [PATCH 1/6] Add guide: "Backpressuring in Streams"

---
 .../docs/guides/backpressuring-in-streams.md | 264 ++++++++++++++++++
 1 file changed, 264 insertions(+)
 create mode 100644 locale/en/docs/guides/backpressuring-in-streams.md

diff --git a/locale/en/docs/guides/backpressuring-in-streams.md b/locale/en/docs/guides/backpressuring-in-streams.md
new file mode 100644
index 0000000000000..ac719d95f4cd9
--- /dev/null
+++ b/locale/en/docs/guides/backpressuring-in-streams.md
@@ -0,0 +1,264 @@
---
title: Backpressuring in Streams
layout: docs.hbs
---

# Backpressuring in Streams

The purpose of this guide is to describe what backpressure is and the
problems in Streams that it solves. The first part of the guide covers these
questions. The second part introduces suggested best practices to ensure your
application's code is safe and optimized.

We assume a little familiarity with the general definition of
[`backpressure`][], [`Buffer`][] and [`EventEmitters`][] in Node.js, as well as
some experience with [`Streams`][]. If you haven't read through those docs,
it's not a bad idea to take a look at the API documentation first, as it will
help expand your understanding while reading this guide and for following along
with examples.

## The Problem with Streams

In a computer system, data is transferred from one process to another through
pipes and signals. In Node.js, we find a similar mechanism called
[`Streams`][]. Streams are great! They do so much for Node.js and almost every
part of the internal codebase utilizes the `Streams` class. As a developer, you
are more than encouraged to use them too!

```javascript
const readline = require('readline');

// process.stdin and process.stdout are both instances of Streams
const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});

rl.question('Why should you use streams? ', (answer) => {
  console.log(`Maybe it's ${answer}, maybe it's because they are awesome! :)`);
  rl.close();
});
```

There are different functions to transfer data from one process to another. In
Node.js, there is an internal built-in function called [`.pipe()`][]. There are
[other packages][] out there you can use too! Ultimately though, at the basic
level of this process, we have two separate components: the _source_ of the
data and the _consumer_.

In Node.js the source is a [`Readable`][] stream and the consumer is the
[`Writable`][] stream (both of these may be interchanged with a [`Duplex`][]
stream, but that is out-of-scope for this guide).

## Too Much Data, Too Quickly

There are instances where a [`Readable`][] stream might give data to the
[`Writable`][] much too quickly, much more than the consumer can handle!

When that occurs, the consumer will begin to queue all the chunks of data for
later consumption. The write queue will get longer and longer, and because of
this more data must be kept in memory until the entire process has completed.

```javascript
const fs = require('fs');

const inputFile = fs.createReadStream('REALLY_BIG_FILE.x');
const outputFile = fs.createWriteStream('REALLY_BIG_FILE_DEST.x');

// Secretly the stream is saying: "whoa, whoa! hang on, this is way too much!"
inputFile.pipe(outputFile);
```

This is why backpressure is important.
If a backpressure system was not
present, the process would use up your system's memory, effectively slowing
down other processes, and monopolizing a large part of your system until
completion.

This results in a few things:

* Memory exhaustion
* A very overworked garbage collector
* Slowing down all other current processes

## Memory Exhaustion

## Garbage Collection

## Overall Dip in System Performance

## How Does Backpressure Resolve These Issues?

The moment that backpressure is triggered can be narrowed exactly to the return
value of a `Writable`'s [`.write()`][] function. This return value is
determined by a few conditions, of course.

In any scenario where the data buffer has exceeded the [`highWaterMark`][] or
the write queue is currently busy, [`.write()`][] will return `false`.

When a `false` value is returned, the backpressure system kicks in. It will
pause the incoming [`Readable`][] stream from sending any data and wait until
the consumer is ready again.

Once the queue is finished, backpressure will allow data to be sent again.
The space in memory that was being used will free itself up and prepare for the
next batch of data.

This effectively allows a fixed amount of memory to be used at any given
time for a [`.pipe()`][] function. There will be no memory leakage, no
indefinite buffering, and the garbage collector will only have to deal with
one area in memory!

So, if backpressure is so important, why have you (probably) not heard of it?
Well, the answer is simple: Node.js does all of this automatically for you.

That's so great! But also not so great when we are trying to understand how to
implement our own custom streams.

## Lifecycle of `.pipe()`

To achieve a better understanding of backpressure, here is a flow-chart on the
lifecycle of a [`Readable`][] stream being [piped][] into a [`Writable`][]
stream:

```
+===============+
|   Your_Data   |
+=======+=======+
        |
+-------v-----------+        +-------------------+        +=================+
|  Readable Stream  |        |  Writable Stream  +-------->  .write(chunk)  |
+-------+-----------+        +---------^---------+        +=======+=========+
        |                              |                          |
        |   +======================+   |       +------------------v---------+
        +--->  .pipe(destination)  >---+       |  Is this chunk too big?    |
            +==^=======^========^==+           |  Is the queue busy?        |
               ^       ^        ^              +----------+-------------+---+
               |       |        |                         |             |
               |       |        |  > if (!chunk)          |             |
               ^       |        |      emit .end();       |             |
               ^       ^        |  > else                 |             |
               |       ^        |      emit .write();  +--v---+     +---v---+
               |       |        ^----^-----------------<  No  |     |  Yes  |
               ^       |                               +------+     +---v---+
               ^       |                                                |
               |       ^   emit .pause();       +=================+     |
               |       ^---^--------------------+  return false;  <----+
               |                                +=================+
               |
               ^   when queue is empty       +============+
               ^---^-------------------------<  Buffering |
               |                             |============|
               +> emit .drain();             |            |
               +> emit .resume();            +------------+
                                             |            |
                                             +------------+  add chunk to queue
                                                   |      <--^----------------<
                                                   +============+
```

_Note_: The `.pipe()` function is typically where backpressure is invoked. In
an isolated application, both [`Readable`][] and [`Writable`][] streams
should be present. If you are writing an application meant to accept a
[`Readable`][] stream, or pipes to a [`Writable`][] stream from another app,
you may omit this detail.
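To make the flow above concrete, here is a minimal sketch of the kind of
bookkeeping `.pipe()` performs for you. This is not the internal
implementation, just an illustration built on the public API, reusing the
hypothetical files from the earlier example:

```javascript
const fs = require('fs');

const readable = fs.createReadStream('REALLY_BIG_FILE.x');
const writable = fs.createWriteStream('REALLY_BIG_FILE_DEST.x');

readable.on('data', (chunk) => {
  // .write() returns false once the internal buffer passes the
  // highWaterMark, so pause the source until the queue drains.
  if (!writable.write(chunk)) {
    readable.pause();
  }
});

// 'drain' is emitted once the write queue is empty again.
writable.on('drain', () => {
  readable.resume();
});

readable.on('end', () => {
  writable.end();
});
```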
## Example App

Since [Node.js v0.10][], the [`Streams`][] class has offered the ability to
overwrite the functionality of [`.read()`][] or [`.write()`][] by using the
underscore version of these respective functions ([`._read()`][] and
[`._write()`][]).

There are guidelines documented for [implementing Readable streams][] and
[implementing Writable streams][]. We will assume you've read these over.

This application will do something very simple: take the data source and
do something fun to the data, and then write it to another file.

## Rules to Abide by When Writing Custom Streams

Recall that a [`.write()`][] may return true or false depending on some
conditions. Thus, when building our own [`Writable`][] stream, we must pay
close attention to those conditions and the return value:

* If the write queue is busy, [`.write()`][] will return false.
* If the data chunk is too large, [`.write()`][] will return false (the limit
is indicated by the [`highWaterMark`][] variable).

_Note_: In most machines, there is a byte size that determines when a buffer
is full (which will vary across different machines). Node.js allows you to set
your own custom [`highWaterMark`][], but commonly, the default is the optimal
value for the system that is running the application. In instances where you
might want to raise that value, go for it, but do so with caution!

## Build a Writable Stream

Let's customize the behaviour of [`.write()`][] by implementing our own
[`._write()`][]:

```javascript
const Writable = require('stream').Writable;

class MyWritable extends Writable {
  constructor(options) {
    super(options);
  }

  _write(chunk, encoding, callback) {
    // Reject any chunk that contains the letter 'a', accept the rest.
    if (chunk.toString().indexOf('a') >= 0) {
      callback(new Error('chunk is invalid'));
    } else {
      callback();
    }
  }
}
```

## Conclusion

Streams are an often-used module in Node.js. They are important to the internal
structure, and for developers, to expand and connect across the Node.js modules
ecosystem.

Hopefully, you will now be able to troubleshoot, safely code your own
[`Writable`][] streams with backpressure in mind, and share your knowledge with
colleagues and friends.

Be sure to read up more on [`Streams`][] and other [`EventEmitters`][] to help
improve your knowledge when building applications with Node.js.
+ + + +[`Streams`]: https://nodejs.org/api/stream.html +[`Buffer`]: https://nodejs.org/api/buffer.html +[`EventEmitters`]: https://nodejs.org/api/events.html +[`Writable`]: https://nodejs.org/api/stream.html#stream_writable_streams +[`Readable`]: https://nodejs.org/api/stream.html#stream_readable_streams +[`Duplex`]: https://nodejs.org/api/stream.html#stream_duplex_and_transform_streams +[`.drain()`]: https://nodejs.org/api/stream.html#stream_event_drain +[`.read()`]: https://nodejs.org/docs/latest/api/stream.html#stream_readable_read_size +[`.write()`]: https://nodejs.org/api/stream.html#stream_writable_write_chunk_encoding_callback +[`._read()`]: https://nodejs.org/docs/latest/api/stream.html#stream_readable_read_size_1 +[`._write()`]: https://nodejs.org/docs/latest/api/stream.html#stream_writable_write_chunk_encoding_callback_1 + +[implementing Writable streams]: https://nodejs.org/docs/latest/api/stream.html#stream_implementing_a_writable_stream +[implementing Readable streams]: https://nodejs.org/docs/latest/api/stream.html#stream_implementing_a_readable_stream + +[other packages]: https://github.com/sindresorhus/awesome-nodejs#streams +[`backpressure`]: https://en.wikipedia.org/wiki/Back_pressure#Back_pressure_in_information_technology +[Node.js v0.10]: https://nodejs.org/docs/v0.10.0/ +[`highWaterMark`]: https://nodejs.org/api/stream.html#stream_buffering + +[`.pipe()`]: https://nodejs.org/docs/latest/api/stream.html#stream_readable_pipe_destination_options +[piped]: https://nodejs.org/docs/latest/api/stream.html#stream_readable_pipe_destination_options \ No newline at end of file From 26bbb66ec8252435863a9dd3bbe03a8db036bd8d Mon Sep 17 00:00:00 2001 From: Jessica Quynh Tran Date: Sun, 26 Feb 2017 17:22:36 -0500 Subject: [PATCH 2/6] review changes and update empty fields --- .../docs/guides/backpressuring-in-streams.md | 478 +++++++++++++----- 1 file changed, 347 insertions(+), 131 deletions(-) diff --git a/locale/en/docs/guides/backpressuring-in-streams.md b/locale/en/docs/guides/backpressuring-in-streams.md index ac719d95f4cd9..2bc70a287b7db 100644 --- a/locale/en/docs/guides/backpressuring-in-streams.md +++ b/locale/en/docs/guides/backpressuring-in-streams.md @@ -4,25 +4,37 @@ layout: docs.hbs --- # Backpressuring in Streams - -The purpose of this guide will describe what backpressure is and the -problems in Streams that it solves. The first part of this guide will cover -these questions. The second part of the guide will introduce suggested best -practices to ensure your application's code is safe and optimized. - -We assume a little familiarity with the general definition of -[`backpressure`][], [`Buffer`][] and [`EventEmitters`][] in Node.js, as well as -some experience with [`Streams`][]. If you haven't read through those docs, -it's not a bad idea to take a look at the API documentation first, as it will -help expand your understanding while reading this guide and for following along + +There is a general problem that arises during data handling called +[`backpressure`][]. The term is used to describe a build up of data behind a +buffer during data transfer. When the recieving end of the transfer has complex +operations, or is slower for whatever reason, there is a tendency for data from +the incoming source to begin to accumulate. + +To solve this problem, there must be a delegation system in place to ensure a +smooth flow of data from one source to another. 
Different communities have
resolved this issue uniquely to their programs; Unix pipes and TCP sockets are
good examples of this, and the mechanism is often referred to as
_flow control_. In Node.js, streams have been the adopted solution.

The purpose of this guide is to further detail what backpressure is, and how
exactly streams address this in Node.js' source code. The second part of
the guide will introduce suggested best practices to ensure your application's
code is safe and optimized when implementing streams.

We assume a little familiarity with the general definition of
[`backpressure`][], [`Buffer`][], and [`EventEmitters`][] in Node.js, as well
as some experience with [`Streams`][]. If you haven't read through those docs,
it's not a bad idea to take a look at the API documentation first, as it will
help expand your understanding while reading this guide and for following along
with examples.

## The Problem With Data Handling

In a computer system, data is transferred from one process to another through
pipes, sockets, and signals. In Node.js, we find a similar mechanism called
[`Streams`][]. Streams are great! They do so much for Node.js and almost every
part of the internal codebase utilizes that module. As a developer, you
are more than encouraged to use them too!

```javascript
const readline = require('readline');

// process.stdin and process.stdout are both instances of Streams
const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});

rl.question('Why should you use streams? ', (answer) => {
  console.log(`Maybe it's ${answer}, maybe it's because they are awesome! :)`);
  rl.close();
});
```

A good example of why the backpressure mechanism implemented through streams is
a great optimization can be demonstrated by comparing the internal system tools
from Node.js' [`Streams`][] implementation.

In one scenario, we take a large file (approximately 9 GB) and compress it
using the familiar [`zip(1)`][] tool.

```
$ zip The.Matrix.1080p.mkv.zip The.Matrix.1080p.mkv
```

While that will take a few minutes to compress, in another shell we may run
a script that uses Node.js' [`Zlib`][] module, which wraps around another
compression tool, [`gzip(1)`][].

```javascript
const gzip = require('zlib').createGzip();
const fs = require('fs');

const inp = fs.createReadStream('The.Matrix.1080p.mkv');
const out = fs.createWriteStream('The.Matrix.1080p.mkv.gz');

inp.pipe(gzip).pipe(out);
```

While the first `zip` attempt will ultimately fail, Node.js will be able to
handle the large amount of data being transferred.

To test the results, try opening each compressed file. The one done by `zip`
will warn you that the file is corrupt, whereas the one done by streams will
decompress perfectly fine.
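A quick aside: `.pipe()` by itself does not forward errors between the streams
in a chain. Here is a minimal defensive sketch (logging only, not a full
cleanup strategy) that attaches an `'error'` handler to each stage of the
example above:

```javascript
// Report a failure in any stage of the pipeline instead of letting it
// go unnoticed. `inp`, `gzip` and `out` are the streams created above.
[inp, gzip, out].forEach((stream) => {
  stream.on('error', (err) => {
    console.error('pipeline error:', err.message);
  });
});
```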
## Too Much Data, Too Quickly There are instance where a [`Readable`][] stream might give data to the -[`Writable`][] much too quickly --- much more than the consumer can handle! +[`Writable`][] much too quickly --- much more than the consumer can handle! -When that occurs, the consumer will begin to queue all the chunks of data for -later consumption. The write queue will get longer and longer, and because of +When that occurs, the consumer will begin to queue all the chunks of data for +later consumption. The write queue will get longer and longer, and because of this more data must be kept in memory until the entire process has completed. -```javascript - var fs = require('fs'); - - var inputFile = fs.createReadStream('REALLY_BIG_FILE.x'); - var outputFile = fs.createWriteStream('REALLY_BIG_FILE_DEST.x'); +Writing to a disk is a lot slower than reading from a disk, thus, when we are +trying to compress a file and write it to our hard disk, backpressure will +arise because the write disk will not be able to keep up with the speed from +the read. +```javascript // Secretly the stream is saying: "whoa, whoa! hang on, this is way too much!" - inputFile.pipe(outputFile); + // Data will begin to build up on the read-side of the data buffer as + // `write` tries to keep up with the incoming data flow. + inp.pipe(gzip).pipe(outputFile); ``` - - -This is why backpressure is important. If a backpressure system was not -present, the process would use up your system's memory, effectively slowing -down other processes, and monopolizing a large part of your system until +This is why a backpressure mechanism is important. If a backpressure system was +not present, the process would use up your system's memory, effectively slowing +down other processes, and monopolizing a large part of your system until completion. This results in a few things: -* Memory exhaustion -* A very overworked garbage collector * Slowing down all other current processes +* A very overworked garbage collector +* Memory exhaustion + +## Overall Dip in System Performance + +In the following examples we will take out the [return value][] of the +`.write()` function and change the it to `true`. In any reference to 'modified' +binary, we are talking about running the `node` binary without the `return ret;` +line, and instead with the replaced `return true;`. + +## Excess Drag on Garbage Collection + +Let's take a look at a quick benchmark. Using the same example from above, we +ran a few time trials to get a median time for both binaries. + +``` + trial (#) | `node` binary (ms) | modified `node` binary (ms) +================================================================= + 1 | 56924 | 55011 + 2 | 52686 | 55869 + 3 | 59479 | 54043 + 4 | 54473 | 55229 + 5 | 52933 | 59723 +================================================================= +average time: | 55299 | 55975 +``` + +Both take around a minute to run, so there's not much of a difference at all, +but let's take a closer look to confirm whether our suspicions are correct. We +use the linux tool [`dtrace`][] to evaluate what's happening with the V8 garbage +collector. + +The GC (garbage collector) measured time indicates the intervals of a full cycle +of a single sweep done by the garbage collector: + +``` +approx. 
time (ms) | GC (ms) | modified GC (ms) +================================================= + 0 | 0 | 0 + 1 | 0 | 0 + 40 | 0 | 2 + 170 | 3 | 1 + 300 | 3 | 1 + + * * * + * * * + * * * + + 39000 | 6 | 26 + 42000 | 6 | 21 + 47000 | 5 | 32 + 50000 | 8 | 28 + 54000 | 6 | 35 +``` +While the two processes start off the same and seem to work the GC at the same +rate, it becomes evident that after a few seconds with a properly working +backpressure system in place, it spreads the GC load across consistent +intervals of 4-8 milliseconds until the end of the data transfer. + +However, when a backpressure system is not in place, the V8 garbage collection +starts to drag out. The normal binary called the GC approximately `75` +times in a minute, whereas, the modified binary fires only `36` times. + +This is the slow and gradual debt accumulating from a growing memory usage. As +data gets transferred, without a backpressure system in place, more memory is +being used for each chunk transfer. + +The more memory that is being allocated, the more the GC has to take care of in +one sweep. The bigger the sweep, the more the GC needs to decide what can be +freed up and scanning for detached pointers in a larger memory space will +consume more computing power. ## Memory Exhaustion +To determine the memory consumption of each binary, we've clocked each process +with `/usr/bin/time -lp sudo ./node ./backpressure-example/zlib.js` +individually. +This is the output on the normal binary: + +``` + Respecting the return value of .write() +============================================= +real 58.88 +user 56.79 +sys 8.79 + 87810048 maximum resident set size + 0 average shared memory size + 0 average unshared data size + 0 average unshared stack size + 19427 page reclaims + 3134 page faults + 0 swaps + 5 block input operations + 194 block output operations + 0 messages sent + 0 messages received + 1 signals received + 12 voluntary context switches + 666037 involuntary context switches +``` -## Garbage Collection +The maximum byte size occupied by virtual memory turns out to be approximately +`87.81 mb`. +And now changing the [return value][] of the `.write()` function, we get: +``` + Without respecting the return value of .write(): +================================================== +real 54.48 +user 53.15 +sys 7.43 +1524965376 maximum resident set size + 0 average shared memory size + 0 average unshared data size + 0 average unshared stack size + 373617 page reclaims + 3139 page faults + 0 swaps + 18 block input operations + 199 block output operations + 0 messages sent + 0 messages received + 1 signals received + 25 voluntary context switches + 629566 involuntary context switches +``` -## Overall Dip in System Performace +The maximum byte size occupied by virtual memory turns out to be approximately +`1.52 gb`. +Without streams in place to delegate the backpressure, there is a memory space +with an entire degree of magnitude greater that is being allocated. That is a +huge margin of difference between the exact process. +This experiment shows how optimized and cost-effective Node's backpressure +mechanism is for your computing system. Now, let's do a break down on how it +works and how we can make sure we are utilizing it in our own applications when +building custom streams! ## How Does Backpressure Resolve These Issues? -The moment that backpressure is triggered can be narrowed exactly to the return -value of a `Writable`'s [`.write()`][] function. 
This return value is
determined by a few conditions, of course.

In any scenario where the data buffer has exceeded the [`highWaterMark`][] or
the write queue is currently busy, [`.write()`][] will return `false`.

When a `false` value is returned, the backpressure system kicks in. It will
pause the incoming [`Readable`][] stream from sending any data and wait until
the consumer is ready again. Once the data buffer is emptied, a [`.drain()`][]
event will be emitted and resume the incoming data flow.

The `Readable` stream also plays a vital role in backpressure. Because the two
processes rely on one another to communicate effectively, if the `Readable`
does not respect when the `Writable` stream asks to stop sending data, it can
be just as problematic.

An instance of this can be unconditionally calling your `Readable` stream to
send data using the [`push method`][] or forcing data through whenever it
is available (signaled by the [`.data` event][]):

```javascript
// This is problematic as it ignores backpressure signals
readable.on('readable', () => {
  let chunk;
  while (null !== (chunk = readable.read())) {
    readable.push(chunk);
  }
});

// This also ignores backpressure and rams data through, regardless of
// whether the destination stream is ready or not.
readable.on('data', (data) =>
  writable.write(data)
);
```

Once the queue is finished, backpressure will allow data to be sent again.
The space in memory that was being used will free itself up and prepare for the
next batch of data.

This effectively allows a fixed amount of memory to be used at any given
time for a [`.pipe()`][] function. There will be no memory leakage, no
infinite buffering, and the garbage collector will only have to deal with
one area in memory!

So, if backpressure is so important, why have you (probably) not heard of it?
Well, the answer is simple: Node.js does all of this automatically for you.

That's so great!
But also not so great when we are trying to understand how to +That's so great! But also not so great when we are trying to understand how to implement our own custom streams. ## Lifecycle of `.pipe()` -To achieve a better understanding of backpressure, here is a flow-chart on the -lifecycle of a [`Readable`][] stream being [piped][] into a [`Writable`][] -stream: +To achieve a better understanding of backpressure, here is a flow-chart on the +lifecycle of a [`Readable`][] stream being [piped][] into a [`Writable`][] +stream: ```javascript -+===============+ -| Your_Data | -+=======+=======+ - | -+-------v-----------+ +-------------------+ +=================+ -| Readable Stream | | Writable Stream +---------> .write(chunk) | -+-------+-----------+ +---------^---------+ +=======+=========+ - | | | - | +======================+ | +------------------v---------+ - +-----> .pipe(destination) >---+ | Is this chunk too big? | - +==^=======^========^==+ | Is the queue busy? | - ^ ^ ^ +----------+-------------+---+ - | | | | | - | | | > if (!chunk) | | - ^ | | emit .end(); | | - ^ ^ | > else | | - | ^ | emit .write(); +---v---+ +---v---+ - | | ^----^-----------------< No | | Yes | - ^ | +-------+ +---v---+ - ^ | | - | ^ emit .pause(); +=================+ | - | ^---^---------------------+ return false; <-----+---+ - | +=================+ | - | | - ^ when queue is empty +============+ | - ^---^-----------------^---< Buffering | | - | |============| | - +> emit .drain(); | | | - +> emit .resume(); +------------+ | - | | | - +------------+ add chunk to queue | - | <--^-------------------< - +============+ ++===============+ +===============+ +| Your Data | | .pipe(dest) | ++===============+ |===============| + | | Setup event | + | | closures: | ++-------v-----------+ | | +-------------------+ +| Readable Stream >-----------+ * .data(cb) +---------> Writable Stream | ++-^-------^-------^-+ | * .drain(cb) | +-------------------+ + ^ ^ ^ | * .unpipe(cb) | | + | ^ | | * .error(cb) | | + | | | | * .finish(cb) | | + ^ | | | * .end(cb) | +=======v=========+ + ^ | ^ | * .close(cb) | | .write(chunk) | + | | ^ +---------------+ +=======+=========+ + | ^ | | + | ^ | +------------------v---------+ + ^ | +-> if (!chunk) | Is this chunk too big? | + ^ | | emit .end(); | Is the queue busy? | + | | +-> else +-------+----------------+---+ + | ^ | emit .write(); | | + | ^ ^ +--v---+ +---v---+ + | | ^-----------------------------------< No | | Yes | + ^ | +------+ +---v---+ + ^ | | + | ^ emit .pause(); +=================+ | + | ^---------------^-----------------------+ return false; <-----+---+ + | +=================+ | + | | + ^ when queue is empty +============+ | + ^------------^-----------------------< Buffering | | + | |============| | + +> emit .drain(); | ^Buffer^ | | + +> emit .resume(); +------------+ | + | ^Buffer^ | | + +------------+ add chunk to queue | + | <---^---------------------< + +============+ ``` - -_Note_: The `.pipe()` function is typically where backpressure is invoked. In -an isolate application, both [`Readable`][] and [`Writable`][] streams -should be present. If you are writing an application meant to accept a -[`Readable`][] stream, or pipes to a [`Writable`][] stream from another app, -you may omit this detail. +_Note:_ The `.pipe()` function is typically where backpressure is invoked. In +an isolate application, both [`Readable`][] and [`Writable`][] streams +should be present. 
If you are writing an application meant to accept a
[`Readable`][] stream, or pipes to a [`Writable`][] stream from another
component, you may omit this detail.

## Example App

Since [Node.js v0.10][], the [`Streams`][] class has offered the ability to
modify the behaviour of [`.read()`][] or [`.write()`][] by using the
underscore version of these respective functions ([`._read()`][] and
[`._write()`][]).

There are guidelines documented for [implementing Readable streams][] and
[implementing Writable streams][]. We will assume you've read these over.

This application will do something very simple: take the data source and
do something fun to the data, and then write it to another file.

## Rules to Abide by When Writing Custom Streams

Recall that a [`.write()`][] may return true or false depending on some
conditions. Luckily for us, when building our own [`Writable`][] stream,
the [`stream state machine`][] will handle our callbacks and determine when to
handle backpressure and optimize the flow of data for us.

However, when we want to use a [`Writable`][] directly, we must respect the
`.write()` return value and pay close attention to these conditions:

* If the write queue is busy, [`.write()`][] will return false.
* If the data chunk is too large, [`.write()`][] will return false (the limit
is indicated by the [`highWaterMark`][] variable).

_Note_: In most machines, there is a byte size that determines when a buffer
is full (which will vary across different machines). Node.js allows you to set
your own custom [`highWaterMark`][], but commonly, the default is the optimal
value for the system that is running the application. In instances where you
might want to raise that value, go for it, but do so with caution!

## Build a Writable Stream

When we code a custom [`._write()`][], the code gets used internally by the
prototypical [`.write()`][]. This does not override the backpressure mechanism,
but we must respect the return value whenever we are building custom streams.
```javascript
const Writable = require('stream').Writable;

class MyWritable extends Writable {
  constructor(options) {
    super(options);
  }

  _write(chunk, encoding, callback) {
    // does something?
  }
}
```

## Conclusion

Streams are an often-used module in Node.js. They are important to the internal
structure, and for developers, to expand and connect across the Node.js modules
ecosystem.

Hopefully, you will now be able to troubleshoot, safely code your own
[`Writable`][] streams with backpressure in mind, and share your knowledge with
colleagues and friends.

Be sure to read up more on [`Streams`][] and other [`EventEmitters`][] to help
improve your knowledge when building applications with Node.js.

@@ -246,11 +453,14 @@ improve your knowledge when building applications with Node.js.
[`Writable`]: https://nodejs.org/api/stream.html#stream_writable_streams
[`Readable`]: https://nodejs.org/api/stream.html#stream_readable_streams
[`Duplex`]: https://nodejs.org/api/stream.html#stream_duplex_and_transform_streams
+[`Transform`]: https://nodejs.org/api/stream.html#stream_duplex_and_transform_streams
+[`Zlib`]: https://nodejs.org/api/zlib.html
[`.drain()`]: https://nodejs.org/api/stream.html#stream_event_drain
[`.read()`]: https://nodejs.org/docs/latest/api/stream.html#stream_readable_read_size
[`.write()`]: https://nodejs.org/api/stream.html#stream_writable_write_chunk_encoding_callback
[`._read()`]: https://nodejs.org/docs/latest/api/stream.html#stream_readable_read_size_1
[`._write()`]: https://nodejs.org/docs/latest/api/stream.html#stream_writable_write_chunk_encoding_callback_1
+[push method]: https://nodejs.org/docs/latest/api/stream.html#stream_readable_push_chunk_encoding
[implementing Writable streams]: https://nodejs.org/docs/latest/api/stream.html#stream_implementing_a_writable_stream
[implementing Readable streams]: https://nodejs.org/docs/latest/api/stream.html#stream_implementing_a_readable_stream
[other packages]: https://github.com/sindresorhus/awesome-nodejs#streams
[`backpressure`]: https://en.wikipedia.org/wiki/Back_pressure#Back_pressure_in_information_technology [Node.js v0.10]: https://nodejs.org/docs/v0.10.0/ [`highWaterMark`]: https://nodejs.org/api/stream.html#stream_buffering +[return value]: https://github.com/nodejs/node/blob/55c42bc6e5602e5a47fb774009cfe9289cb88e71/lib/_stream_writable.js#L239 + +[dtrace]: http://dtrace.org/blogs/about/ +[`zip(1)`]: https://linux.die.net/man/1/zip +[`gzip(1)`]: https://linux.die.net/man/1/gzip +[`stream state machine`]: https://en.wikipedia.org/wiki/Finite-state_machine [`.pipe()`]: https://nodejs.org/docs/latest/api/stream.html#stream_readable_pipe_destination_options -[piped]: https://nodejs.org/docs/latest/api/stream.html#stream_readable_pipe_destination_options \ No newline at end of file +[piped]: https://nodejs.org/docs/latest/api/stream.html#stream_readable_pipe_destination_options From 3119d132367baa7641b88e8e648eb453a9de50a2 Mon Sep 17 00:00:00 2001 From: Jessica Quynh Tran Date: Sat, 4 Mar 2017 17:23:34 -0500 Subject: [PATCH 3/6] additional changes from review --- .../docs/guides/backpressuring-in-streams.md | 281 ++++++++++++------ 1 file changed, 187 insertions(+), 94 deletions(-) diff --git a/locale/en/docs/guides/backpressuring-in-streams.md b/locale/en/docs/guides/backpressuring-in-streams.md index 2bc70a287b7db..5cd684ced75f2 100644 --- a/locale/en/docs/guides/backpressuring-in-streams.md +++ b/locale/en/docs/guides/backpressuring-in-streams.md @@ -14,7 +14,7 @@ the incoming source to begin to accumulate. To solve this problem, there must be a delegation system in place to ensure a smooth flow of data from one source to another. Different communities have resolved this issue uniquely to their programs, Unix pipes and TCP sockets are -good examples of this, and often times is referred to as _flow control_. In +good examples of this, and is often times referred to as _flow control_. In Node.js, streams have been the adopted solution. The purpose of this guide is to further detail what backpressure is, and how @@ -24,7 +24,7 @@ code is safe and optimized when implementing streams. We assume a little familiarity with the general definition of [`backpressure`][], [`Buffer`][], and [`EventEmitters`][] in Node.js, as well as -some experience with [`Streams`][]. If you haven't read through those docs, +some experience with [`Stream`][]. If you haven't read through those docs, it's not a bad idea to take a look at the API documentation first, as it will help expand your understanding while reading this guide and for following along with examples. @@ -33,7 +33,7 @@ with examples. In a computer system, data is transferred from one process to another through pipes, sockets, and signals. In Node.js, we find a similar mechanism called -[`Streams`][]. Streams are great! They do so much for Node.js and almost every +[`Stream`][]. Streams are great! They do so much for Node.js and almost every part of the internal codebase utilizes that module. As a developer, you are more than encouraged to use them too! @@ -56,17 +56,18 @@ rl.question('Why should you use streams? ', (answer) => { A good example of why the backpressure mechanism implemented through streams are a great optimization can be demonstrated by comparing the internal system tools -from Node.js' [`Streams`][] implementation. +from Node.js' [`Stream`][] implementation. 
-In one scenario, we take a large file (approximately ~9gb) and compress it up +In one scenario, we will take a large file (approximately ~9gb) and compress it using the familiar [`zip(1)`][] tool. ``` $ zip The.Matrix.1080p.mkv ``` -While that will take a few minutes to compress, in another shell we may run -a script that takes Node.js' module [`Zlib`][], that wraps around another compression tool, [`gzip(1)`][]. +While that will take a few minutes to complete, in another shell we may run +a script that takes Node.js' module [`zlib`][], that wraps around another +compression tool, [`gzip(1)`][]. ```javascript const gzip = require('zlib').createGzip(); @@ -78,12 +79,16 @@ const out = fs.createWriteStream('The.Matrix.1080p.mkv.gz'); inp.pipe(gzip).pipe(out); ``` -While the first `zip` function will ultimately fail, Node will be able to handle -large amount of data transfer. +To test the results, try opening each compressed file. The file compressed by +the [`zip(1)`][] tool will notify you the file is corrupt, whereas the +compression finished by [`Stream`][] will decompress without error. -To test the results, try opening each compressed file. The one done by `zip` -will warn you that the file is corrupt, whereas the one done by streams will -decompress perfectly fine. +_Note:_ In this example, we use `.pipe()` to get the data source from one end +to the other. However, notice there is no proper error handlers attached. If +a chunk of data were to fail be properly recieved, the `Readable` source or +`gzip` stream will not be destroyed. [`pump`][] is a utility tool that would +properly destroy all the streams in a pipeline if one of them fails or closes, +and is a must have in this case! ## Too Much Data, Too Quickly @@ -116,26 +121,25 @@ This results in a few things: * A very overworked garbage collector * Memory exhaustion -## Overall Dip in System Performance - In the following examples we will take out the [return value][] of the -`.write()` function and change the it to `true`. In any reference to 'modified' -binary, we are talking about running the `node` binary without the `return ret;` -line, and instead with the replaced `return true;`. +`.write()` function and change it to `true`, which effectively disables +backpressure support in Node.js core. In any reference to 'modified' binary, +we are talking about running the `node` binary without the `return ret;` line, +and instead with the replaced `return true;`. ## Excess Drag on Garbage Collection Let's take a look at a quick benchmark. Using the same example from above, we ran a few time trials to get a median time for both binaries. -``` +```javascript trial (#) | `node` binary (ms) | modified `node` binary (ms) ================================================================= - 1 | 56924 | 55011 - 2 | 52686 | 55869 - 3 | 59479 | 54043 - 4 | 54473 | 55229 - 5 | 52933 | 59723 + 1 | 56924 | 55011 + 2 | 52686 | 55869 + 3 | 59479 | 54043 + 4 | 54473 | 55229 + 5 | 52933 | 59723 ================================================================= average time: | 55299 | 55975 ``` @@ -148,7 +152,7 @@ collector. The GC (garbage collector) measured time indicates the intervals of a full cycle of a single sweep done by the garbage collector: -``` +```javascript approx. time (ms) | GC (ms) | modified GC (ms) ================================================= 0 | 0 | 0 @@ -173,16 +177,16 @@ backpressure system in place, it spreads the GC load across consistent intervals of 4-8 milliseconds until the end of the data transfer. 
However, when a backpressure system is not in place, the V8 garbage collection
starts to drag out. The normal binary called the GC approximately __75__
times in a minute, whereas the modified binary fires only __36__ times.

This is the slow and gradual debt accumulating from growing memory usage. As
data gets transferred, without a backpressure system in place, more memory is
being used for each chunk transfer.

The more memory that is being allocated, the more the GC has to take care of in
one sweep. The bigger the sweep, the more the GC needs to decide what can be
freed up, and scanning for detached pointers in a larger memory space will
consume more computing power.

## Memory Exhaustion

To determine the memory consumption of each binary, we've clocked each process
with `/usr/bin/time -lp sudo ./node ./backpressure-example/zlib.js`
individually.

This is the output on the normal binary:

```
Respecting the return value of .write()
=============================================
real 58.88
user 56.79
sys 8.79
  87810048  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
     19427  page reclaims
      3134  page faults
         0  swaps
         5  block input operations
       194  block output operations
         0  messages sent
         0  messages received
         1  signals received
        12  voluntary context switches
    666037  involuntary context switches
```

The maximum byte size occupied by virtual memory turns out to be approximately
87.81 mb.

And now changing the [return value][] of the `.write()` function, we get:

```
Without respecting the return value of .write():
==================================================
real 54.48
user 53.15
sys 7.43
1524965376  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
    373617  page reclaims
      3139  page faults
         0  swaps
        18  block input operations
       199  block output operations
         0  messages sent
         0  messages received
         1  signals received
        25  voluntary context switches
    629566  involuntary context switches
```

The maximum byte size occupied by virtual memory turns out to be approximately
1.52 gb.

Without streams in place to delegate the backpressure, there is an order of
magnitude more memory space being allocated, a huge margin of difference
between two runs of the same process!

This experiment shows how optimized and cost-effective Node's backpressure
mechanism is for your computing system. Now, let's break down how it works!

## How Does Backpressure Resolve These Issues?

There are different functions to transfer data from one process to another. In
Node.js, there is an internal built-in function called [`.pipe()`][]. There are
[other packages][] out there you can use too! Ultimately though, at the basic
level of this process, we have two separate components: the _source_ of the
data and the _consumer_.

When [`.pipe()`][] is called from the source, it signals to the consumer that
there is data to be transferred. The pipe function helps to set up the
appropriate backpressure closures for the event triggers.

In Node.js the source is a [`Readable`][] stream and the consumer is the
[`Writable`][] stream (both of these may be interchanged with a [`Duplex`][] or
a [`Transform`][] stream, but that is out-of-scope for this guide).

The moment that backpressure is triggered can be narrowed exactly to the return
value of a [`Writable`][]'s [`.write()`][] function. This return value is
determined by a few conditions, of course.

In any scenario where the data buffer has exceeded the [`highWaterMark`][] or
the write queue is currently busy, [`.write()`][] will return `false`.

When a `false` value is returned, the backpressure system kicks in. It will
pause the incoming [`Readable`][] stream from sending any data and wait until
the consumer is ready again. Once the data buffer is emptied, a [`.drain()`][]
event will be emitted and resume the incoming data flow.

The `Readable` stream also plays a vital role in backpressure.
Because the two -processes rely on one another to communicate effectively, if the `Readable` does -not respect when the `Writable` stream asks to stop sending data, it can be -just as problematic. - -An instance of this can be unconditionally calling your `Readable` stream to -send data using the [`push method`][] or forcing data through whenever it -is available (signaled by the [`.data` event][]): - -```javascript -// This is problematic as it ignores backpressure signals -readable.on('readable', () => { - - let chunk; - while (null !== (chunk = readable.read())) { - readable.push(chunk); - } - -}); - -// Also ignores backpressure and rams data through, regardless if destination -// stream is ready or not. -readable.on('data', data => - writable.write(data); -); -``` - Once the the queue is finished, backpressure will allow data to be sent again. The space in memory that was being used will free itself up and prepare for the next batch of data. @@ -331,21 +306,25 @@ lifecycle of a [`Readable`][] stream being [piped][] into a [`Writable`][] stream: ```javascript -+===============+ +===============+ -| Your Data | | .pipe(dest) | -+===============+ |===============| - | | Setup event | - | | closures: | -+-------v-----------+ | | +-------------------+ -| Readable Stream >-----------+ * .data(cb) +---------> Writable Stream | -+-^-------^-------^-+ | * .drain(cb) | +-------------------+ - ^ ^ ^ | * .unpipe(cb) | | - | ^ | | * .error(cb) | | - | | | | * .finish(cb) | | - ^ | | | * .end(cb) | +=======v=========+ - ^ | ^ | * .close(cb) | | .write(chunk) | - | | ^ +---------------+ +=======+=========+ - | ^ | | + +===================+ + x--> Piping functions +--> src.pipe(dest) | + x are set up during |===================| + x the .pipe method. | Event callbacks | + +===============+ x |-------------------| + | Your Data | x They exist outside | .on('close', cb) | + +=======+=======+ x the data flow, but | .on('data', cb) | + | x importantly attach | .on('drain', cb) | + | x events, and their | .on('unpipe', cb) | ++--------v----------+ x respective callbacks. | .on('error', cb) | +| Readable Stream +----+ | .on('finish', cb) | ++-^-------^-------^-+ | | .on('end', cb) | + ^ | ^ | +-------------------+ + | | | | + | ^ | | + ^ ^ ^ | +-------------------+ +=================+ + ^ | ^ +----> Writable Stream +---------> .write(chunk) | + | | | +-------------------+ +=======+=========+ + | | | | | ^ | +------------------v---------+ ^ | +-> if (!chunk) | Is this chunk too big? | ^ | | emit .end(); | Is the queue busy? | @@ -376,9 +355,22 @@ should be present. If you are writing an application meant to accept a [`Readable`][] stream, or pipes to a [`Writable`][] stream from another component, you may omit this detail. +In the good likelihood you are setting up a pipeline to chain together a few +streams to manipulate your data, you will be implementing [`Transform`][] +stream. In this case, your output from your [`Readable`][] stream will enter +in the [`Transform`][] and pipe into the [`Writable`][]. + +``` +Readable.pipe(Transformable).pipe(Writable); +``` + +Backpressure will be automatically applied, but note the both the incoming and +outgoing `highWaterMark` of the [`Transform`][] stream may be manipulated and +will effect the backpressure system in place. 
## Example App

Since [Node.js v0.10][], the [`Stream`][] class has offered the ability to
modify the behaviour of [`.read()`][] or [`.write()`][] by using the
underscore version of these respective functions ([`._read()`][] and
[`._write()`][]).

There are guidelines documented for [implementing Readable streams][] and
[implementing Writable streams][]. We will assume you've read these over, and
the next section will go a little bit more in-depth.

## Rules to Abide By When Implementing Custom Streams

The golden rule of streams is __to always respect backpressure__. What
constitutes best practice is non-contradictory practice. So long as you are
careful to avoid behaviours that conflict with internal backpressure support,
you can be sure you're following good practice.

In general,

1. Never `.push()` if you are not asked.
2. Never call `.write()` after it returns false but wait for 'drain' instead.
3. Streams change between different Node.js versions, and so may the libraries
you use. Be careful and test things.

## Rules specific to Readable Streams

So far, we have taken a look at how `.write()` affects backpressure and have
focused much on the `Writable` stream. Because of Node's functionality, data
technically flows downstream from `Readable` to `Writable`. However, as we
can observe in any transmission of data, matter, or energy, the source is just
as important as the destination, and the `Readable` stream is vital to how
backpressure is handled.

Both of these processes rely on one another to communicate effectively; if
the `Readable` ignores when the `Writable` stream asks it to stop sending
data, it can be just as problematic as when the `.write()` return value is
incorrect.

So, along with respecting the `.write()` return value, we must also respect the
return value of `.push()` used in the `._read()` method. If `.push()`
returns a `false` value, the stream will stop reading from the source;
otherwise, it will continue without pause.

Here is an example of bad practice using `.push()`:
```javascript
// This is problematic as it completely ignores the return value from push,
// which may be a signal for backpressure from the destination stream!
class MyReadable extends Readable {
  _read(size) {
    let chunk;
    while (null !== (chunk = getNextChunk())) {
      this.push(chunk);
    }
  }
}
```

Additionally, from outside the custom stream, there are pitfalls in ignoring
backpressure. In this counter-example of good practice, the application's code
forces data through whenever it is available (signaled by the
[`.data` event][]):
```javascript
// This ignores the backpressure mechanisms Node.js has set in place,
// and unconditionally pushes through data, regardless of whether the
// destination stream is ready for it or not.
readable.on('data', (data) =>
  writable.write(data)
);
```

## Rules specific to Writable Streams

Recall that a [`.write()`][] may return true or false depending on some
conditions. Luckily for us, when building our own [`Writable`][] stream,
the [`stream state machine`][] will handle our callbacks and determine when to
handle backpressure and optimize the flow of data for us.

However, when we want to use a [`Writable`][] directly, we must respect the
`.write()` return value and pay close attention to these conditions:

* If the write queue is busy, [`.write()`][] will return false.
* If the data chunk is too large, [`.write()`][] will return false (the limit
is indicated by the [`highWaterMark`][] variable).

// Counter-examples
```javascript
.
.
.
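// (The dots above are the author's placeholder; what follows is one
// hypothetical counter-example, assuming `writable` is a Writable stream
// passed in by the caller.)
function writeOneMillionTimes(writable, data) {
  for (let i = 0; i < 1000000; i++) {
    // The return value of .write() is discarded here, so the internal
    // queue grows without bound instead of waiting for the 'drain' event.
    writable.write(data);
  }
}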
+``` + _Note_: In most machines, there is a byte size that is determines when a buffer is full (which will vary across different machines). Node.js allows you to set your own custom [`highWaterMark`][], but commonly, the default is the optimal value for what system is running the application. In instances where you might want to raise that value, go for it, but do so with caution! +## Build a Readable Stream + +```javascript +const Readable = require('stream').Readable; +const util = require('util'); + +class MyReadable extends Readable { + constructor(options) { + super(options); + } + + _read(chunk, encoding, callback) { + + // does something? + + } +} + +util.inherits(MyReadable, Readable); +``` + ## Build a Writable Stream When we code a custom [`._write()`][], the code gets used internally by the @@ -431,6 +511,17 @@ class MyWritable extends Writable { } ``` +## Putting it all together + +```javascript +const MyRead = require('./custom-readable') +const MyWrite = require('./custom-writable') + +MyRead.pipe(MyWrite); + +// output: + +``` ## Conclusion @@ -439,23 +530,24 @@ structure, and for develoeprs, to expand and connect across the Node.js modules ecosystem. Hopefully, you will now be able to troubleshoot, safely code your own -[`Writable`][] streams with backpressure in mind, and share your knowledge with -colleagues and friends. - -Be sure to read up more on [`Streams`][] for other [`EventEmitters`][] to help -improve your knowledge when building applications with Node.js. +[`Writable`][] and [`Readable`][] streams with backpressure in mind, and share +your knowledge with colleagues and friends. +Be sure to read up more on [`Stream`][] for other API functions to help +improve unleash your streaming capabilities when building an applcation with +Node.js -[`Streams`]: https://nodejs.org/api/stream.html +[`Stream`]: https://nodejs.org/api/stream.html [`Buffer`]: https://nodejs.org/api/buffer.html [`EventEmitters`]: https://nodejs.org/api/events.html [`Writable`]: https://nodejs.org/api/stream.html#stream_writable_streams [`Readable`]: https://nodejs.org/api/stream.html#stream_readable_streams [`Duplex`]: https://nodejs.org/api/stream.html#stream_duplex_and_transform_streams [`Transform`]: https://nodejs.org/api/stream.html#stream_duplex_and_transform_streams -[`Zlib`]: https://nodejs.org/api/zlib.html +[`zlib`]: https://nodejs.org/api/zlib.html [`.drain()`]: https://nodejs.org/api/stream.html#stream_event_drain +[`'data'` event]: https://nodejs.org/api/stream.html#event-data [`.read()`]: https://nodejs.org/docs/latest/api/stream.html#stream_readable_read_size [`.write()`]: https://nodejs.org/api/stream.html#stream_writable_write_chunk_encoding_callback [`._read()`]: https://nodejs.org/docs/latest/api/stream.html#stream_readable_read_size_1 @@ -471,10 +563,11 @@ improve your knowledge when building applications with Node.js. 
[`highWaterMark`]: https://nodejs.org/api/stream.html#stream_buffering [return value]: https://github.com/nodejs/node/blob/55c42bc6e5602e5a47fb774009cfe9289cb88e71/lib/_stream_writable.js#L239 -[dtrace]: http://dtrace.org/blogs/about/ +[`dtrace`]: http://dtrace.org/blogs/about/ [`zip(1)`]: https://linux.die.net/man/1/zip [`gzip(1)`]: https://linux.die.net/man/1/gzip [`stream state machine`]: https://en.wikipedia.org/wiki/Finite-state_machine [`.pipe()`]: https://nodejs.org/docs/latest/api/stream.html#stream_readable_pipe_destination_options [piped]: https://nodejs.org/docs/latest/api/stream.html#stream_readable_pipe_destination_options +[`pump`]: https://github.com/mafintosh/pump From c62f2529967da4c59bf2f4303aab59661748155a Mon Sep 17 00:00:00 2001 From: Jessica Quynh Tran Date: Fri, 7 Apr 2017 01:25:43 -0400 Subject: [PATCH 4/6] grammar fixes --- .../docs/guides/backpressuring-in-streams.md | 176 +++++++----------- 1 file changed, 70 insertions(+), 106 deletions(-) diff --git a/locale/en/docs/guides/backpressuring-in-streams.md b/locale/en/docs/guides/backpressuring-in-streams.md index 5cd684ced75f2..5ed3728eea3d5 100644 --- a/locale/en/docs/guides/backpressuring-in-streams.md +++ b/locale/en/docs/guides/backpressuring-in-streams.md @@ -5,11 +5,11 @@ layout: docs.hbs # Backpressuring in Streams -There is a general problem that arises during data handling called -[`backpressure`][]. The term is used to describe a build up of data behind a -buffer during data transfer. When the recieving end of the transfer has complex -operations, or is slower for whatever reason, there is a tendency for data from -the incoming source to begin to accumulate. +There is a general problem that occurs during data handling called +[`backpressure`][] and describes a buildup of data behind a buffer during data +transfer. When the recieving end of the transfer has complex operations, or is +slower for whatever reason, there is a tendency for data from the incoming +source to accumulate, like a clog. To solve this problem, there must be a delegation system in place to ensure a smooth flow of data from one source to another. Different communities have @@ -26,8 +26,7 @@ We assume a little familiarity with the general definition of [`backpressure`][], [`Buffer`][], and [`EventEmitters`][] in Node.js, as well as some experience with [`Stream`][]. If you haven't read through those docs, it's not a bad idea to take a look at the API documentation first, as it will -help expand your understanding while reading this guide and for following along -with examples. +help expand your understanding while reading this guide. ## The Problem With Data Handling @@ -83,7 +82,7 @@ To test the results, try opening each compressed file. The file compressed by the [`zip(1)`][] tool will notify you the file is corrupt, whereas the compression finished by [`Stream`][] will decompress without error. -_Note:_ In this example, we use `.pipe()` to get the data source from one end +Note: In this example, we use `.pipe()` to get the data source from one end to the other. However, notice there is no proper error handlers attached. If a chunk of data were to fail be properly recieved, the `Readable` source or `gzip` stream will not be destroyed. [`pump`][] is a utility tool that would @@ -93,7 +92,7 @@ and is a must have in this case! ## Too Much Data, Too Quickly There are instance where a [`Readable`][] stream might give data to the -[`Writable`][] much too quickly --- much more than the consumer can handle! 
+[`Writable`][] much too quickly — much more than the consumer can handle!
 
 When that occurs, the consumer will begin to queue all the chunks of data for
 later consumption. The write queue will get longer and longer, and because of
 this more data must be kept in memory until the entire process has completed.
 
 Writing to a disk is a lot slower than reading from one; thus, when we are
 trying to compress a file and write it to our hard disk, backpressure will
-arise because the write disk will not be able to keep up with the speed from
+occur because the write disk will not be able to keep up with the speed of
 the read.
 
 ```javascript
@@ -222,7 +221,7 @@ sys 8.79
 The maximum byte size occupied by virtual memory turns out to be approximately
 87.81 mb.
 
-And now changing the [return value][] of the `.write()` function, we get:
+And now changing the [return value][] of the [`.write()`][] function, we get:
 
 ```javascript
 Without respecting the return value of .write():
@@ -248,8 +247,8 @@ user 53.15sys 7.43
 The maximum byte size occupied by virtual memory turns out to be approximately
 1.52 gb.
 
-Without streams in place to delegate the backpressure, there an order of
-magnitude greater of memory space being allocated --- a huge margin of
+Without streams in place to delegate the backpressure, there is an order of
+magnitude more memory space being allocated - a huge margin of
 difference for the same process!
 
 This experiment shows how optimized and cost-effective Node's backpressure
@@ -288,7 +287,7 @@ Once the queue is finished, backpressure will allow data to be sent again.
 The space in memory that was being used will free itself up and prepare for the
 next batch of data.
 
-This effectively allows an fixed amount of memory to be used at any given
+This effectively allows a fixed amount of memory to be used at any given
 time for a [`.pipe()`][] function. There will be no memory leakage, no
 infinite buffering, and the garbage collector will only have to deal with
 one area in memory!
 
@@ -349,26 +348,22 @@ stream:
 +============+
 ```
 
-_Note:_ The `.pipe()` function is typically where backpressure is invoked. In
-an isolate application, both [`Readable`][] and [`Writable`][] streams
-should be present. If you are writing an application meant to accept a
-[`Readable`][] stream, or pipes to a [`Writable`][] stream from another
-component, you may omit this detail.
+Note: If you are setting up a pipeline to chain together a few streams to
+manipulate your data, you will most likely be implementing a [`Transform`][]
+stream.
 
-In the good likelihood you are setting up a pipeline to chain together a few
-streams to manipulate your data, you will be implementing [`Transform`][]
-stream. In this case, your output from your [`Readable`][] stream will enter
-in the [`Transform`][] and pipe into the [`Writable`][].
+In this case, the output from your [`Readable`][] stream will enter the
+[`Transform`][] and will pipe into the [`Writable`][].
 
-```
+```javascript
 Readable.pipe(Transformable).pipe(Writable);
 ```
 
 Backpressure will be automatically applied, but note that both the incoming and
 outgoing `highWaterMark` of the [`Transform`][] stream may be manipulated and
-will effect the backpressure system in place.
+will affect the backpressure system.
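+
+For instance, a minimal sketch of such a pipeline might look like this (the
+`toUpperCase` transform is hypothetical, and the `highWaterMark` of 1024 bytes
+is just an illustrative value, not a recommendation):
+
+```javascript
+const Transform = require('stream').Transform;
+
+// A hypothetical Transform that upper-cases text as it passes through.
+// For a Transform, a single highWaterMark option tunes the buffers on
+// both the incoming (writable) and outgoing (readable) sides.
+const toUpperCase = new Transform({
+  highWaterMark: 1024,
+  transform(chunk, encoding, callback) {
+    callback(null, chunk.toString().toUpperCase());
+  }
+});
+
+process.stdin.pipe(toUpperCase).pipe(process.stdout);
+```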
-## Example App
+## Backpressure Guidelines
 
 Since [Node.js v0.10][], the [`Stream`][] class has offered the ability to
 modify the behaviour of the [`.read()`][] or [`.write()`][] by using the
 underscore version of these respective functions ([`._read()`][] and
 [`._write()`][]).
 
 There are guidelines documented for [implementing Readable streams][] and
-[implementing Writable streams][]. We will assume you've read these over.
-
-This application will do something very simple: take the data source and
-perform something fun to the data, and then write it to another file.
+[implementing Writable streams][]. We will assume you've read these over, and
+the next section will go a little more in-depth.
 
 ## Rules to Abide By When Implementing Custom Streams
 
@@ -390,31 +383,37 @@ you can be sure you're following good practice.
 
 In general,
 
-1. Never push() if you are not asked.
-2. Never call write() after it returns false but wait for 'drain' instead.
+1. Never `.push()` if you are not asked.
+2. Never call `.write()` after it returns false but wait for `'drain'` instead.
 3. Streams change between different Node.js versions, and the libraries you
    use. Be careful and test things.
 
+Note: In regard to point 3, an incredibly useful package for building
+browser streams is [`readable-stream`][]. Rod Vagg has written a
+[great blog post][] describing the utility of this library. In short, it
+provides a type of automated graceful degradation for [`Readable`][] streams,
+and supports older versions of browsers and Node.js.
+
 ## Rules specific to Readable Streams
 
-So far, we have taken a look at how `.write()` affects backpressure and have
-focused much on the `Writable` stream. Because of Node's functionality, data is
-technically flowing downstream from `Readable` to `Writable`. However, as we
-can observe in any transmission of data, matter, or energy, the source is just
-as important as the destination and the `Readable` stream is vital to how
-backpressure is handled.
+So far, we have taken a look at how [`.write()`][] affects backpressure and have
+focused mostly on the [`Writable`][] stream. Because of the way Node.js streams
+work, data technically flows downstream from [`Readable`][] to [`Writable`][].
+However, as we can observe in any transmission of data, matter, or energy, the
+source is just as important as the destination, and the [`Readable`][] stream
+is vital to how backpressure is handled.
 
-Both these processes rely on one another to communicate effectively, if
-the `Readable` ignores when the `Writable` stream asks for it to stop
-sending in data, it can be just as problematic to when the `write`'s return
-value is incorrect.
+Both of these processes rely on one another to communicate effectively: if
+the [`Readable`][] ignores when the [`Writable`][] stream asks it to stop
+sending in data, it can be just as problematic as when the [`.write()`][] return
+value is not respected.
 
-So, as well with respecting the `.write()` return, we must also respect the
-return value of `.push()` from the used in the `._read()` method. If `.push()`
-returns a `false` value, the stream will stop reading from the source,
-otherwise, it will continue without pause.
+So, as well as respecting the [`.write()`][] return value, we must also respect
+the return value of [`.push()`][] used in the [`._read()`][] method. If
+[`.push()`][] returns a `false` value, the stream will stop reading from the
+source. Otherwise, it will continue without pause.
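+
+As a rough sketch of the good-practice side (assuming a hypothetical
+`getNextChunk()` helper that returns `null` once the underlying source is
+exhausted), a [`Readable`][] that respects this contract might look like:
+
+```javascript
+const Readable = require('stream').Readable;
+
+class MySource extends Readable {
+  _read(size) {
+    let chunk;
+    while ((chunk = getNextChunk()) !== null) {
+      // If .push() returns false, stop and wait: Node.js will call
+      // ._read() again once the consumer has drained.
+      if (!this.push(chunk))
+        return;
+    }
+    this.push(null); // the source is empty, signal the end of the stream
+  }
+}
+```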
-Here is an example of bad practice using `.push()`:
+Here is an example of bad practice using [`.push()`][]:
 
 ```javascript
 // This is problematic as it completely ignores the return value from push
 // which may be a signal for backpressure from the destination stream!
@@ -449,79 +448,41 @@ the [`stream state machine`][] will handle our callbacks and determine when to
 handle backpressure and optimize the flow of data for us.
 
 However, when we want to use a [`Writable`][] directly, we must respect the
-`.write()` return value and pay close attention these conditions:
+[`.write()`][] return value and pay close attention to these conditions:
 
 * If the write queue is busy, [`.write()`][] will return false.
 * If the data chunk is too large, [`.write()`][] will return false (the limit
   is indicated by the variable [`highWaterMark`][]).
 
-// Counter-examples
-```javascript
-.
-.
-.
-```
-
-_Note_: In most machines, there is a byte size that is determines when a buffer
-is full (which will vary across different machines). Node.js allows you to set
-your own custom [`highWaterMark`][], but commonly, the default is the optimal
-value for what system is running the application. In instances where you might
-want to raise that value, go for it, but do so with caution!
-
-## Build a Readable Stream
-
-```javascript
-const Readable = require('stream').Readable;
-const util = require('util');
-
-class MyReadable extends Readable {
-  constructor(options) {
-    super(options);
-  }
-
-  _read(chunk, encoding, callback) {
-
-    // does something?
-
-  }
-}
-
-util.inherits(MyReadable, Readable);
-```
-
-## Build a Writable Stream
-
-When we code a custom [`._write()`][], the code gets used internally by the
-prototypical [`.write()`][]. This does not override the backpressure mechanism,
-but we must respect the return value whenever we are building custom streams.
-
 ```javascript
-const Writable = require('stream').Writable;
-
+// This writable is invalid because of the async nature of JavaScript callbacks.
+// Without a return statement for each callback prior to the last one,
+// there is a great chance the callback will be called multiple times.
 class MyWritable extends Writable {
-  constructor(options) {
-    super(options);
-  }
-
   _write(chunk, encoding, callback) {
-
-    // does something?
-
+    if (chunk.toString().indexOf('a') >= 0)
+      callback();
+    else if (chunk.toString().indexOf('b') >= 0)
+      callback();
+    callback();
   }
 }
-```
-
-## Putting it all together
 
-```javascript
-const MyRead = require('./custom-readable')
-const MyWrite = require('./custom-writable')
+// The proper way to write this would be:
+    if (chunk.toString().indexOf('a') >= 0)
+      return callback();
+    else if (chunk.toString().indexOf('b') >= 0)
+      return callback();
+    callback();
+```
 
-MyRead.pipe(MyWrite);
 
-// output:
 
-```
 
 ## Conclusion
 
@@ -535,7 +496,7 @@ your knowledge with colleagues and friends.
 
 Be sure to read up more on [`Stream`][] for other API functions to help
 unleash your streaming capabilities when building an application with
-Node.js
+Node.js.
[`Stream`]: https://nodejs.org/api/stream.html

@@ -547,7 +508,7 @@ Node.js
 [`Transform`]: https://nodejs.org/api/stream.html#stream_duplex_and_transform_streams
 [`zlib`]: https://nodejs.org/api/zlib.html
 [`.drain()`]: https://nodejs.org/api/stream.html#stream_event_drain
-[`'data'` event]: https://nodejs.org/api/stream.html#event-data
 [`.read()`]: https://nodejs.org/docs/latest/api/stream.html#stream_readable_read_size
 [`.write()`]: https://nodejs.org/api/stream.html#stream_writable_write_chunk_encoding_callback
 [`._read()`]: https://nodejs.org/docs/latest/api/stream.html#stream_readable_read_size_1
 [`._write()`]: https://nodejs.org/docs/latest/api/stream.html#stream_writable_write_chunk_encoding_callback_1
@@ -563,6 +524,9 @@ Node.js
 [`highWaterMark`]: https://nodejs.org/api/stream.html#stream_buffering
 [return value]: https://github.com/nodejs/node/blob/55c42bc6e5602e5a47fb774009cfe9289cb88e71/lib/_stream_writable.js#L239
 
+[`readable-stream`]: https://github.com/nodejs/readable-stream
+[great blog post]: https://r.va.gg/2014/06/why-i-dont-use-nodes-core-stream-module.html
+
 [`dtrace`]: http://dtrace.org/blogs/about/
 [`zip(1)`]: https://linux.die.net/man/1/zip
 [`gzip(1)`]: https://linux.die.net/man/1/gzip

From 56bdcdb776858c196ce94e570a2c03d1dcafc0d0 Mon Sep 17 00:00:00 2001
From: Jessica Quynh Tran
Date: Fri, 7 Apr 2017 14:38:23 -0400
Subject: [PATCH 5/6] add ._writev pattern

---
 .../docs/guides/backpressuring-in-streams.md  | 51 ++++++++++++++++---
 1 file changed, 45 insertions(+), 6 deletions(-)

diff --git a/locale/en/docs/guides/backpressuring-in-streams.md b/locale/en/docs/guides/backpressuring-in-streams.md
index 5ed3728eea3d5..54155f7164e86 100644
--- a/locale/en/docs/guides/backpressuring-in-streams.md
+++ b/locale/en/docs/guides/backpressuring-in-streams.md
@@ -298,6 +298,12 @@ Well the answer is simple: Node.js does all of this automatically for you.
 
 That's so great! But also not so great when we are trying to understand how to
 implement our own custom streams.
 
+Note: In most machines, there is a byte size that determines when a buffer
+is full (which will vary across different machines). Node.js allows you to set
+your own custom [`highWaterMark`][], but commonly, the default is the optimal
+value for what system is running the application. In instances where you might
+want to raise that value, go for it, but do so with caution!
+
 ## Lifecycle of `.pipe()`
 
 To achieve a better understanding of backpressure, here is a flow-chart on the
@@ -476,13 +482,42 @@ class MyWritable extends Writable {
 callback();
 ```
 
+There are also some things to look out for when implementing [`._writev()`][].
+The function is coupled with [`.cork()`][], but there is a common mistake when
+writing:
+
+```javascript
+  // Using .uncork() twice here makes two calls on the C++ layer, rendering the
+  // cork/uncork technique useless.
+  ws.cork()
+  ws.write('hello ')
+  ws.write('world ')
+  ws.uncork()
+
+  ws.cork()
+  ws.write('from ')
+  ws.write('Matteo')
+  ws.uncork()
+
+  // The correct way to write this is to utilize process.nextTick(), which fires
+  // on the next event loop tick.
+  ws.cork()
+  ws.write('hello ')
+  ws.write('world ')
+  process.nextTick(doUncork, ws)
+
+  ws.cork()
+  ws.write('from ')
+  ws.write('Matteo')
+  process.nextTick(doUncork, ws)
+
+  // As a global function:
+  function doUncork (stream) {
+    stream.uncork()
+  }
+```
 
-Note: In most machines, there is a byte size that is determines when a buffer
-is full (which will vary across different machines). Node.js allows you to set
-your own custom [`highWaterMark`][], but commonly, the default is the optimal
-value for what system is running the application. In instances where you might
-want to raise that value, go for it, but do so with caution!
-
+[`.cork()`][] can be called as many times as we want; we just need to be careful
+to call [`.uncork()`][] the same number of times to make the data flow again.
 
 ## Conclusion
 
@@ -513,6 +548,10 @@ Node.js.
 [`.write()`]: https://nodejs.org/api/stream.html#stream_writable_write_chunk_encoding_callback
 [`._read()`]: https://nodejs.org/docs/latest/api/stream.html#stream_readable_read_size_1
 [`._write()`]: https://nodejs.org/docs/latest/api/stream.html#stream_writable_write_chunk_encoding_callback_1
+[`._writev()`]: https://nodejs.org/api/stream.html#stream_writable_writev_chunks_callback
+[`.cork()`]: https://nodejs.org/api/stream.html#stream_writable_cork
+[`.uncork()`]: https://nodejs.org/api/stream.html#stream_writable_uncork
+
 [push method]: https://nodejs.org/docs/latest/api/stream.html#stream_readable_push_chunk_encoding
 
 [implementing Writable streams]: https://nodejs.org/docs/latest/api/stream.html#stream_implementing_a_writable_stream

From d78008eb4ec6ea215984c28877eb6574d4c9c8a3 Mon Sep 17 00:00:00 2001
From: Jessica Quynh Tran
Date: Fri, 14 Apr 2017 07:22:26 -0400
Subject: [PATCH 6/6] highwatermark default

---
 locale/en/docs/guides/backpressuring-in-streams.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/locale/en/docs/guides/backpressuring-in-streams.md b/locale/en/docs/guides/backpressuring-in-streams.md
index 54155f7164e86..828a045c551a2 100644
--- a/locale/en/docs/guides/backpressuring-in-streams.md
+++ b/locale/en/docs/guides/backpressuring-in-streams.md
@@ -300,8 +300,8 @@ implement our own custom streams.
 
 Note: In most machines, there is a byte size that determines when a buffer
 is full (which will vary across different machines). Node.js allows you to set
-your own custom [`highWaterMark`][], but commonly, the default is the optimal
-value for what system is running the application. In instances where you might
+your own custom [`highWaterMark`][], but commonly, the default is set to 16kb
+(16384 bytes, or 16 for objectMode streams). In instances where you might
 want to raise that value, go for it, but do so with caution!