diff --git a/doc/api/buffer.md b/doc/api/buffer.md
index dfb18eeb5615c2..5c56325e1470b1 100644
--- a/doc/api/buffer.md
+++ b/doc/api/buffer.md
@@ -197,6 +197,38 @@ the WHATWG specification it is possible that the server actually returned
 `'win-1252'`-encoded data, and using `'latin1'` encoding may incorrectly decode
 the characters.
 
+### Evaluating legal code points for `'utf-8'` encoding
+
+Byte sequences that have no corresponding UTF-16 encoding, as well as UTF-8
+encodings of illegal Unicode values such as surrogate code points, must be
+treated as invalid byte sequences.
+
+Apart from backward-compatible 7-bit `'ascii'` data (and, in rare cases,
+[extended 8-bit](https://en.wikipedia.org/wiki/UTF-8#Description) data) and
+valid [UTF-8 code units](https://en.wikipedia.org/wiki/UTF-8#Codepage_layout),
+all other input is decoded to the replacement character (`�`), and no
+exception is thrown.
+
+The replacement character has the code point `U+FFFD`. It is returned
+whenever a decoding error occurs, that is, whenever the input does not encode
+a valid Unicode scalar value.
+
+```js
+// An invalid byte sequence: the UTF-8 form of the surrogate U+D9A4
+const buf = Buffer.from([237, 166, 164]);
+
+const buf_str = buf.toString('utf-8');
+
+console.log(buf_str);
+// Prints replacement character(s) ('�') in place of the invalid bytes
+
+console.log(Buffer.byteLength(buf_str));
+// Prints a multiple of 3, since each '�' takes three bytes in UTF-8
+
+console.log(buf_str.codePointAt(0).toString(16));
+// Prints: 'fffd'
+```
+
 ## Buffers and TypedArray