`Writable` doesn't correctly count size of strings #52818

ronag · 2024-05-03T11:34:05Z

Minor bug. Not sure if it's worth the performance impact of fixing.

When calculating the currently buffered length we are not correctly calculating the byte length of strings, we just use the string length.

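function writeOrBuffer(stream, state, chunk, encoding, callback) {
  // Outside object mode, a string chunk contributes chunk.length
  // (UTF-16 code units), not its encoded byte length.
  const len = (state[kState] & kObjectMode) !== 0 ? 1 : chunk.length;

  state.length += len;
  // ... (rest of the function omitted)
}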
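
To make the distinction concrete, here is a minimal sketch comparing a string's `.length` with its UTF-8 byte length. The sample strings and the `measured` variable are illustrative choices, not part of the original report.

```javascript
const { Buffer } = require('node:buffer');

// Illustrative samples: ASCII, a 3-byte UTF-8 character, and a character
// outside the BMP (two UTF-16 code units, four UTF-8 bytes).
const samples = ['abc', '€', '😀'];

const measured = samples.map((text) => ({
  text,
  codeUnits: text.length,                      // what chunk.length reports
  utf8Bytes: Buffer.byteLength(text, 'utf8'),  // size once encoded as UTF-8
}));

console.table(measured);
```

For ASCII-only input the two numbers agree, which may be why the difference is easy to miss.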

ronag · 2024-05-03T11:34:15Z

@nodejs/streams @mcollina

lpinca · 2024-05-03T14:39:22Z

I think we should fix it. Is this a recent regression?

benjamingr · 2024-05-03T14:57:05Z

So the fix would be to call Buffer.byteLength? Might be worth checking what the perf regression is and if it's not significant to fix it.

benjamingr · 2024-05-03T14:59:13Z

> Is this a recent regression?

Not at all, here it is in v10 https://github.com/nodejs/node/blob/v10.x/lib/_stream_readable.js#L291 and at v0.10 https://github.com/nodejs/node/blob/v0.10/lib/_stream_readable.js#L157

lpinca · 2024-05-03T15:26:56Z

@benjamingr it seems to be correctly calculated in v10 and v0.10. See https://github.com/nodejs/node/blob/v10.x/lib/_stream_writable.js#L374 and https://github.com/nodejs/node/blob/v0.10/lib/_stream_writable.js#L204 (note decodeChunk()).

lpinca · 2024-05-03T15:29:26Z

It actually seems to be correctly calculated also on main. See node/lib/internal/streams/writable.js, line 465 in 2c55652:

chunk = Buffer.from(chunk, encoding);

lpinca · 2024-05-03T15:33:55Z

Confirmed.

const { Writable } = require('stream');

const w = new Writable({
  write() {}
});

w.write('€');
w.write('€');

console.log(w.writableLength); // 6

I think we can close this.

benjamingr · 2024-05-03T16:19:04Z

@lpinca that's just one case though?

benjamingr · 2024-05-03T16:20:25Z

To be clear:

> When calculating the currently buffered length we are not correctly calculating the byte length of strings, we just use the string length.

This is across streams in several places - the fact writable works with decodeStrings set to true doesn't mean it's not a bug elsewhere?

lpinca · 2024-05-03T17:33:28Z

The issue description did not mention the decodeStrings option. Yes, in that case the string size is incorrectly calculated.
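
A sketch of the two cases side by side, reusing the reproduction from earlier in the thread. The `bufferedLength` helper is hypothetical and exists only for this illustration. With the behavior described in this thread, the default case should report 6 and the `decodeStrings: false` case 2.

```javascript
const { Writable } = require('node:stream');

// Hypothetical helper: write the same two strings and report how much the
// stream thinks it has buffered. write() never calls its callback, so
// nothing is ever flushed and both chunks stay counted.
function bufferedLength(options) {
  const w = new Writable({ ...options, write() {} });
  w.write('€');
  w.write('€');
  return w.writableLength;
}

const decoded = bufferedLength({});                    // strings become Buffers
const raw = bufferedLength({ decodeStrings: false });  // strings stay strings

console.log({ decoded, raw });
```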

lpinca · 2024-05-03T18:03:08Z

I think using Buffer.byteLength() when the decodeStrings is set to false is acceptable. Performance should be no worse than the default case (decodeStrings set to true) when the string is converted to a Buffer.
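
A rough sketch of what that suggestion amounts to. `chunkSize` is a made-up standalone function for illustration, not the actual patch and not a Node.js internal.

```javascript
const { Buffer } = require('node:buffer');

// Hypothetical sizing function: count objects as 1, strings by their
// encoded byte length, and everything else (Buffer, Uint8Array) by .length.
function chunkSize(chunk, encoding, objectMode) {
  if (objectMode) return 1;
  if (typeof chunk === 'string') return Buffer.byteLength(chunk, encoding);
  return chunk.length;
}

console.log(chunkSize('€', 'utf8', false));               // string, byte mode
console.log(chunkSize(Buffer.from('€'), 'buffer', false)); // already a Buffer
console.log(chunkSize({ any: 'object' }, undefined, true)); // object mode
```

The extra cost is one `Buffer.byteLength()` call per string chunk, which is the performance question raised above.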

benjamingr · 2024-05-03T18:10:54Z

I think whenever we have a stream of strings and not buffers we should use byteLength (or just use Buffer.byteLength for everything) and if there is no performance impact land it.

(Though Buffer.byteLength would, by default, measure the string as UTF-8 and not UTF-16, i.e. "as if it was a buffer encoded as utf-8", which may be desired?)
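
To illustrate why the choice of encoding matters here, a small sketch showing that `Buffer.byteLength()` gives different answers for the same string depending on the encoding passed. The `sizes` helper is an illustrative name, not something from the thread.

```javascript
const { Buffer } = require('node:buffer');

// Illustrative helper: size of one string under several encodings.
function sizes(text) {
  return {
    codeUnits: text.length,
    utf8: Buffer.byteLength(text, 'utf8'),       // default when no encoding is given
    utf16le: Buffer.byteLength(text, 'utf16le'), // two bytes per code unit
    latin1: Buffer.byteLength(text, 'latin1'),   // one byte per code unit
  };
}

const euro = sizes('€');
const emoji = sizes('😀');
console.log({ euro, emoji });
```

So "the byte length of a string" is only well defined once an encoding is chosen, and a consumer that receives raw strings may not be using UTF-8 at all.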

Documents that we calculate the highWaterMark value of streams operating on strings using the number of UTF-16 code units. Fixes: #52818 PR-URL: #52842 Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Robert Nagy <ronagy@icloud.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
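
A sketch of what the documented behavior means for backpressure, assuming a small `highWaterMark` of 4 chosen only for this illustration. The return value of `write()` reflects whether the buffered length is still below the mark, so the same string can trip the threshold in one mode and not in the other.

```javascript
const { Writable } = require('node:stream');

// Illustrative setup: identical streams except for decodeStrings.
// write() never calls back, so the chunk stays counted as buffered.
const bytesMode = new Writable({ highWaterMark: 4, write() {} });
const stringMode = new Writable({
  highWaterMark: 4,
  decodeStrings: false,
  write() {},
});

// '€€€' is 3 UTF-16 code units but 9 bytes once encoded as UTF-8.
const okBytes = bytesMode.write('€€€');   // counted as 9, so at/over the mark
const okString = stringMode.write('€€€'); // counted as 3, still under the mark

console.log({ okBytes, okString });
```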

Documents that we calculate the highWaterMark value of streams operating on strings using the number of UTF-16 code units. Fixes: nodejs#52818 PR-URL: nodejs#52842 Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Robert Nagy <ronagy@icloud.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

VoltrexKeyva added the stream Issues and PRs related to the stream subsystem. label May 3, 2024

mcollina closed this as completed May 3, 2024

ronag reopened this May 3, 2024

lpinca mentioned this issue May 4, 2024

stream: use Buffer.byteLength() to get the string length #52828

Closed

lpinca added a commit to lpinca/node that referenced this issue May 4, 2024

stream: use Buffer.ByteLength() to get the string length

5e59e20

Use the UTF-8 byte length of the string when the `decodeStrings` option is set to `false`. Fixes: nodejs#52818

lpinca added a commit to lpinca/node that referenced this issue May 4, 2024

stream: use Buffer.byteLength() to get the string length

38abab3

Use the UTF-8 byte length of the string when the `decodeStrings` option is set to `false`. Fixes: nodejs#52818

lpinca added a commit to lpinca/node that referenced this issue May 4, 2024

stream: use Buffer.byteLength() to get the string length

213539e

Use the byte length of the string when the `decodeStrings` option is set to `false`. Fixes: nodejs#52818

benjamingr added a commit that referenced this issue May 5, 2024

doc: document watermark string behavior

a5c8454

Documents that we calculate the highWaterMark value of streams operating on strings using the number of UTF-16 code units. Fixes: #52818

benjamingr mentioned this issue May 5, 2024

doc: document watermark string behavior #52842

Merged

benjamingr added a commit that referenced this issue May 6, 2024

doc: watermark string behavior

3e96ac3

Documents that we calculate the highWaterMark value of streams operating on strings using the number of UTF-16 code units. Fixes: #52818

aduh95 closed this as completed in #52842 May 10, 2024
