Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buffer: optimize writing short strings #54310

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 20 additions & 0 deletions benchmark/buffers/buffer-write-string-short.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
'use strict';

const common = require('../common.js');
const bench = common.createBenchmark(main, {
len: [0, 1, 8, 16, 32],
fallback: [1, 0],
n: [1e6],
});

function main({ len, n, fallback }) {
const buf = Buffer.allocUnsafe(len);
const string = fallback && len > 0
? Buffer.from('a'.repeat(len - 1) + '€').toString()

Check failure on line 13 in benchmark/buffers/buffer-write-string-short.js

View workflow job for this annotation

GitHub Actions / lint-js-and-md

'?' should be placed at the end of the line
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you fix the formatting issues?

: Buffer.from('a'.repeat(len)).toString()

Check failure on line 14 in benchmark/buffers/buffer-write-string-short.js

View workflow job for this annotation

GitHub Actions / lint-js-and-md

':' should be placed at the end of the line

Check failure on line 14 in benchmark/buffers/buffer-write-string-short.js

View workflow job for this annotation

GitHub Actions / lint-js-and-md

Missing semicolon
bench.start();
for (let i = 0; i < n; ++i) {
buf.write(string);
}
bench.end(n);
}
74 changes: 16 additions & 58 deletions benchmark/buffers/buffer-write-string.js
Original file line number Diff line number Diff line change
Expand Up @@ -2,67 +2,25 @@

const common = require('../common.js');
const bench = common.createBenchmark(main, {
encoding: [
'', 'utf8', 'ascii', 'hex', 'utf16le', 'latin1',
],
args: [ '', 'offset', 'offset+length' ],
len: [2048],
encoding: [ '', 'utf8', 'ascii' ],
len: [0, 1, 8, 16, 32],
n: [1e6],
});

function main({ len, n, encoding, args }) {
let string;
let start = 0;
function main({ len, n, encoding }) {
const buf = Buffer.allocUnsafe(len);

switch (args) {
case 'offset':
string = 'a'.repeat(Math.floor(len / 2));
start = len - string.length;
if (encoding) {
bench.start();
for (let i = 0; i < n; ++i) {
buf.write(string, start, encoding);
}
bench.end(n);
} else {
bench.start();
for (let i = 0; i < n; ++i) {
buf.write(string, start);
}
bench.end(n);
}
break;
case 'offset+length':
string = 'a'.repeat(len);
if (encoding) {
bench.start();
for (let i = 0; i < n; ++i) {
buf.write(string, 0, buf.length, encoding);
}
bench.end(n);
} else {
bench.start();
for (let i = 0; i < n; ++i) {
buf.write(string, 0, buf.length);
}
bench.end(n);
}
break;
default:
string = 'a'.repeat(len);
if (encoding) {
bench.start();
for (let i = 0; i < n; ++i) {
buf.write(string, encoding);
}
bench.end(n);
} else {
bench.start();
for (let i = 0; i < n; ++i) {
buf.write(string);
}
bench.end(n);
}
const string = 'a'.repeat(len);
if (encoding) {
bench.start();
for (let i = 0; i < n; ++i) {
buf.write(string, encoding);
}
bench.end(n);
} else {
bench.start();
for (let i = 0; i < n; ++i) {
buf.write(string);
}
bench.end(n);
}
}
27 changes: 24 additions & 3 deletions lib/buffer.js
Original file line number Diff line number Diff line change
Expand Up @@ -1082,10 +1082,10 @@ function _fill(buf, value, offset, end, encoding) {
Buffer.prototype.write = function write(string, offset, length, encoding) {
// Buffer#write(string);
if (offset === undefined) {
return this.utf8Write(string, 0, this.length);
}
offset = 0;
length = this.length;
// Buffer#write(string, encoding)
if (length === undefined && typeof offset === 'string') {
} else if (length === undefined && typeof offset === 'string') {
encoding = offset;
length = this.length;
offset = 0;
Expand All @@ -1108,6 +1108,27 @@ Buffer.prototype.write = function write(string, offset, length, encoding) {
}
}

const len = string?.length;
if (
len <= 16 &&
len <= length &&
(!encoding || encoding === 'ascii' || encoding === 'utf8') &&
typeof string === 'string'
) {
for (let n = 0; true; n++) {
if (n === len) {
return len;
}

const code = StringPrototypeCharCodeAt(string, n);
if (code >= 128) {
break;
Copy link
Member

@lpinca lpinca Aug 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that buffer.write('aaaaa€') now returns 3 instead of 8.

Edit: I think it is just slower now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on?

buffers/buffer-write-string.js n=1000000 len=1 args='' encoding=''         ***    526.57 %      ±36.19% ±51.92% ±76.21%
buffers/buffer-write-string.js n=1000000 len=16 args='' encoding=''        ***     52.39 %      ±18.12% ±25.97% ±38.06%
buffers/buffer-write-string.js n=1000000 len=32 args='' encoding=''                -3.44 %      ±10.55% ±14.63% ±20.33%
buffers/buffer-write-string.js n=1000000 len=8 args='' encoding=''         ***    196.38 %       ±3.09%  ±4.43%  ±6.49%

Copy link
Member

@lpinca lpinca Aug 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the benchmark does not use a string with mixed sigle byte and multibyte characters. I think the worst case would be something like this 'a'.repeat(15) + '€'. In this case the optimization is only overhead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We're trying to optimize for the most common case.

Copy link
Member

@lpinca lpinca Aug 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is an ASCII only string whose length is less than or equal to 16 characters the most common case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regression is minimal for non one byte case:

buffers/buffer-write-string-short.js n=1000000 char='' len=0         ***    606.09 %      ±26.40% ±36.19% ±49.34%
buffers/buffer-write-string-short.js n=1000000 char='' len=1                 -0.45 %      ±13.79% ±19.68% ±28.63%
buffers/buffer-write-string-short.js n=1000000 char='' len=16                -0.76 %       ±5.97%  ±8.50% ±12.32%
buffers/buffer-write-string-short.js n=1000000 char='' len=32                 0.51 %      ±11.84% ±16.61% ±23.50%
buffers/buffer-write-string-short.js n=1000000 char='' len=8           *     -8.12 %       ±6.90%  ±9.87% ±14.43%
buffers/buffer-write-string-short.js n=1000000 char='a' len=0         ***    543.24 %      ±12.04% ±17.27% ±25.35%
buffers/buffer-write-string-short.js n=1000000 char='a' len=1         ***    478.53 %      ±31.70% ±45.53% ±66.96%
buffers/buffer-write-string-short.js n=1000000 char='a' len=16        ***    112.38 %       ±6.11%  ±8.72% ±12.69%
buffers/buffer-write-string-short.js n=1000000 char='a' len=32         **     -5.18 %       ±3.21%  ±4.57%  ±6.64%
buffers/buffer-write-string-short.js n=1000000 char='a' len=8         ***    218.82 %      ±11.24% ±15.64% ±21.85%

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@ronag ronag Aug 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better comparison:

                                                                  confidence improvement accuracy (*)     (**)    (***)
buffers/buffer-write-string-short.js n=1000000 fallback=0 len=0         ***    555.83 %      ±10.01%  ±14.34%  ±21.00%
buffers/buffer-write-string-short.js n=1000000 fallback=0 len=1         ***    447.19 %     ±126.93% ±182.34% ±268.25%
buffers/buffer-write-string-short.js n=1000000 fallback=0 len=16        ***    118.07 %      ±18.44%  ±26.17%  ±37.74%
buffers/buffer-write-string-short.js n=1000000 fallback=0 len=32                -1.22 %       ±5.77%   ±8.22%  ±11.92%
buffers/buffer-write-string-short.js n=1000000 fallback=0 len=8         ***    192.61 %      ±38.32%  ±54.53%  ±78.97%
buffers/buffer-write-string-short.js n=1000000 fallback=1 len=0         ***    522.42 %     ±106.34% ±152.67% ±224.38%
buffers/buffer-write-string-short.js n=1000000 fallback=1 len=1          **     -4.23 %       ±3.03%   ±4.21%   ±5.85%
buffers/buffer-write-string-short.js n=1000000 fallback=1 len=16        ***    -24.35 %       ±9.58%  ±13.18%  ±18.07%
buffers/buffer-write-string-short.js n=1000000 fallback=1 len=32                -4.96 %      ±12.35%  ±17.52%  ±25.26%
buffers/buffer-write-string-short.js n=1000000 fallback=1 len=8         ***    -22.52 %       ±7.75%  ±11.02%  ±15.95%

@mcollina @anonrig wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not make much sense to benchmark length 0 and 1 imho. I'm also not convinced that length <= 16 is a common case, but I have no proof (so far, there is only one comment with less 16 bytes in the PR comments).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use it a lot for primary keys that usually are less than 16 bytes. Also for building e.g. json responses require writing lots of small strings.

}

this[offset + n] = code;
ronag marked this conversation as resolved.
Show resolved Hide resolved
}
}

if (!encoding)
return this.utf8Write(string, offset, length);

Expand Down
Loading