Investigate possible performance wins with `TextEncoder#encodeInto` #1313

alexcrichton · 2019-03-05T16:09:02Z

There's some discussion starting here about how we can probably improve the current logic of using encodeInto through some more clever usage and possibly some magic numbers. We should take a look into this! Ideally we'd also take a look at actual performance numbers when doing so.


hsivonen · 2019-03-08T07:43:47Z

I put a slightly polished restatement of my previous comment onto MDN.

It would be nice to copy and paste the code resulting from fixing this issue into the "Examples" section of the MDN article.

alexcrichton · 2019-03-08T15:35:47Z

Oh thanks for the link @hsivonen!

RReverser · 2019-03-29T01:09:15Z

Note that another alternative that might be worth considering and measuring is calculating UTF-8 byte length upfront.

I've already added usage of Buffer.byteLength(...) in #1391 for Node.js, but for other targets we can do something like:

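// Baseline: every UTF-16 code unit needs at least one UTF-8 byte.
let size = arg.length;
for (let i = 0; i < arg.length; i++) {
    let code = arg.charCodeAt(i);
    if (code > 0x7f) size++;  // needs a 2nd byte
    if (code > 0x7ff) size++; // needs a 3rd byte
    // A surrogate pair is 4 bytes: 3 counted here plus the baseline 1 of the skipped low half.
    if (code >= 0xD800 && code <= 0xDBFF) i++; // high surrogate
}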

After that, we can allocate the right size right away.
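For reference, here is a minimal sketch of the same counting loop wrapped in a function, so it can be checked against `TextEncoder`. The name `utf8ByteLength` is hypothetical and not part of wasm-bindgen. This variant only skips the following code unit when it really is a low surrogate, on the assumption that lone surrogates should be counted the way `TextEncoder` encodes them (as U+FFFD, 3 bytes).

```javascript
// Hypothetical helper, not wasm-bindgen API: counts the UTF-8 bytes that
// TextEncoder would produce for `str`, without encoding it.
function utf8ByteLength(str) {
    let size = str.length; // baseline: one byte per UTF-16 code unit
    for (let i = 0; i < str.length; i++) {
        const code = str.charCodeAt(i);
        if (code > 0x7f) size++;  // 2-byte sequence
        if (code > 0x7ff) size++; // 3-byte sequence
        if (code >= 0xd800 && code <= 0xdbff) {
            // charCodeAt returns NaN past the end, so the range check fails there.
            const next = str.charCodeAt(i + 1);
            // Well-formed pair: 3 bytes counted above + baseline 1 of the low half = 4.
            if (next >= 0xdc00 && next <= 0xdfff) i++;
        }
    }
    return size;
}

console.log(utf8ByteLength("a😃")); // 5
```

Whether this extra pass over the string beats an over-allocation followed by a trim is exactly the kind of thing that would need measuring.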

hsivonen · 2019-04-01T08:54:23Z

> Note that another alternative that might be worth considering and measuring is calculating UTF-8 byte length upfront.

Gecko used to do this for its internal UTF-16 to UTF-8 conversions, but doing imprecise allocations was better for performance. (With jemalloc, allocations are imprecise anyway. With a precise allocator, it might be more interesting to attempt to do precise allocations.)

RReverser · 2019-04-01T10:24:03Z

> With jemalloc, allocations are imprecise anyway.

Does WASM in Rust use jemalloc? AFAIK Rust switched from it to the system allocator by default, but now that I think about it, I'm not sure which allocator is considered "system" and included in the WASM target.

Although maybe most people use wee_alloc instead of the default one? Worth investigating.

alexcrichton · 2019-04-01T10:58:59Z

The wasm target does not use jemalloc, it uses a port of dlmalloc.

RReverser · 2019-04-01T11:54:38Z

Good to know. So, point stands - probably worth optimising for it and/or wee_alloc if we do try to take allocator characteristics into account.

Alternatively, I'd rely on bucket allocators to already round up any allocation to the bucket size, so a realloc that stays within the bucket is essentially a no-op, and we probably don't need to try to do that manually.
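To illustrate the bucket argument, here is a toy model. It assumes a hypothetical allocator that rounds every request up to a power of two with an 8-byte minimum. This is not a description of how dlmalloc or wee_alloc actually size their allocations, and both function names are made up for the example.

```javascript
// Toy size-class rule (an assumption for illustration only):
// round each request up to the next power of two, minimum 8 bytes.
function bucketSize(requested) {
    let size = 8;
    while (size < requested) size *= 2;
    return size;
}

// Under this model a realloc is effectively free when the old and new
// sizes land in the same bucket, because no data has to move.
function reallocStaysInBucket(oldSize, newSize) {
    return bucketSize(oldSize) === bucketSize(newSize);
}

console.log(reallocStaysInBucket(1000, 1006)); // true  (both fit in 1024)
console.log(reallocStaysInBucket(1000, 2000)); // false (1024 vs 2048)
```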

Instead of doubling the size on each iteration, use precise upper limit (3 * JS length) if the string turned out not to be ASCII-only. This results in maximum of 1 reallocation instead of O(log N).

Some dummy examples of what this would change:

- 1000 of ASCII chars: no change, allocates 1000 bytes and bails out.
- 1000 ASCII chars + 1 '😃': before allocated 1000 bytes and reallocated to 2000; now allocates 1000 bytes and reallocates to 1006.
- 1000 of '😃' chars: before allocated 1000 bytes, reallocated to 2000, finally reallocated again to 4000; now allocates 1000 bytes and reallocates to 4000 right away.

Related issue: rustwasm#1313
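The single-reallocation strategy described above can be sketched roughly as follows. This is not the code from #1414: `encodeWithOneRealloc` is a hypothetical name, and a plain `Uint8Array` copy stands in for the wasm allocator's realloc. It relies on the fact that a UTF-16 code unit never needs more than 3 UTF-8 bytes (a surrogate pair is 2 code units and 4 bytes) and that `encodeInto` never splits a code point.

```javascript
const encoder = new TextEncoder();

// Hypothetical sketch: encode `str` with at most one buffer growth.
function encodeWithOneRealloc(str) {
    // First guess: one byte per UTF-16 code unit, exact for ASCII-only input.
    let buf = new Uint8Array(str.length);
    let reallocations = 0;

    let { read, written } = encoder.encodeInto(str, buf);
    if (read < str.length) {
        // Not everything fit. Grow once to the worst case for the rest:
        // 3 bytes per remaining UTF-16 code unit.
        const remaining = str.length - read;
        const grown = new Uint8Array(written + remaining * 3);
        grown.set(buf.subarray(0, written)); // stand-in for realloc's copy
        buf = grown;
        reallocations++;

        const rest = encoder.encodeInto(str.slice(read), buf.subarray(written));
        written += rest.written;
    }
    // `capacity` is what was allocated; `bytes` is what was actually used.
    return { bytes: buf.subarray(0, written), capacity: buf.length, reallocations };
}

const mixed = encodeWithOneRealloc("a".repeat(1000) + "😃");
console.log(mixed.capacity, mixed.bytes.length, mixed.reallocations); // 1006 1004 1
```

In this sketch the initial buffer for the mixed string is 1002 bytes, because the emoji counts as two UTF-16 code units in `str.length`. The grown capacity still comes out at 1006.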

alexcrichton · 2019-04-16T17:59:22Z

This was done originally in #1414 and we can always follow up with further improvements if necessary!

alexcrichton mentioned this issue Mar 5, 2019

Add support for TextEncoder#encodeInto #1279

Merged

RReverser mentioned this issue Apr 1, 2019

Optimise encodeInto reallocations #1414

Merged

alexcrichton closed this as completed Apr 16, 2019

RReverser mentioned this issue Apr 17, 2019

Speed up passing ASCII-only strings to WASM #1470

Merged

josephlr mentioned this issue Sep 15, 2020

cli-support: Remove Node.js specific passStringToWasm #2310

Merged
