Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate possible performance wins with TextEncoder#encodeInto #1313

Closed
alexcrichton opened this issue Mar 5, 2019 · 8 comments
Closed

Comments

@alexcrichton
Copy link
Contributor

There's some discussion starting here about how we can probably improve the current logic of using encodeInto through some more clever usage and possibly some magic numbers. We should take a look into this! Ideally we'd also take a look at actual performance numbers when doing so.

@hsivonen
Copy link

hsivonen commented Mar 8, 2019

I put a slightly polished restatement of my previous comment onto MDN.

It would be nice to copy and paste the code resulting from fixing this issue into the "Examples" section of the MDN article.

@alexcrichton
Copy link
Contributor Author

Oh thanks for the link @hsivonen!

@RReverser
Copy link
Member

Note that another alternative that might be worth considering and measuring is calculating UTF-8 byte length upfront.

I've already added usage of Buffer.byteLength(...) in #1391 for Node.js, but for other targets we can do something like:

let size = arg.length;
for (let i = 0; i < arg.length; i++) {
    let code = arg.charCodeAt(i);
    if (code > 0x7f) size++;
    if (code > 0x7ff) size++;
    if (code >= 0xD800 && code <= 0xDBFF) i++; // high surrogate
}

After that, we can allocate the right size right away.

@hsivonen
Copy link

hsivonen commented Apr 1, 2019

Note that another alternative that might be worth considering and measuring is calculating UTF-8 byte length upfront.

Gecko used to do this for its internal UTF-16 to UTF-8 conversions, but doing imprecise allocations was better for performance. (With jemalloc, allocations are imprecise anyway. With a precise allocator, it might be more interesting to attempt to do precise allocations.)

@RReverser
Copy link
Member

With jemalloc, allocations are imprecise anyway.

Does WASM in Rust use jemalloc? AFAIK Rust switched from it to the system allocator by default, but now that I think about it, I'm not sure which allocator is considered "system" and included in the WASM target.

Although maybe most people use wee_alloc instead of the default one? Worth investigating.

@alexcrichton
Copy link
Contributor Author

The wasm target does not use jemalloc, it uses a port of dlmalloc.

@RReverser
Copy link
Member

Good to know. So, point stands - probably worth optimising for it and/or wee_alloc if we do try to take allocator characteristics into account.

Alternatively, I'd rely on bucket allocators to already round up any allocations to the bucket size and then realloc that stays within the bucket is essentially no-op, so we probably don't need to try and do that manually.

RReverser added a commit to RReverser/wasm-bindgen that referenced this issue Apr 1, 2019
Instead of doubling the size on each iteration, use precise upper limit (3 * JS length) if the string turned out not to be ASCII-only. This results in maximum of 1 reallocation instead of O(log N).

Some dummy examples of what this would change:
 - 1000 of ASCII chars: no change, allocates 1000 bytes and bails out.
 - 1000 ASCII chars + 1 '😃': before allocated 1000 bytes and reallocated to 2000; now allocates 1000 bytes and reallocates to 1006.
 - 1000 of '😃' chars: before allocated 1000 bytes, reallocated to 2000, finally reallocated again to 4000; now allocates 1000 bytes and reallocates to 4000 right away.

Related issue: rustwasm#1313
@alexcrichton
Copy link
Contributor Author

This was done originally in #1414 and we can always follow up with further improvements if necessary!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants