Add TextEncoder.prototype.encodeInto #26904

RReverser · 2019-03-25T14:23:12Z

Is your feature request related to a problem? Please describe.

TextEncoder.prototype.encodeInto is a new method available in latest Chrome and Firefox that allows efficient encoding of strings into a given buffer, avoiding intermediate allocation and copying.

Its primary usecase is a more efficient way to pass strings to WASM (one of the most expensive things one has to do when communicating with JavaScript) and it's already leveraged by Rust wasm-bindgen where available. wasm-bindgen has support for a Node.js target, so it would be great to speed things up here too.

Describe the solution you'd like

Adding native encodeInto method to the TextEncoder class already available in util module.

Please describe the desired behavior.

MDN: https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder/encodeInto

Describe alternatives you've considered

Sending PR to wasm-bindgen to wrap memory into a Node-specific Buffer and use Buffer.prototype.write instead.

It has very similar semantics, and produces similar speed-ups (up to 35% on some of my string-interaction-heavy WASM libraries), but I wanted to see if it's possible to solve this on the Node.js side first so that the same method could be used for all platforms.

cc @jasnell as the author of the original implementation of WHATWG Encoding API in Node.js (#13644)

The text was updated successfully, but these errors were encountered:

jasnell · 2019-03-25T15:11:18Z

Should be fairly straightforward to implement. Interested in giving it a try?

RReverser · 2019-03-25T15:16:51Z

Potentially, but currently pretty busy with another higher-level project (in which I found out this perf differentiator between running my WASM in different environments).

If nobody takes this before, can give it a try sometime later.

Do you think it would be reasonable if I put good first issue label on this?

jasnell · 2019-03-25T15:22:01Z

Labeling it good first issue should be fine but it will require a bit deeper knowledge than a beginner might have, and would likely require a bit of C++

RReverser · 2019-03-25T15:23:26Z

Yeah, depends whether that targets beginners in general or just beginners to the project... I think I can put "good first issue" + "C++" to make it more clear.

RReverser · 2019-03-25T17:02:53Z

One difference with Buffer::write I should mention is that encodeInto additionally returns number of string code units written, not just bytes. This is important to allow partial writes. (In theory people using Buffer::write for partial writes would need something like that too, but somehow they got away without it?)

Node.js doesn't currently implement `TextEncoder::encodeInto`. I've raised an upstream issue to add it - nodejs/node#26904 - but it's likely to take some time and will be available only in new releases. In the meanwhile, it's worth noting that Node.js already has `Buffer::write` which has pretty similar semantics, but doesn't require creating an intermediate view using `.subarray` and instead accepts pointer and length directly. Also, Node.js has `Buffer::byteLength` helper which allows to efficiently retrieve an encoded byte length of a string upfront, and so allows us to avoid a loop with reallocations. This change takes leverage of these methods by generating an additional Buffer-based view into the WASM memory and using it for string operations. I'm seeing up to 35% increase in performance in string-heavy library benchmarks.

JungMinu · 2019-03-25T23:45:06Z

going to take care of this

zandaqo · 2019-07-05T08:06:16Z

This would be really helpful for anyone dealing with binary. As much as I like Buffer, it's not an option for isomorphic code. Is there any ETA on this one yet?

AtticusYang · 2019-07-23T02:40:29Z

@RReverser ,I tried to implement this task and I did it. but I am confused by the test case in
https://github.com/nodejs/node/blob/master/test/fixtures/wpt/encoding/encodeInto.any.js,
when input string is "input": "\u{1D306}", the expected read is 2, but the actual is 4, i also did a simple test as followed, the byte length of "\u{1D306}" is 4.

let a = "\u{1D306}";
console.log(a);
console.log(Buffer.byteLength(a, 'utf8'));

and the dest is a uint8array, the byteOffset is 0 and the length is 4.

so I wonder the test case is right or wrong, thanks!

zandaqo · 2019-07-23T06:57:53Z

@AtticusYang TextEncoder#encodeInto returns the number of code points not the byte length--"\u{1D306}" is encoded with two code points in UTF-16, and indeed four bytes in UTF-8.

AtticusYang · 2019-07-23T07:49:57Z

@zandaqo thank for your advice.

Add TextEncoder.prototype.encodeInto. Refs: nodejs#26904

Add function encodeInto to TextEncoder, fix bug MessageChannel is not defined in encodeInto.any.js. Fixes: nodejs#28851 Refs: nodejs#26904

Add function encodeInto to TextEncoder, and add MessageChannel to the encodeInto.any.js test. Fixes: nodejs#28851 Fixes: nodejs#26904 Refs: nodejs#28862 Co-authored-by: AtticusYang <yyongtai@163.com>

Add function encodeInto to TextEncoder, and add MessageChannel to the encodeInto.any.js test. Fixes: #28851 Fixes: #26904 Refs: #28862 Co-authored-by: AtticusYang <yyongtai@163.com> PR-URL: #29524 Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>

zandaqo · 2019-09-23T17:38:10Z

I was playing with the latest nightly build (Sep 23) and noticed an odd behavior when non-string is used as a source:

const encoder = new TextEncoder();
const destination = new Uint8Array(10);

encoder.encodeInto(null, destination);
//=> TypeError: The "src" argument must be of type string. Received type object

While the spec does say that the source should be a USVString, its current implementations in both Chrome and Firefox opt for converting the source into a string first instead of throwing an error. So, are we sticking with the current implementation, that is, Node.js going to differ from browsers and throw if a non-string source is provided?

Also, this throwing makes less sense considering that we are ok with non-string values in TextEncoder#encode:

const encoder = new TextEncoder();
encoder.encode(null)
//=> Uint8Array [ 110, 117, 108, 108 ]

Add function encodeInto to TextEncoder, and add MessageChannel to the encodeInto.any.js test. Fixes: #28851 Fixes: #26904 Refs: #28862 Co-authored-by: AtticusYang <yyongtai@163.com> PR-URL: #29524 Reviewed-By: David Carlier <devnexen@gmail.com> Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com>

RReverser added the buffer Issues and PRs related to the buffer subsystem. label Mar 25, 2019

devsnek added encoding Issues and PRs related to the TextEncoder and TextDecoder APIs. feature request Issues that request new features to be added to Node.js. labels Mar 25, 2019

RReverser added c++ Issues and PRs that require attention from people who are familiar with C++. good first issue Issues that are suitable for first-time contributors. labels Mar 25, 2019

RReverser mentioned this issue Mar 25, 2019

Add specialised passStringToWasm for Node.js rustwasm/wasm-bindgen#1391

Merged

JungMinu self-assigned this Mar 25, 2019

AtticusYang mentioned this issue Jul 25, 2019

MessageChannel is not defined in encodeInto.any.js #28851

Closed

AtticusYang pushed a commit to AtticusYang/node that referenced this issue Jul 25, 2019

encoding: add encodeInto to TextEncoder

092af43

Add TextEncoder.prototype.encodeInto. Refs: nodejs#26904

AtticusYang pushed a commit to AtticusYang/node that referenced this issue Jul 26, 2019

encoding: add encodeInto to TextEncoder

9b7fe9f

Add function encodeInto to TextEncoder, fix bug MessageChannel is not defined in encodeInto.any.js. Fixes: nodejs#28851 Refs: nodejs#26904

AtticusYang mentioned this issue Jul 26, 2019

src: add encodeInto to TextEncoder #28862

Closed

4 tasks

AtticusYang pushed a commit to AtticusYang/node that referenced this issue Sep 4, 2019

src: add encodeInto to TextEncoder

91d61fb

Add function encodeInto to TextEncoder, fix bug MessageChannel is not defined in encodeInto.any.js. Fixes: nodejs#28851 Refs: nodejs#26904

addaleax removed the good first issue Issues that are suitable for first-time contributors. label Sep 10, 2019

addaleax mentioned this issue Sep 10, 2019

util: add encodeInto to TextEncoder #29524

Closed

4 tasks

addaleax closed this as completed in 7c9ee6d Sep 13, 2019

zandaqo mentioned this issue Sep 25, 2019

v12.11.0 proposal #29695

Merged

jmatsushita mentioned this issue Jan 18, 2024

Support TextEncoder.encodeInto expo/browser-polyfill#80

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add TextEncoder.prototype.encodeInto #26904

Add TextEncoder.prototype.encodeInto #26904

RReverser commented Mar 25, 2019

jasnell commented Mar 25, 2019

RReverser commented Mar 25, 2019 •

edited

Loading

jasnell commented Mar 25, 2019

RReverser commented Mar 25, 2019 •

edited

Loading

RReverser commented Mar 25, 2019

JungMinu commented Mar 25, 2019

zandaqo commented Jul 5, 2019

AtticusYang commented Jul 23, 2019 •

edited

Loading

zandaqo commented Jul 23, 2019

AtticusYang commented Jul 23, 2019

zandaqo commented Sep 23, 2019 •

edited

Loading

Add TextEncoder.prototype.encodeInto #26904

Add TextEncoder.prototype.encodeInto #26904

Comments

RReverser commented Mar 25, 2019

jasnell commented Mar 25, 2019

RReverser commented Mar 25, 2019 • edited Loading

jasnell commented Mar 25, 2019

RReverser commented Mar 25, 2019 • edited Loading

RReverser commented Mar 25, 2019

JungMinu commented Mar 25, 2019

zandaqo commented Jul 5, 2019

AtticusYang commented Jul 23, 2019 • edited Loading

zandaqo commented Jul 23, 2019

AtticusYang commented Jul 23, 2019

zandaqo commented Sep 23, 2019 • edited Loading

RReverser commented Mar 25, 2019 •

edited

Loading

RReverser commented Mar 25, 2019 •

edited

Loading

AtticusYang commented Jul 23, 2019 •

edited

Loading

zandaqo commented Sep 23, 2019 •

edited

Loading