feat: Expose TextEncoder and TextDecoder #11

fguitton · 2020-03-29T13:54:36Z

This PR has for vocation to address jsdom/jsdom#2524 by complementing this package as suggested in jsdom/jsdom#2928.

This PR transforms the repo to adopt a structure similar to whatwg-url.

TimothyGu · 2020-03-29T15:57:11Z

Hi, thanks for starting this. As a start, I'd say don't bother with some of the changes, like those to the license (LICENSE.txt), ESLint (.eslint*), Travis CI (.travis.yml), and Git configuration files (.git*) unless you have to (e.g., I realize you will have to add lib/TextEncorder.js to the .gitignore file, but you probably shouldn't change /node_modules/ to node_modules/ as it's outside the scope of this PR). You could remove the live viewer for now (live-viewer/) – if you'd like to make one for this package you could always do so after TextEncoder/Decoder gets implemented!

Additionally, I'd like you to remove the package-lock.json. We at jsdom mostly use Yarn over npm (except for whatwg-url), and as you can see we already have a yarn.lock file in this repo. Let's keep that the case.

src/TextDecoder.webidl

src/TextEncoder.webidl

lib/TextDecoder-impl.js

fguitton · 2020-03-29T17:27:46Z

I have started an beginning implementation utilising whatwg-encoding.js, please let me know if I'm on the right track.

Regarding your original comment, as far as I can see package-lock.json, is indeed present in this package.
https://github.com/jsdom/whatwg-encoding/blob/master/package-lock.json
I'm happy to change and use Yarm instead, but this seem to be slightly out of scope.

On that same line, I have converted tests using new Buffer() to using Buffer.from(), this also feels a bit out of scope, should I revert ?

TimothyGu

I have started an beginning implementation utilising whatwg-encoding.js, please let me know if I'm on the right track.

It's looking really good! I've left some comments in any case.

Regarding your original comment, as far as I can see package-lock.json, is indeed present in this package.
https://github.com/jsdom/whatwg-encoding/blob/master/package-lock.json
I'm happy to change and use Yarm instead, but this seem to be slightly out of scope.

Heh, interesting. In that case it should be fine then :) sorry for the false alarm.

On that same line, I have converted tests using new Buffer() to using Buffer.from(), this also feels a bit out of scope, should I revert ?

You could do that, but since it's in its own commit and relatively self-contained, leaving it in would work for me.

lib/TextDecoder-impl.js

fguitton · 2020-03-30T14:08:52Z

@TimothyGu I have been able to look at this in a bit more detail.

There should now be support for streams as well. I have had to stop relying on the decode function defined in whatwg-encoding.js because I could not see how to cleanly utilize it.

There seems to be limitations to bringing a compliant version up. Of those, here are some of the things right now :

It turns out a lot of testing from https://github.com/web-platform-tests/wpt/tree/master/encoding isn't usable as such as it relies on testing functions not present in this repo's (or whatwg-url's) current version of the testharness.js. Those new scripts could be copied in place, what do you suggest ?
The iconv-lite package doesn't support fatal-like functionality. While we could scan the output of the decoding for [0xef, 0xbf, 0xbd], that exact sequence could have been validly input in and the strategy to deal with that is unclear. What are your thoughts ?
There seems to be multiple problems in tests due to iconv-lite not supporting a set of encodings defined in https://encoding.spec.whatwg.org/encodings.json. That is the case for x-mac-cyrillic for example.
iconv-lite seems to fail some of the conversions. For example iconvLite.decode(new Uint8Array([0x00, 0xd8]), 'utf-16le') ought to produce the same as new TextDecoder('utf-16le').decode(new Uint8Array([0x00, 0xd8])) but doesn't. Any pointers ?

TimothyGu · 2020-04-11T16:42:43Z

Thanks for doing the additional investigation!

There should now be support for streams as well. I have had to stop relying on the decode function defined in whatwg-encoding.js because I could not see how to cleanly utilize it.

That's fine. It doesn't use decode in the spec either.

There seems to be limitations to bringing a compliant version up. Of those, here are some of the things right now :

It turns out a lot of testing from https://github.com/web-platform-tests/wpt/tree/master/encoding isn't usable as such as it relies on testing functions not present in this repo's (or whatwg-url's) current version of the testharness.js. Those new scripts could be copied in place, what do you suggest ?

I think we can copy some of those here.

The iconv-lite package doesn't support fatal-like functionality. While we could scan the output of the decoding for [0xef, 0xbf, 0xbd], that exact sequence could have been validly input in and the strategy to deal with that is unclear. What are your thoughts?

Let's just say no fatal support for now, if that's what's needed to get something working as soon as possible.

There seems to be multiple problems in tests due to iconv-lite not supporting a set of encodings defined in https://encoding.spec.whatwg.org/encodings.json. That is the case for x-mac-cyrillic for example.

iconv-lite seems to fail some of the conversions. For example iconvLite.decode(new Uint8Array([0x00, 0xd8]), 'utf-16le') ought to produce the same as new TextDecoder('utf-16le').decode(new Uint8Array([0x00, 0xd8])) but doesn't. Any pointers ?

It looks like iconv-lite doesn't fully implement the WHATWG Encoding Standard. Too bad I suppose. In the long run we should either implement our own algorithms, or change iconv-lite's conversions to conform to the Encoding Standard, or add additional codecs in iconv-lite that do conform to Encoding. For now, having any functionality is still better than not, I'd say, as long as we document these known discrepancies in the README.

For the UTF-16 issue, as uncomfortable as it is for me to say this, I think we could use Node.js' built-in TextDecoder. TextDecoder has always supported UTF-16, even on older Node.js versions when small-icu was the default, unlike other encodings. You might find the Node.js internal UTF-8 decoder to be more compliant too, in case you find discrepancies in iconv-lite.

chyzwar · 2020-08-13T20:59:20Z

Why jsdom cannot just expose TextEncoder and TextDecoder from node.js ? Are these do not conform spec ?

This commit allows the RxPlayer to use the `TextEncoder` and `TextDecoder` APIs when available respectively to encode JS strings into an UTF-8 bytes sequence (TextEncoder doesn't seem to be able to encode into any other encoding) and to decode from either UTF-8, UTF-16BE or UTF-16LE into a JS string. Because `TextEncoder` and `TextDecoder` are not defined in old browser versions we claim to support and in IE11, we still fallback to custom implementation either if it doesn't exist or if the operation fails. It is important to note of a sensible difference between using the `TextDecoder` interface and the previous implementation: when encountering invalid byte sequences in the correponding encoding, the `TextDecoder` will replace those by a "REPLACEMENT CHARACTER" (�). This seems fine and even desirable, but the previous implementation just threw in that same situation. This means that we now have two different behaviors, depending on the current platform / browser. Those functions using the `TextDecoder` APIs are even directly defined in the `StringUtils` tools, and thus that new behavior can be directly noticable by applications using it. Thankfully, nothing is defined in our API documentation about invalid sequences. Even if we can consider that this does not break our API (though it is still unclear to me), it should be is something to keep in mind as this might be unexpected for users relying on this API throwing. Also, I tried to add unit tests, but it appears that "jsdom", on which relies jest to perform unit test while simulation a browser in node, does not include either APIs yet. Though it is under way: jsdom/whatwg-encoding#11

stbrody · 2021-04-06T20:55:43Z

Hey, I'm wondering what the status of this PR is? It looks like it's been sitting idle for almost a year now. The lack of support for TextEncoder and TextDecoder within jsdom, which is heavily utilized by jest, is causing a real headache for our project right now. We'll have to jump through a lot of hoops to work around this in all of our various repos, but it would be much nicer if this was just fixed upstream in jsdom

fguitton · 2021-08-18T18:24:16Z

@TimothyGu I was having a look at this again today.
I have aligned the platform tests to the latest of WPT and updated relevant dependencies.

Beyond the elements I mentioned in #11 (comment) I have come to realize that the testing harness instrumentation from WPT simply cannot work here without a serious rewrite. Following your suggestions on #11 (comment) I can see there are far to many APIs they rely on, such as XMLHttpRequest, that are not implemented in Node and even after trying out a couple of high level polyfills the likes of node-XMLHttpRequest I wasn't able to get more than an extra dozen test to pass.

As such, this PR, with all its limitations, at the very least might allow people with less complex requirements to have an easier life. I am happy to write a disclaimer to accompany the current code should an initial "substandard" release be done. (At this point I'd argue for a not-so-elegant release rather than no release at all)

A brave soul could look at improving the test harness and support for more complex use cases later on 🧠.

…issuecomment-605658333

…ents

ThisIsMissEm · 2022-10-11T18:39:53Z

@TimothyGu @fguitton are there further changes needed on this branch, or would it in its current state be a good step forwards?

domenic · 2022-10-12T01:17:07Z

As discussed on Twitter, what is needed before this is merged is ensuring all tests pass. Currently most are commented out, and none of the ones in subdirectories of encoding/ are even downloaded.

hawk-ahmed-matar · 2022-11-10T13:51:19Z

Any updates?

ThisIsMissEm · 2022-11-10T15:42:39Z

@domenic yeah, I've not had a chance to work on this, been snowed under with other things.

fguitton mentioned this pull request Mar 29, 2020

feat: Add TextEncoder and TextDecoder jsdom/jsdom#2928

Closed

TimothyGu reviewed Mar 29, 2020

View reviewed changes

src/TextDecoder.webidl Outdated Show resolved Hide resolved

src/TextEncoder.webidl Outdated Show resolved Hide resolved

lib/TextDecoder-impl.js Outdated Show resolved Hide resolved

fguitton force-pushed the chore/whatwg-url-model branch from b24f788 to 17770b7 Compare March 29, 2020 17:22

TimothyGu reviewed Mar 29, 2020

View reviewed changes

fguitton force-pushed the chore/whatwg-url-model branch 3 times, most recently from 3e93928 to e0bf542 Compare March 30, 2020 14:06

fguitton requested a review from TimothyGu April 11, 2020 10:27

SimenB mentioned this pull request May 10, 2020

TextEncoder issues on 26.0.1 jestjs/jest#9983

Closed

peaBerberian mentioned this pull request Jan 11, 2021

use TextEncoder and TextDecoder when available canalplus/rx-player#875

Merged

achingbrain mentioned this pull request Apr 6, 2021

Recent release broke our build achingbrain/uint8arrays#17

Closed

domenic mentioned this pull request Sep 12, 2021

Attempt to use the builtin and switch to Uint8Array #13

Draft

fguitton force-pushed the chore/whatwg-url-model branch from 7fb9a1c to 4e07f9a Compare September 21, 2022 10:46

fguitton added 7 commits September 21, 2022 11:48

Early convertion to the whatwg-url repo structure

132f7e9

Transitioning to a whatwg-url model

54a5ee8

Forcing web-platform-tests folder persistance

ead3699

Cleaning files commited by mistake

5c5f4a8

Applying missing change in platform test getter

aac1d93

Fixing typos of .gitignore file

c446195

Updating live-viewer

7de2149

fguitton added 7 commits September 21, 2022 11:49

Reveting changes to configuration files as per jsdom/whatwg-encoding#…

3bd70ba

…issuecomment-605658333

Removing dependency on text-encoding package

e1cb445

Adding tests (some commented for now) and fixing typos following comm…

f22bd27

…ents

Checking for buffer in inputs prior to decoding

e2a3845

Adding support for stream based on iconv-lite

2acdea5

chore: Updating dependencies and lock file to calm down npm audit

8eaef38

chore: Updating the WPT based to latest fully passing

f9bc56f

fguitton force-pushed the chore/whatwg-url-model branch 2 times, most recently from 16dfc21 to 80596b7 Compare September 21, 2022 10:52

chore: Updating dependencies to latest working versions

3f99fac

fguitton force-pushed the chore/whatwg-url-model branch from 80596b7 to 3f99fac Compare September 21, 2022 10:53

stefnotch mentioned this pull request Feb 3, 2023

Drop web-encoding dependency mswjs/interceptors#332

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Expose TextEncoder and TextDecoder #11

feat: Expose TextEncoder and TextDecoder #11

fguitton commented Mar 29, 2020

TimothyGu commented Mar 29, 2020

fguitton commented Mar 29, 2020

TimothyGu left a comment

fguitton commented Mar 30, 2020 •

edited

Loading

TimothyGu commented Apr 11, 2020

chyzwar commented Aug 13, 2020

stbrody commented Apr 6, 2021

fguitton commented Aug 18, 2021 •

edited

Loading

ThisIsMissEm commented Oct 11, 2022

domenic commented Oct 12, 2022

hawk-ahmed-matar commented Nov 10, 2022

ThisIsMissEm commented Nov 10, 2022

feat: Expose TextEncoder and TextDecoder #11

Are you sure you want to change the base?

feat: Expose TextEncoder and TextDecoder #11

Conversation

fguitton commented Mar 29, 2020

TimothyGu commented Mar 29, 2020

fguitton commented Mar 29, 2020

TimothyGu left a comment

Choose a reason for hiding this comment

fguitton commented Mar 30, 2020 • edited Loading

TimothyGu commented Apr 11, 2020

chyzwar commented Aug 13, 2020

stbrody commented Apr 6, 2021

fguitton commented Aug 18, 2021 • edited Loading

ThisIsMissEm commented Oct 11, 2022

domenic commented Oct 12, 2022

hawk-ahmed-matar commented Nov 10, 2022

ThisIsMissEm commented Nov 10, 2022

fguitton commented Mar 30, 2020 •

edited

Loading

fguitton commented Aug 18, 2021 •

edited

Loading