use TextEncoder and TextDecoder when available
This commit allows the RxPlayer to use the `TextEncoder` and `TextDecoder`
APIs, when available, respectively to encode JS strings into a UTF-8
byte sequence (`TextEncoder` does not seem able to encode into any
other encoding) and to decode UTF-8, UTF-16BE or UTF-16LE bytes
into a JS string.
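
As a rough illustration of the two APIs described above (not code from this commit), the following sketch encodes a string into UTF-8 and decodes UTF-8 and UTF-16LE bytes. The variable names are made up for the example, and it assumes an environment where both globals exist.

```typescript
// Encode a JS string into UTF-8 bytes; TextEncoder only ever produces UTF-8.
const utf8Bytes: Uint8Array = new TextEncoder().encode("é");
// "é" (U+00E9) is encoded on two bytes in UTF-8: 0xC3 0xA9.

// Decode bytes into a JS string; UTF-8 is the default when no label is given.
const fromUtf8: string = new TextDecoder().decode(utf8Bytes);

// Other encodings are selected with a label given to the constructor.
// The bytes below are "hi" in UTF-16LE (low byte first).
const utf16LEBytes = new Uint8Array([0x68, 0x00, 0x69, 0x00]);
const fromUtf16LE: string = new TextDecoder("utf-16le").decode(utf16LEBytes);
```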

Because `TextEncoder` and `TextDecoder` are not defined in IE11 nor in some
old browser versions we claim to support, we still fall back to the custom
implementation if the API does not exist or if the operation fails.
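
The fallback strategy described above could be sketched generically as below. `withFallback` and `decodeUtf16LE` are hypothetical names used only for this illustration; the actual commit inlines the same logic in each function of `string_parsing.ts` and logs through the RxPlayer's own `log` module rather than `console`.

```typescript
/**
 * Hypothetical helper: run `native` if the API is available, and use
 * `fallback` when it is unavailable or when `native` throws.
 */
function withFallback<T>(
  isAvailable: boolean,
  native: () => T,
  fallback: () => T
): T {
  if (isAvailable) {
    try {
      return native();
    } catch (e) {
      console.warn("native implementation failed, using the fallback", e);
    }
  }
  return fallback();
}

/** Decode UTF-16LE bytes, preferring TextDecoder when it exists. */
function decodeUtf16LE(bytes: Uint8Array): string {
  return withFallback<string>(
    typeof TextDecoder === "function",
    () => new TextDecoder("utf-16le").decode(bytes),
    () => {
      // Custom decoding: read each 16-bit code unit as little-endian.
      const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
      let str = "";
      for (let i = 0; i + 1 < view.byteLength; i += 2) {
        str += String.fromCharCode(view.getUint16(i, true));
      }
      return str;
    }
  );
}
```

Keeping the availability check separate from the `try`/`catch` means a missing API is skipped silently, while a failing one still leaves a trace in the logs.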

It is important to note one significant difference between using
the `TextDecoder` interface and the previous implementation: when
encountering byte sequences that are invalid in the corresponding encoding,
`TextDecoder` replaces them with a "REPLACEMENT CHARACTER" (U+FFFD, �).

This seems fine and even desirable, but the previous implementation just
threw in that same situation.
This means that we now have two different behaviors, depending on the
current platform / browser.
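
To make the two behaviors concrete, here is a small sketch, assuming an environment with a standard `TextDecoder`. The `fatal` option shown in the second half is part of the `TextDecoder` API but is not used by this commit; it is included only to show what a throwing decode looks like.

```typescript
// 0xFF can never appear in well-formed UTF-8, so this sequence is invalid.
const invalidUtf8 = new Uint8Array([0x61, 0xff, 0x62]);

// Default mode: the invalid byte is replaced by U+FFFD, no error is thrown.
const lenient: string = new TextDecoder("utf-8").decode(invalidUtf8);

// With `fatal: true`, decoding the same bytes throws instead, which is
// closer to what the previous custom implementation did.
let strictThrew = false;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(invalidUtf8);
} catch (e) {
  strictThrew = true;
}
```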

Those functions using the `TextDecoder` APIs are even directly defined
in the `StringUtils` tools, and thus that new behavior can be directly
noticeable by applications using them.
Thankfully, nothing is defined in our API documentation about invalid
sequences.

Even if we can consider that this does not break our API (though it is
still unclear to me), it is something to keep in mind, as this
might be unexpected for users relying on this API throwing.

Also, I tried to add unit tests, but it appears that "jsdom", on which
jest relies to run unit tests while simulating a browser in Node,
does not include either API yet. It is under way, though:
jsdom/whatwg-encoding#11
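
Until jsdom ships those APIs, one possible workaround (an assumption on my side, not something this commit does) is to expose Node's own implementations, exported by its `util` module, from a jest setup file. The sketch below assumes that the test environment's global object is the one the tested code reaches through `window`.

```typescript
// Hypothetical jest setup file, e.g. one listed in the `setupFiles` option.
import { TextDecoder, TextEncoder } from "util";

// Treat the global object as a plain record so properties can be added
// without fighting the DOM typings.
const globalScope = globalThis as unknown as Record<string, unknown>;

// Only define the globals when the environment does not provide them.
if (typeof globalScope.TextEncoder !== "function") {
  globalScope.TextEncoder = TextEncoder;
}
if (typeof globalScope.TextDecoder !== "function") {
  globalScope.TextDecoder = TextDecoder;
}
```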
peaBerberian committed Jan 28, 2021
1 parent d642869 commit 4abbac7
Showing 1 changed file with 44 additions and 0 deletions.
44 changes: 44 additions & 0 deletions src/utils/string_parsing.ts
@@ -14,6 +14,7 @@
* limitations under the License.
*/

import log from "../log";
import assert from "./assert";

/**
@@ -56,6 +57,17 @@ function strToBeUtf16(str: string): Uint8Array {
* @returns {string}
*/
function utf16LEToStr(bytes : Uint8Array) : string {
  if (typeof window.TextDecoder === "function") {
    try {
      // instantiation throws if the encoding is unsupported
      const decoder = new TextDecoder("utf-16le");
      return decoder.decode(bytes);
    } catch (e) {
      log.warn("Utils: could not use TextDecoder to parse UTF-16LE, " +
               "fallbacking to another implementation", e);
    }
  }

  let str = "";
  for (let i = 0; i < bytes.length; i += 2) {
    str += String.fromCharCode((bytes[i + 1] << 8) + bytes[i]);
@@ -69,6 +81,17 @@ function utf16LEToStr(bytes : Uint8Array) : string {
* @returns {string}
*/
function beUtf16ToStr(bytes : Uint8Array) : string {
  if (typeof window.TextDecoder === "function") {
    try {
      // instantiation throws if the encoding is unsupported
      const decoder = new TextDecoder("utf-16be");
      return decoder.decode(bytes);
    } catch (e) {
      log.warn("Utils: could not use TextDecoder to parse UTF-16BE, " +
               "fallbacking to another implementation", e);
    }
  }

  let str = "";
  for (let i = 0; i < bytes.length; i += 2) {
    str += String.fromCharCode((bytes[i] << 8) + bytes[i + 1]);
@@ -83,6 +106,16 @@ function beUtf16ToStr(bytes : Uint8Array) : string {
* @returns {Uint8Array}
*/
function strToUtf8(str : string) : Uint8Array {
  if (typeof window.TextEncoder === "function") {
    try {
      const encoder = new TextEncoder();
      return encoder.encode(str);
    } catch (e) {
      log.warn("Utils: could not use TextEncoder to encode string into UTF-8, " +
               "fallbacking to another implementation", e);
    }
  }

  // http://stackoverflow.com/a/13691499 provides an ugly but functional solution.
  // (Note you have to dig deeper to understand it but I have more faith in
  // stackoverflow not going down in the future so I leave that link.)
@@ -209,6 +242,17 @@ function intToHex(num : number, size : number) : string {
* @returns {string}
*/
function utf8ToStr(data : Uint8Array) : string {
  if (typeof window.TextDecoder === "function") {
    try {
      // TextDecoder uses UTF-8 by default
      const decoder = new TextDecoder();
      return decoder.decode(data);
    } catch (e) {
      log.warn("Utils: could not use TextDecoder to parse UTF-8, " +
               "fallbacking to another implementation", e);
    }
  }

  let uint8 = data;

  // If present, strip off the UTF-8 BOM.