Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding ignoreBOM and fatal to TextDecoder #1730

Merged
merged 6 commits into from
Aug 23, 2019

Conversation

Pauan
Copy link
Contributor

@Pauan Pauan commented Aug 21, 2019

Fixes #1729

@alexcrichton
Copy link
Contributor

Looks good to me, thanks!

Could this also include a comment for why the extra options are specified? (in the Rust source, not the generated JS source)

@alexcrichton
Copy link
Contributor

Looks great! I think there are some test failures though?

@Pauan
Copy link
Contributor Author

Pauan commented Aug 22, 2019

@alexcrichton It seems to work fine in Chrome, but is failing in Firefox. I'm investigating it.

@Pauan
Copy link
Contributor Author

Pauan commented Aug 22, 2019

Okay, that took me several hours to debug. It's actually a bug in Firefox:

const decoder = new TextDecoder('utf-8', { ignoreBOM: true, fatal: true });
const slice = new Uint8Array([239, 187, 191, 98, 97, 114]);
console.log(decoder.decode(slice) === decoder.decode(slice));

The above code returns true in Chrome but false in Firefox. Basically, if you use the same TextDecoder multiple times, it ignores the ignoreBOM setting.

This has already been reported and fixed (3 days ago), but it is fixed in Firefox 70, so we have to wait for that to be released.

@alexcrichton
Copy link
Contributor

Nice digging! Want to back out the tests and we can land this anyway for the time being?

@Pauan
Copy link
Contributor Author

Pauan commented Aug 22, 2019

@alexcrichton Done.

@alexcrichton alexcrichton merged commit fb0bbc0 into rustwasm:master Aug 23, 2019
@alexcrichton
Copy link
Contributor

👍

@kdy1
Copy link

kdy1 commented Jul 17, 2024

Can we invert this PR to remove fatal?
We are trying to use wasm-bindgen and fatal option does not work if node.js is built without ICU

@Pauan
Copy link
Contributor Author

Pauan commented Jul 17, 2024

@kdy1 That sounds like a bug in Node. I see no reason why fatal should stop working if you disable ICU.

@kdy1
Copy link

kdy1 commented Jul 17, 2024

@Pauan
Copy link
Contributor Author

Pauan commented Jul 17, 2024

Like I said, it sounds like a bug in Node. The fatal option is standardized and required by WHATWG.

Node claims that they follow the WHATWG spec, but it seems that Node is non-compliant when compiled without ICU (for some bizarre unknown reason).

I don't think it's a good idea to remove fatal. A possible solution is to add in runtime checks to see if fatal is supported or not... but that bloats up the code quite a bit, so it's better not to do that if we can avoid it.

It should really be fixed in Node. If Node is non-compliant on something as simple as fatal, then they're likely non-compliant in other much bigger ways.

@magic-akari
Copy link
Contributor

Let's file an issue to Node.js

@magic-akari
Copy link
Contributor

fatal true if decoding failures are fatal. This option is not supported when ICU is disabled (see Internationalization). Default: false.

Oh, I see it's already written in the document.

@Pauan
Copy link
Contributor Author

Pauan commented Jul 17, 2024

Oh, I see it's already written in the document.

Yes, but it doesn't explain why. It's likely that they just haven't implemented it.

@daxpedda
Copy link
Collaborator

Just for context @Pauan, what exactly does the fatal option get us?
AFAIU its only useful if we send invalid UTF-8, which would require unsafe.
In the comment you mentioned "weird encoding bugs", what did you mean by that?

@kdy1 and @magic-akari, if you file an issue with Node it would be helpful if you post it here.
I would be willing to add e.g. an environment variable that disable the fatal option if Node gives a satisfying reason (which I kinda doubt?).

@Pauan
Copy link
Contributor Author

Pauan commented Jul 29, 2024

@daxpedda That's a good question. In theory Rust strings should always be valid UTF-8, however if there is a bug in the wasm-bindgen glue code (or possibly in user code?) then it could end up sending invalid bytes to JS.

We've had various encoding bugs in the past, so it's nice to have an extra layer of safety, even if just to give us peace of mind.

It's the same reason why people write unit tests, in order to catch any potential bugs in the future, and give guarantees about correct behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Passing a String to JS removes BOM (FEFF)
5 participants