Adding ignoreBOM and fatal to TextDecoder #1730
Looks good to me, thanks! Could this also include a comment for why the extra options are specified? (in the Rust source, not the generated JS source) |
Looks great! I think there are some test failures though? |
@alexcrichton It seems to work fine in Chrome, but is failing in Firefox. I'm investigating it. |
Okay, that took me several hours to debug. It's actually a bug in Firefox:
const decoder = new TextDecoder('utf-8', { ignoreBOM: true, fatal: true });
const slice = new Uint8Array([239, 187, 191, 98, 97, 114]);
console.log(decoder.decode(slice) === decoder.decode(slice));
The above code returns
This has already been reported and fixed (3 days ago), but it is fixed in Firefox 70, so we have to wait for that to be released. |
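For readers unfamiliar with the two options, here is a minimal sketch of what they are expected to do under the WHATWG Encoding Standard, using the same bytes as the snippet above. The variable names are illustrative only, and this is not the wasm-bindgen glue code. On a spec-compliant engine each `console.log` should print `true`.

```javascript
// Same bytes as in the comment above: a UTF-8 BOM (EF BB BF) followed by "bar".
const bytes = new Uint8Array([239, 187, 191, 98, 97, 114]);

// Default options: the leading BOM is stripped from the decoded string.
const stripped = new TextDecoder('utf-8').decode(bytes);

// ignoreBOM: true keeps the BOM in the output as U+FEFF;
// fatal: true makes decode() throw on invalid input instead of substituting U+FFFD.
const strict = new TextDecoder('utf-8', { ignoreBOM: true, fatal: true });
const first = strict.decode(bytes);
const second = strict.decode(bytes);

console.log(stripped === 'bar');
console.log(first === '\uFEFFbar');
// Decoding the same bytes twice with one decoder should give the same string.
console.log(first === second);
```

The last comparison is the invariant the snippet in the comment checks: reusing one decoder for two non-streaming `decode()` calls should not change the result.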
Nice digging! Want to back out the tests and we can land this anyway for the time being? |
@alexcrichton Done. |
👍 |
Can we invert this PR to remove |
@kdy1 That sounds like a bug in Node. I see no reason why |
Like I said, it sounds like a bug in Node. Node claims to follow the WHATWG spec, but it seems to be non-compliant when compiled without ICU (for some bizarre unknown reason). I don't think it's a good idea to remove It should really be fixed in Node. If Node is non-compliant on something as simple as |
Let's file an issue with Node.js |
Oh, I see it's already written in the documentation. |
Yes, but it doesn't explain why. It's likely that they just haven't implemented it. |
Just for context @Pauan, what exactly does the @kdy1 and @magic-akari, if you file an issue with Node it would be helpful if you post it here. |
@daxpedda That's a good question. In theory Rust strings should always be valid UTF-8, however if there is a bug in the wasm-bindgen glue code (or possibly in user code?) then it could end up sending invalid bytes to JS. We've had various encoding bugs in the past, so it's nice to have an extra layer of safety, even if just to give us peace of mind. It's the same reason why people write unit tests, in order to catch any potential bugs in the future, and give guarantees about correct behavior. |
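To make the "extra layer of safety" concrete, here is a small sketch of how a fatal decoder surfaces invalid bytes that a default decoder would silently paper over. `decodeStrict` is a hypothetical helper written for this illustration, not a function from wasm-bindgen, and the sketch assumes a runtime whose `TextDecoder` supports `fatal` (for Node, a build with ICU).

```javascript
// Hypothetical helper: decode bytes that are expected to be valid UTF-8,
// throwing instead of silently replacing bad sequences.
function decodeStrict(bytes) {
  const decoder = new TextDecoder('utf-8', { ignoreBOM: true, fatal: true });
  return decoder.decode(bytes);
}

const valid = new Uint8Array([104, 105]);    // "hi"
const invalid = new Uint8Array([104, 0xff]); // 0xFF never occurs in valid UTF-8

// A default (non-fatal) decoder hides the problem behind U+FFFD.
const lenient = new TextDecoder('utf-8').decode(invalid);

// The fatal decoder reports it as a TypeError.
let caught = null;
try {
  decodeStrict(invalid);
} catch (err) {
  caught = err;
}

console.log(decodeStrict(valid));         // valid input decodes normally
console.log(lenient === 'h\uFFFD');       // bad byte replaced silently
console.log(caught instanceof TypeError); // bad byte reported loudly
```

With the fatal decoder, an encoding bug in the glue code fails at the boundary rather than producing a string with replacement characters that is noticed much later, if at all.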
Fixes #1729