Suggest character encoding is incorrect when encountering random null bytes #81856

Merged (4 commits), Feb 28, 2021

@syvb (Contributor) commented Feb 7, 2021

This adds a note whenever null bytes unexpectedly appear at the start of a token, since those tend to come from UTF-16 encoded files without a BOM (if a UTF-16 BOM is present, the file won't be valid UTF-8, but without a BOM the file can be both valid UTF-16 and valid but garbled UTF-8). This approach was suggested in #73979 (comment).

Closes #73979.
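
The reasoning about BOM-less UTF-16 can be illustrated with a small standalone sketch. This is not the compiler's implementation: `utf16le_no_bom` and `looks_like_bomless_utf16` are hypothetical helper names invented for this example, and the check is a crude whole-buffer heuristic rather than the per-token note the PR adds.

```rust
/// Hypothetical helper: encode a string as UTF-16LE bytes with no BOM.
fn utf16le_no_bom(s: &str) -> Vec<u8> {
    s.encode_utf16().flat_map(|unit| unit.to_le_bytes()).collect()
}

/// Hypothetical heuristic: the bytes decode as UTF-8 but contain NUL bytes,
/// which hints that the input may really be UTF-16 without a BOM.
fn looks_like_bomless_utf16(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok() && bytes.contains(&0)
}

fn main() {
    let bytes = utf16le_no_bom("fn main() {}");

    // Each ASCII character becomes the pair (byte, 0x00). NUL is a valid
    // UTF-8 code point, so the buffer is valid but garbled UTF-8.
    assert!(std::str::from_utf8(&bytes).is_ok());
    assert!(looks_like_bomless_utf16(&bytes));

    // With a UTF-16LE BOM (0xFF 0xFE) in front, the buffer is no longer
    // valid UTF-8, so it fails earlier, at decoding time.
    let mut with_bom = vec![0xFF, 0xFE];
    with_bom.extend_from_slice(&bytes);
    assert!(std::str::from_utf8(&with_bom).is_err());

    // Ordinary UTF-8 source has no NUL bytes and is not flagged.
    assert!(!looks_like_bomless_utf16(b"fn main() {}"));
}
```

This shows why a lexer that only accepts UTF-8 can read a BOM-less UTF-16 file without a decoding error and then trip over stray null bytes between tokens.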

@rust-highfive (Collaborator)

r? @matthewjasper

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 7, 2021
@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 23, 2021
@matthewjasper (Contributor)

@bors r+

@bors (Contributor) commented Feb 27, 2021

📌 Commit ed8c686 has been approved by matthewjasper

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 27, 2021
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Feb 27, 2021
Suggest character encoding is incorrect when encountering random null bytes

This adds a note whenever null bytes unexpectedly appear at the start of a token, since those tend to come from UTF-16 encoded files without a [BOM](https://en.wikipedia.org/wiki/Byte_order_mark) (if a UTF-16 BOM is present, the file won't be valid UTF-8, but without a BOM the file can be both valid UTF-16 and valid but garbled UTF-8). This approach was suggested in rust-lang#73979 (comment).

Closes rust-lang#73979.
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 28, 2021
Rollup of 11 pull requests

Successful merges:

 - rust-lang#81856 (Suggest character encoding is incorrect when encountering random null bytes)
 - rust-lang#82395 (Add missing "see its documentation for more" stdio)
 - rust-lang#82401 (Remove a redundant macro)
 - rust-lang#82498 (Use log level to control partitioning debug output)
 - rust-lang#82534 (Link crtbegin/crtend on musl to terminate .eh_frame)
 - rust-lang#82537 (Update measureme dependency to the latest version)
 - rust-lang#82561 (doc: cube root, not cubic root)
 - rust-lang#82563 (Fix intra-doc handling of `Self` in enum)
 - rust-lang#82584 (Add ARIA role to sidebar toggle in Rustdoc)
 - rust-lang#82596 (clarify RW lock's priority gotcha)
 - rust-lang#82607 (Add a getter for Frame.loc)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit be3d1eb into rust-lang:master Feb 28, 2021
@bors (Contributor) commented Feb 28, 2021

⌛ Testing commit ed8c686 with merge 130b2ab...

@rustbot rustbot added this to the 1.52.0 milestone Feb 28, 2021
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Development

Successfully merging this pull request may close these issues.

Unhelpful Error Messages When Trying to Compile UTF16 Files
8 participants