-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use unicode-xid crate instead of libcore #62848
Conversation
/// be an alphabetic character followed by any number of alphanumeric | ||
/// characters. | ||
/// Parses a word starting at the current position. A word is the same as | ||
/// Rust identifier, except that it can't start with `_` character. | ||
fn word(&mut self) -> &'a str { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we (accidentally?) don't gate on non-ascii idents here: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c5673f4b78f88393d9ba4f46f4838d38
Well, one rebase seems better than a separate PR (especially for code that can be just re-generated). |
Makes sense, added a commit that regenerates table using existing unicode version. However, removing |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
cc #62869 |
☔ The latest upstream changes (presumably #62902) made this pull request unmergeable. Please resolve the merge conflicts. |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
☔ The latest upstream changes (presumably #62935) made this pull request unmergeable. Please resolve the merge conflicts. |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
e0e26cd
to
57f11b8
Compare
This is now blocked on unicode-rs/unicode-xid#11. cc @Manishearth :-) |
@rustbot modify labels to +T-compiler |
@rfcbot fcp merge |
☔ The latest upstream changes (presumably #62990) made this pull request unmergeable. Please resolve the merge conflicts. |
@rfcbot ask @rust-lang/compiler Should we move to using unicode-xid? |
On the call site, `rustc_lexer::is_whitespace` reads much better than `character_properties::is_whitespace`.
@bors r+ |
📌 Commit 206fe8e has been approved by |
⌛ Testing commit 206fe8e with merge b4d47710a7240f0050536d56c19d246a5ae30d5b... |
⌛ Testing commit 206fe8e with merge 1a2a386dae552c1e1ef02586fd24fe7d15c39c61... |
@bors retry prioritizing r0llup since queue is really long. |
⌛ Testing commit 206fe8e with merge 722de75f4e4fd9f8c7844664289725117ea40d1b... |
Use unicode-xid crate instead of libcore This PR proposes to remove `char::is_xid_start` and `char::is_xid_continue` functions from `libcore` and use `unicode_xid` crate from crates.io (note that this crate is already present in rust-lang/rust's Cargo.lock). Reasons to do this: * removing rustc-binary-specific stuff from libcore * making sure that, across the ecosystem, there's a single definition of what rust identifier is (`unicode-xid` has almost 10 million downs, as a `proc_macro2` dependency) * making it easier to share `rustc_lexer` crate with rust-analyzer: no need to `#[cfg]` if we are building as a part of the compiler Reasons not to do this: * increased maintenance burden: we'll need to upgrade unicode version both in libcore and in unicode-xid. However, this shouldn't be a too heavy burden: just running `./unicode.py` after new unicode version. I (@matklad) am ready to be a t-compiler side maintainer of unicode-xid. Moreover, given that xid-unicode is an important dependency of syn, *someone* needs to maintain it anyway. * xid-unicode implementation is significantly slower. It uses a more compact table with binary search, instead of a trie. However, this shouldn't matter in practice, because we have fast-path for ascii anyway, and code size savings is a plus. Moreover, in rust-lang#59706 not using libcore turned out to be *faster*, presumably beacause checking for whitespace with match is even faster. <details> <summary>old description</summary> Followup to rust-lang#59706 r? @eddyb Note that this doesn't actually remove tables from libcore, to avoid conflict with rust-lang#62641. cc unicode-rs/unicode-xid#11 </details>
@bors retry rolled up. |
⌛ Testing commit 206fe8e with merge aa4e70a7a1f891b5f40f78cf83f936a659ab0572... |
Use unicode-xid crate instead of libcore This PR proposes to remove `char::is_xid_start` and `char::is_xid_continue` functions from `libcore` and use `unicode_xid` crate from crates.io (note that this crate is already present in rust-lang/rust's Cargo.lock). Reasons to do this: * removing rustc-binary-specific stuff from libcore * making sure that, across the ecosystem, there's a single definition of what rust identifier is (`unicode-xid` has almost 10 million downs, as a `proc_macro2` dependency) * making it easier to share `rustc_lexer` crate with rust-analyzer: no need to `#[cfg]` if we are building as a part of the compiler Reasons not to do this: * increased maintenance burden: we'll need to upgrade unicode version both in libcore and in unicode-xid. However, this shouldn't be a too heavy burden: just running `./unicode.py` after new unicode version. I (@matklad) am ready to be a t-compiler side maintainer of unicode-xid. Moreover, given that xid-unicode is an important dependency of syn, *someone* needs to maintain it anyway. * xid-unicode implementation is significantly slower. It uses a more compact table with binary search, instead of a trie. However, this shouldn't matter in practice, because we have fast-path for ascii anyway, and code size savings is a plus. Moreover, in rust-lang#59706 not using libcore turned out to be *faster*, presumably beacause checking for whitespace with match is even faster. <details> <summary>old description</summary> Followup to rust-lang#59706 r? @eddyb Note that this doesn't actually remove tables from libcore, to avoid conflict with rust-lang#62641. cc unicode-rs/unicode-xid#11 </details>
@bors retry rolled up. |
Use unicode-xid crate instead of libcore This PR proposes to remove `char::is_xid_start` and `char::is_xid_continue` functions from `libcore` and use `unicode_xid` crate from crates.io (note that this crate is already present in rust-lang/rust's Cargo.lock). Reasons to do this: * removing rustc-binary-specific stuff from libcore * making sure that, across the ecosystem, there's a single definition of what rust identifier is (`unicode-xid` has almost 10 million downs, as a `proc_macro2` dependency) * making it easier to share `rustc_lexer` crate with rust-analyzer: no need to `#[cfg]` if we are building as a part of the compiler Reasons not to do this: * increased maintenance burden: we'll need to upgrade unicode version both in libcore and in unicode-xid. However, this shouldn't be a too heavy burden: just running `./unicode.py` after new unicode version. I (@matklad) am ready to be a t-compiler side maintainer of unicode-xid. Moreover, given that xid-unicode is an important dependency of syn, *someone* needs to maintain it anyway. * xid-unicode implementation is significantly slower. It uses a more compact table with binary search, instead of a trie. However, this shouldn't matter in practice, because we have fast-path for ascii anyway, and code size savings is a plus. Moreover, in rust-lang#59706 not using libcore turned out to be *faster*, presumably beacause checking for whitespace with match is even faster. <details> <summary>old description</summary> Followup to rust-lang#59706 r? @eddyb Note that this doesn't actually remove tables from libcore, to avoid conflict with rust-lang#62641. cc unicode-rs/unicode-xid#11 </details>
Rollup of 11 pull requests Successful merges: - #62848 (Use unicode-xid crate instead of libcore) - #63774 (Fix `window.hashchange is not a function`) - #63930 (Account for doc comments coming from proc macros without spans) - #64003 (place: Passing `align` = `layout.align.abi`, when also passing `layout`) - #64030 (Fix unlock ordering in SGX synchronization primitives) - #64041 (use TokenStream rather than &[TokenTree] for built-in macros) - #64051 (Add x86_64-linux-kernel target) - #64063 (Fix const_err with `-(-0.0)`) - #64083 (Point at appropriate arm on type error on if/else/match with one non-! arm) - #64100 (Fix const eval bug breaking run-pass tests in Miri) - #64157 (Opaque type locations in error message for clarity.) Failed merges: r? @ghost
This PR proposes to remove
char::is_xid_start
andchar::is_xid_continue
functions fromlibcore
and useunicode_xid
crate from crates.io (note that this crate is already present in rust-lang/rust's Cargo.lock).Reasons to do this:
unicode-xid
has almost 10 million downs, as aproc_macro2
dependency)rustc_lexer
crate with rust-analyzer: no need to#[cfg]
if we are building as a part of the compilerReasons not to do this:
./unicode.py
after new unicode version. I (@matklad) am ready to be a t-compiler side maintainer of unicode-xid. Moreover, given that xid-unicode is an important dependency of syn, someone needs to maintain it anyway.old description
Followup to #59706
r? @eddyb
Note that this doesn't actually remove tables from libcore, to avoid conflict with #62641.
cc unicode-rs/unicode-xid#11