Add identifier syntax to identifiers.md #1583

Merged
Changes from 2 commits
24 changes: 24 additions & 0 deletions src/identifiers.md
@@ -1,5 +1,8 @@
# Identifiers

r[ident]

r[ident.syntax]
> **<sup>Lexer:</sup>**\
> IDENTIFIER_OR_KEYWORD :\
> &nbsp;&nbsp; &nbsp;&nbsp; XID_Start XID_Continue<sup>\*</sup>\
@@ -13,6 +16,7 @@
> NON_KEYWORD_IDENTIFIER | RAW_IDENTIFIER

<!-- When updating the version, update the UAX links, too. -->
r[ident.unicode]
Identifiers follow the specification in [Unicode Standard Annex #31][UAX31] for Unicode version 15.0, with the additions described below. Some examples of identifiers:

* `foo`
@@ -21,6 +25,7 @@ Identifiers follow the specification in [Unicode Standard Annex #31][UAX31] for
* `Москва`
* `東京`
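
As an illustration of the examples above, the following minimal sketch binds values to an ASCII and a non-ASCII identifier. The function name `add_with_unicode_names` and the values are hypothetical and chosen only for the example.

```rust
/// Hypothetical helper: shows that a non-ASCII identifier can be used
/// anywhere an ordinary local variable name can.
fn add_with_unicode_names() -> u32 {
    let foo = 1_u32; // plain ASCII identifier
    let 東京 = 2_u32; // identifier made of XID_Start / XID_Continue characters
    foo + 東京
}
```

Calling `add_with_unicode_names()` from `main` or a test behaves like any other function call; the identifier's script has no effect on its meaning.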

r[ident.profile]
The profile used from UAX #31 is:

* Start := [`XID_Start`], plus the underscore character (U+005F)
@@ -31,28 +36,47 @@ with the additional constraint that a single underscore character is not an iden

> **Note**: Identifiers starting with an underscore are typically used to indicate an identifier that is intentionally unused, and will silence the unused warning in `rustc`.
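
A small sketch of the convention in the note, assuming a hypothetical function `area` whose third parameter and one local binding are deliberately unused:

```rust
/// Hypothetical example: `_label` and `_scratch` are never read, but the
/// leading underscore keeps `rustc` from emitting an unused warning.
fn area(width: u32, height: u32, _label: &str) -> u32 {
    let _scratch = width + height; // intentionally unused binding
    let _ = height; // a lone `_` is a wildcard pattern, not an identifier
    width * height
}
```

Removing the underscore from `_scratch` would make `rustc` report an unused variable, while the computed result stays the same.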

r[ident.keyword]
Identifiers may not be a [strict] or [reserved] keyword without the `r#` prefix described below in [raw identifiers](#raw-identifiers).

r[ident.zero-width-chars]
Zero width non-joiner (ZWNJ U+200C) and zero width joiner (ZWJ U+200D) characters are not allowed in identifiers.

r[ident.ascii-limitations]
Identifiers are restricted to the ASCII subset of [`XID_Start`] and [`XID_Continue`] in the following situations:

r[ident.ascii-extern-crate]
* [`extern crate`] declarations

r[ident.ascii-extern-prelude]
* External crate names referenced in a [path]

r[ident.ascii-outlined-module]
* [Module] names loaded from the filesystem without a [`path` attribute]

r[ident.ascii-no_mangle]
* [`no_mangle`] attributed items

r[ident.ascii-extern-item]
* Item names in [external blocks]

## Normalization

r[ident.normalization]

Identifiers are normalized using Normalization Form C (NFC) as defined in [Unicode Standard Annex #15][UAX15]. Two identifiers are equal if their NFC forms are equal.

[Procedural][proc-macro] and [declarative][mbe] macros receive normalized identifiers in their input.

## Raw identifiers

r[ident.raw]

r[ident.raw.intro]
A raw identifier is like a normal identifier, but prefixed by `r#`. (Note that
the `r#` prefix is not included as part of the actual identifier.)

r[ident.raw.allowed]
Unlike a normal identifier, a raw identifier may be any strict or reserved
keyword except the ones listed above for `RAW_IDENTIFIER`.
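
The following sketch shows raw identifiers in use. The names `r#match`, `r#type`, and `is_found` are hypothetical; `match` and `type` were picked only because both are strict keywords.

```rust
/// `match` is a strict keyword, so a function with that name has to be
/// written with the `r#` prefix at both its definition and its call sites.
fn r#match(needle: &str, haystack: &str) -> bool {
    haystack.contains(needle)
}

/// Hypothetical caller: `type` is also a strict keyword, so the local
/// variable is spelled `r#type`.
fn is_found() -> bool {
    let r#type = "oo";
    r#match(r#type, "foo")
}
```

Without the prefix, `fn match(...)` and `let type = ...` are rejected by the parser because the keyword cannot appear in identifier position.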
