Document syntax reserved in Rust 2021 #1128

Merged (5 commits) on Jan 6, 2022

mattheww
Contributor

@mattheww mattheww commented Jan 3, 2022

Closes #1069.

I believe this is a correct description of what was implemented in rust-lang/rust#84599.

I haven't tried to make use of the Reference-level explanation section from RFC3101, which I think doesn't capture what was implemented (for example, it doesn't indicate that continue'label is now rejected).

This PR leaves reserved prefixes undocumented in the Lexer rules blocks, but I hope that's acceptable for now (suffixes for string literals are already missing there).

Make it say that a suffix of the same form as a keyword is permitted.

Make it clearer that the "Reserved prefixes" rule doesn't apply to suffixes.
@ehuss
Contributor

ehuss commented Jan 4, 2022

Thanks, this looks good!

I'm not sure I understand your comment about the Lexer rule block. What is in the RFC looks like it could be copied here. Can you clarify why that wouldn't be included?

I think the missing suffixes for string literals are just a bug.

Also, BTW, when opening a PR to resolve an issue, can you use one of the GitHub keywords to link the issue? Something like Closes #1069 in the description will link the issue, and GitHub will automatically close it when the PR is merged.

@mattheww
Contributor Author

mattheww commented Jan 4, 2022

The main thing I don't like about those lexer rules is that they say that (for example) x#foo is a single token, but that's never true:

  • before Rust 2021 it's two tokens
  • from Rust 2021 on it's an error

So if we adopt those rules I think we'd have to say that they exist only from Rust 2021 on.

Then we'd be relying on separate text along the lines of "any instance of the above produces a tokenization error" to complete the explanation, and I think it's not clear that we're gaining anything by having the rules.
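
For readers following along, here is a minimal sketch of how the token count can be observed from inside the language. It counts the token trees a macro receives. The names `count_tts` and `spaced_prefix_token_counts` are made up for this illustration and do not come from the reference or the RFC. Every input is written with a space after the identifier, so the file should compile under any edition; the unspaced forms are the ones whose treatment depends on the edition.

```rust
// Counts the token trees passed to it by peeling off one `tt` at a time.
// (Hypothetical helper, not part of the reference.)
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

/// Token counts for the spaced-out versions of the three prefix shapes
/// discussed in this thread. With the space, each is an identifier
/// followed by a separate token.
fn spaced_prefix_token_counts() -> [usize; 3] {
    [
        count_tts!(x #),     // identifier, then `#`
        count_tts!(z "hey"), // identifier, then a string literal
        count_tts!(q 'c'),   // identifier, then a character literal
    ]
}
```

Calling `spaced_prefix_token_counts()` gives `[2, 2, 2]`. Removing the spaces is where the editions differ, as described in the two bullets above.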

As written the rules are also missing the case where ' appears as a lifetime or label rather than as part of a literal. I'm not sure whether they're missing other cases.
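
A small sketch of the label case, assuming nothing beyond ordinary labeled loops. The function name `count_lower_triangle` is invented for the example. The point is only that the label is separated from `continue` by whitespace, which is the form that stays valid; the PR description above notes that the unspaced `continue'label` is now rejected.

```rust
/// Counts the pairs (i, j) with j < i for i in 0..n, using a labeled
/// `continue` to leave the inner loop early. (Illustrative helper only.)
fn count_lower_triangle(n: u32) -> u32 {
    let mut count = 0;
    'outer: for i in 0..n {
        for j in 0..n {
            if j >= i {
                // Note the space between `continue` and the label.
                continue 'outer;
            }
            count += 1;
        }
    }
    count
}
```

For example, `count_lower_triangle(4)` returns 6 (0 + 1 + 2 + 3).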

(And I now see that _ isn't treated as a keyword in the reference, so we'd need to tweak the places where they say IDENTIFIER_OR_KEYWORD.)

Also, I'm not comfortable with the parts that are like « Except b, r, br ». I think this is attempting to make the lexer rules unambiguous without needing a priority rule, but I believe to make that work we'd need to also modify the rule for IDENTIFIER_OR_KEYWORD, using some kind of lookahead notation. This doesn't seem to be the way the reference is currently written (it isn't trying to disambiguate the prefixes for string literals).

In the long run I think the lexer description is going to be better off using some kind of explicit priority rules instead.

@ehuss
Contributor

ehuss commented Jan 5, 2022

> So if we adopt those rules I think we'd have to say that they exist only from Rust 2021 on.

For edition differences in the lexer, the reference uses the form `> **<sup>Lexer 2018+</sup>**` to denote that those lexing rules are only for the given edition. I also think that the PR text clarifies the edition differences pretty well.

> Then we'd be relying on separate text along the lines of "any instance of the above produces a tokenization error" to complete the explanation, and I think it's not clear that we're gaining anything by having the rules.

I think it helps with clarifying exactly what the rules are, similar to the same reasoning why other productions are listed. There are several other situations where the rules are listed, and then the text explains that it isn't allowed, like the reserved keywords.

> As written the rules are also missing the case where ' appears as a lifetime or label rather than as part of a literal. I'm not sure whether they're missing other cases.

I think we can work on writing the correct rules, since it appears the implementation diverged from the RFC.

I understand it can be difficult to write or interpret the grammar since it is not very formal, particularly in the face of precedence. Unfortunately that is unlikely to improve in the foreseeable future. I think for now there is some value in trying to do the best with what we have.

I think something like the following should work:

RESERVED_TOKEN_DOUBLE_QUOTE : IDENTIFIER_OR_KEYWORD Except r or br or b "
RESERVED_TOKEN_SINGLE_QUOTE : IDENTIFIER_OR_KEYWORD Except b '
RESERVED_TOKEN_POUND : IDENTIFIER_OR_KEYWORD Except r or br #
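
As a rough reading aid, the three rules above can be transcribed into a predicate. This is only a sketch: `is_reserved_prefix` is a hypothetical name, it assumes `ident` has already been recognised as an IDENTIFIER_OR_KEYWORD, and it mirrors just the exceptions listed in the three rules, nothing else in the lexer.

```rust
/// Returns true if `ident` immediately followed by `next` would form one
/// of the reserved tokens sketched in the three rules above.
/// Assumption: `ident` is already a valid IDENTIFIER_OR_KEYWORD.
fn is_reserved_prefix(ident: &str, next: char) -> bool {
    match next {
        // RESERVED_TOKEN_DOUBLE_QUOTE: anything except r, br, b before `"`
        '"' => !matches!(ident, "r" | "br" | "b"),
        // RESERVED_TOKEN_SINGLE_QUOTE: anything except b before `'`
        '\'' => ident != "b",
        // RESERVED_TOKEN_POUND: anything except r, br before `#`
        '#' => !matches!(ident, "r" | "br"),
        // Any other following character is outside these rules.
        _ => false,
    }
}
```

Under this transcription, `is_reserved_prefix("z", '"')` is true, while `is_reserved_prefix("br", '#')` is false because `br#` can begin a raw byte string.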

> (And I now see that _ isn't treated as a keyword in the reference, so we'd need to tweak the places where they say IDENTIFIER_OR_KEYWORD.)

I'm not sure I follow. IDENTIFIER_OR_KEYWORD includes _. XID_Start is defined as including _.

@mattheww
Contributor Author

mattheww commented Jan 5, 2022

I've added those rules.

On lone _:

XID_Start doesn't include _, and the second part of IDENTIFIER_OR_KEYWORD has + rather than *.

The profile in identifiers.md defines "Start" to include _, but doesn't attempt to redefine XID_Start.

(Also, I think it's clear that the reference isn't trying to make _ be an identifier, because if it was that would make the grammar say you could use it as the name of a function or struct.)
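
One way to see the lone `_` behaviour from inside the language is a macro whose first arm uses the `ident` fragment specifier. This is a sketch with made-up names (`classify`, `classify_examples`), not text from the reference. It relies on `ident` accepting identifiers and keywords but not a bare `_`, so the underscore falls through to the second arm.

```rust
// Arms are tried in order; a lone `_` does not match `$i:ident`.
// (Hypothetical macro for illustration only.)
macro_rules! classify {
    ($i:ident) => { "identifier or keyword" };
    (_) => { "lone underscore" };
    ($t:tt) => { "something else" };
}

/// Classifies a few sample tokens with the macro above.
fn classify_examples() -> [&'static str; 4] {
    [
        classify!(_x), // `_` followed by XID_Continue+ is an identifier
        classify!(fn), // keywords also match `ident`
        classify!(_),  // a bare underscore does not
        classify!(+),  // punctuation is neither
    ]
}
```

Here `classify_examples()` yields "identifier or keyword" for `_x` and `fn`, "lone underscore" for `_`, and "something else" for `+`.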

@ehuss
Contributor

ehuss commented Jan 6, 2022

Oh, yea, I got confused mixing up the Start profile. I'm trying to remember why I added that profile stuff, as the UAX #31 definition of <Identifier> := <Start> <Continue>* (<Medial> <Continue>+)* would allow a bare _. Oh well.

Thanks for updating it! I pushed a small fix to have the 2021 edition.

@ehuss ehuss merged commit 6bc87e2 into rust-lang:master Jan 6, 2022
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Jan 19, 2022
Update books

## nomicon

1 commits in c05c452b36358821bf4122f9c418674edd1d713d..66d097d3d80e8f88c288c6879c7c2b909ecf8ad4
2021-12-13 15:23:48 +0900 to 2022-01-05 05:45:21 +0900
- Fix typo / type error in FFI code example (rust-lang/nomicon#327)

## reference

8 commits in f8ba2f12df60ee19b96de24ae5b73af3de8a446b..4dee6eb63d728ffb9e7a2ed443e9ada9275c69d2
2022-01-03 11:02:08 -0800 to 2022-01-18 09:26:33 -0800
- (minor) Remove Expression Path sub-types splits in Pattern specs (rust-lang/reference#1138)
- Document destructuring assignment (rust-lang/reference#1116)
- Document the 2021 edition changes to macros-by-example `pat` metavariables (rust-lang/reference#1135)
- Improve the documentation of macros-by-example metavariable names (rust-lang/reference#1130)
- trait-bounds.md: add pronoun 'that' (rust-lang/reference#1131)
- Say that macros-by-example `ident` metavariables can match raw identifiers (rust-lang/reference#1133)
- State in the UAX31 profile description that a lone `_` is not an identifier (rust-lang/reference#1129)
- Document syntax reserved in Rust 2021 (rust-lang/reference#1128)

## book

17 commits in d3740fb7aad0ea4a80ae20f64dee3a8cfc0c5c3c..f17df27fc14696912c48b8b7a7a8fa49e648088d
2022-01-03 21:46:04 -0500 to 2022-01-18 17:46:28 -0500
- Add a notice to the top of all nostarch snapshots
- Fix quotes
- Grammar (minor): 'or' → 'and' for enum variants
- Propagate edits of chapter 8 to src
- Replies to nostarch edits
- more edits
- ch8 from nostarch
- Fix grammar and line wrapping
- Merge remote-tracking branch 'origin/pr/2880'
- Remove wikipedia link
- Merge remote-tracking branch 'origin/pr/2927'
- Snapshot of ch14 for nostarch
- Backport fixes to chapter 14 noticed while doing nostarch snapshot
- Fix usage of find piped into xargs
- Adjust some more line numbers of Cargo.toml includes
- Merge branch '2909'
- Merge remote-tracking branch 'parkerziegler/fix/ch14-add-one-naming'

## rustc-dev-guide

7 commits in 8754644..78dd6a4
2021-12-28 22:17:49 -0600 to 2022-01-18 14:44:26 -0300
- Reorganize and expand the testing chapters. (rust-lang/rustc-dev-guide#1281)
- Add inline assembly internals (rust-lang/rustc-dev-guide#1266)
- Spelling: Rename `rust` to `Rust` (rust-lang/rustc-dev-guide#1288)
- Clean up section about FCPs (rust-lang/rustc-dev-guide#1287)
- Address more review comments in rust-lang/rustc-dev-guide#1286.
- Address review comments in rust-lang/rustc-dev-guide#1286.
- Streamline "Getting Started" some more.
Successfully merging this pull request may close these issues.

2021: Update for reserved syntax (RFC 3101)