Transition rustc Parser to proc_macro token model #63689

matklad · 2019-08-18T16:48:13Z

Currently, there are two different approaches for dealing with composite tokens like >> in rustc.

1. Keep tokens in composed form, and split into pieces, > and >, when necessary.
2. Keep tokens decomposed, with jointness information, and join tokens when necessary.

At the moment, the first approach is used by the parser, and the second approach is used by the proc_macro API. It would be good to move the parser to the decomposed approach as well, as it is somewhat more natural, more future-compatible (one can introduce new tokens) and having two of a thing is bad in itself!

Here are some relevant bits of the code that handle the composed model:

- Composed tokens as produced by rustc_lexer
- Composed tokens preserved by the token cooking
- Here's where we produce a TokenTree, consumed by the parser. Note that, although we are tracking jointness here, the tokens are composed.
- Here's the bit of the parser which decomposes tokens on the fly.
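To make the composed model concrete, here is a minimal toy sketch of splitting on the fly, as needed when parsing Vec<Vec<u8>> after the lexer has produced a single >> token. The Tok enum and the split_leading_gt / expect_gt functions are hypothetical illustrations, not rustc's actual types or functions.

```rust
/// Toy token kinds in the composed model: `>>` and `>>=` are single tokens.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tok {
    Gt,    // >
    Ge,    // >=
    Shr,   // >>
    ShrEq, // >>=
    Eq,    // =
}

/// Split off a leading `>` from a composed token.
/// Returns the remainder (None if nothing is left), or Err if there is no leading `>`.
fn split_leading_gt(tok: Tok) -> Result<Option<Tok>, Tok> {
    match tok {
        Tok::Gt => Ok(None),
        Tok::Ge => Ok(Some(Tok::Eq)),
        Tok::Shr => Ok(Some(Tok::Gt)),
        Tok::ShrEq => Ok(Some(Tok::Ge)),
        other => Err(other),
    }
}

/// Consume exactly one `>` from the front of `tokens`, splitting a composed token if needed.
fn expect_gt(tokens: &mut Vec<Tok>) -> bool {
    let Some(&first) = tokens.first() else { return false };
    match split_leading_gt(first) {
        Ok(Some(rest)) => {
            tokens[0] = rest; // keep the remainder in place for the next consumer
            true
        }
        Ok(None) => {
            tokens.remove(0);
            true
        }
        Err(_) => false,
    }
}

fn main() {
    // `Vec<Vec<u8>>`: after `u8` the lexer produced one `>>` token.
    let mut rest = vec![Tok::Shr];
    assert!(expect_gt(&mut rest)); // closes the inner `Vec<`
    assert_eq!(rest, vec![Tok::Gt]);
    assert!(expect_gt(&mut rest)); // closes the outer `Vec<`
    assert!(rest.is_empty());
}
```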

Here are the bits relevant to the decomposed model:

- Gluing tokens in TokenStreamBuilder
- Token::glue
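Conversely, a minimal sketch of the decomposed model: each token carries a Joint/Alone flag (mirroring the Spacing of proc_macro::Punct), and composed tokens are rebuilt only across Joint boundaries. The Tok type and the glue / glue_stream functions are hypothetical stand-ins for the idea behind Token::glue and TokenStreamBuilder, not their real signatures.

```rust
/// Spacing flag carried by each token in the decomposed model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Spacing {
    Joint, // immediately followed by the next punctuation token
    Alone,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tok {
    Gt,    // >
    Eq,    // =
    Ge,    // >=
    Shr,   // >>
    ShrEq, // >>=
    EqEq,  // ==
}

/// Glue two adjacent tokens into a composed token, if such a token exists.
fn glue(a: Tok, b: Tok) -> Option<Tok> {
    match (a, b) {
        (Tok::Gt, Tok::Gt) => Some(Tok::Shr),
        (Tok::Gt, Tok::Eq) => Some(Tok::Ge),
        (Tok::Shr, Tok::Eq) => Some(Tok::ShrEq),
        (Tok::Eq, Tok::Eq) => Some(Tok::EqEq),
        _ => None,
    }
}

/// Rebuild composed tokens from a decomposed stream, gluing only across Joint boundaries.
fn glue_stream(input: &[(Tok, Spacing)]) -> Vec<Tok> {
    let mut out: Vec<(Tok, Spacing)> = Vec::new();
    for &(tok, spacing) in input {
        if let Some(last) = out.last_mut() {
            if last.1 == Spacing::Joint {
                if let Some(glued) = glue(last.0, tok) {
                    // The glued token takes the spacing of its last piece.
                    *last = (glued, spacing);
                    continue;
                }
            }
        }
        out.push((tok, spacing));
    }
    out.into_iter().map(|(tok, _)| tok).collect()
}

fn main() {
    use Spacing::*;
    // `>>=` written without spaces: `>` Joint, `>` Joint, `=` Alone.
    assert_eq!(
        glue_stream(&[(Tok::Gt, Joint), (Tok::Gt, Joint), (Tok::Eq, Alone)]),
        vec![Tok::ShrEq]
    );
    // `> >` written with a space stays as two tokens.
    assert_eq!(glue_stream(&[(Tok::Gt, Alone), (Tok::Gt, Alone)]), vec![Tok::Gt, Tok::Gt]);
}
```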

Note that the tt matcher in macro_rules eats one composed token, and this affects the language specification.
That is, when we transition to the decomposed model, we'll need to fix this code to keep eating one composed token, to maintain backwards compatibility.
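The tt behavior that has to be preserved is observable from ordinary macro_rules code. This sketch uses a hypothetical count_tts helper macro that counts token trees: a composed operator such as >> or >>= is matched by a single $head:tt, while > > (with a space) counts as two.

```rust
/// Count the token trees in the input (hypothetical helper macro).
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($rest:tt)*) => { 1usize + count_tts!($($rest)*) };
}

fn main() {
    // A composed operator is matched by a single `$head:tt`...
    assert_eq!(count_tts!(>>), 1);
    assert_eq!(count_tts!(>>=), 1);
    // ...while the same characters separated by whitespace are two trees.
    assert_eq!(count_tts!(> >), 2);
}
```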


matklad · 2019-08-18T16:50:42Z

cc @petrochenkov

@petrochenkov

Move token gluing to token stream parsing: work towards rust-lang#63689. This moves token gluing from the lexer to the token tree layer. This is only a minimal step, but I like the negative diff here. r? @petrochenkov

matklad · 2019-08-31T16:54:25Z

Made some initial stabs in matklad@0d46730.

The idea is to remove cases from Token::glue one by one, until no tokens are glued together, except in the tt matcher.

I faced a couple of problems:

- parse_assert accepts an &[TokenTree], which throws away jointness info, so assert!(1 != 1) does not parse.
- NtTT should be changed from holding a TokenTree to holding a TokenStream, to account for the fact that $tt:tt eats <<, which is two tokens in the new model.

EDIT: more stabs at https://github.com/matklad/rust/tree/decomposed-neq. Fixed all parser problems; now it looks like we are losing jointness info somewhere.
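Why dropping jointness breaks assert!(1 != 1): in the decomposed model, != is a ! followed by =, with the ! marked Joint. The toy sketch below (the types and the parse_ne / strip_spacing functions are hypothetical, not rustc's) only recognizes != when that flag is present; once the spacing is stripped, as converting to a bare slice of trees does, 1 != 1 and 1 ! = 1 become indistinguishable.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Spacing {
    Joint,
    Alone,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tok {
    Lit(i64),
    Not, // !
    Eq,  // =
}

/// Toy parser for `<lit> != <lit>` over a decomposed stream:
/// `!=` is only recognized when `!` is Joint with the following `=`.
fn parse_ne(ts: &[(Tok, Spacing)]) -> Option<(i64, i64)> {
    match ts {
        [(Tok::Lit(a), _), (Tok::Not, Spacing::Joint), (Tok::Eq, _), (Tok::Lit(b), _)] => {
            Some((*a, *b))
        }
        _ => None,
    }
}

/// Models converting to a bare slice of trees: the spacing is dropped.
fn strip_spacing(ts: &[(Tok, Spacing)]) -> Vec<Tok> {
    ts.iter().map(|&(tok, _)| tok).collect()
}

fn main() {
    use Spacing::*;
    let joint = [(Tok::Lit(1), Alone), (Tok::Not, Joint), (Tok::Eq, Alone), (Tok::Lit(1), Alone)]; // 1 != 1
    let alone = [(Tok::Lit(1), Alone), (Tok::Not, Alone), (Tok::Eq, Alone), (Tok::Lit(1), Alone)]; // 1 ! = 1
    assert_eq!(parse_ne(&joint), Some((1, 1)));
    assert_eq!(parse_ne(&alone), None);
    // Without spacing the two inputs are identical, so `!=` can no longer be recognized.
    assert_eq!(strip_spacing(&joint), strip_spacing(&alone));
}
```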

matklad · 2019-09-16T20:40:35Z

Found the next obstacle: in macros by example, quoted::TokenTree erases jointness information, so

macro_rules! m { () => (==) }

produces = = (two separate tokens) instead of ==. Note that jointness seems to be correctly preserved by macro invocations. I guess we should change quoted::Delimited to store token trees with jointness, to better mirror the TokenStream.
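A toy sketch of the proposed fix, assuming a hypothetical Delimited type that stores a spacing flag next to each token (tokens are single punctuation characters here, for brevity). Printing with the flag preserved reproduces ==, while erasing it yields the = = output described above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Spacing {
    Joint,
    Alone,
}

/// Hypothetical delimited group that keeps a spacing flag next to every token.
struct Delimited {
    tts: Vec<(char, Spacing)>,
}

impl Delimited {
    /// Print the tokens, inserting a space only after Alone tokens (never at the end).
    fn to_source(&self) -> String {
        let mut s = String::new();
        for (i, &(c, spacing)) in self.tts.iter().enumerate() {
            s.push(c);
            if spacing == Spacing::Alone && i + 1 < self.tts.len() {
                s.push(' ');
            }
        }
        s
    }

    /// Model of a representation that erases jointness: every token becomes Alone.
    fn erase_jointness(&self) -> Delimited {
        Delimited {
            tts: self.tts.iter().map(|&(c, _)| (c, Spacing::Alone)).collect(),
        }
    }
}

fn main() {
    // The body of `macro_rules! m { () => (==) }`: `=` Joint, `=` Alone.
    let body = Delimited { tts: vec![('=', Spacing::Joint), ('=', Spacing::Alone)] };
    assert_eq!(body.to_source(), "==");
    assert_eq!(body.erase_jointness().to_source(), "= =");
}
```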

matklad · 2019-09-16T20:41:55Z

@petrochenkov is ^ a good plan? Or are there any bigger plans for refactoring quote, which we should do first?

petrochenkov · 2019-09-16T20:54:39Z

I never thought about this case.
Preserving jointness would probably be a good start unconditionally.

IIRC, the stuff in syntax::ext::tt::quoted should behave exactly like usual token streams; it's just re-hashed slightly for more convenient work with the macro_rules DSL.
(Maybe it doesn't even add too much convenience and can be removed? Who knows.)
Anyway, if it behaves differently than regular token streams, it's something that's better fixed.

matklad · 2019-09-16T20:58:01Z

> (Maybe it doesn't even add too much convenience and can be removed? Who knows.)

Hm, so that we represent $var:ty as literally $var:ty, and just match them on the fly while transcribing? I guess I'll try this approach for mbe in rust-analyzer. Currently, we blindly copy the rustc approach with a duplicated TokenStream.

For rustc, I feel like it's better to stick with the current model for the time being.
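A minimal sketch of matching metavariables on the fly during transcription, under simplifying assumptions: the template is kept as plain tokens, a $ followed by a bound identifier is replaced by the captured fragment, and anything else is copied through. The Tok type, the transcribe function and the bindings map are hypothetical, and repetitions and nesting are ignored.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq)]
enum Tok {
    Dollar,
    Ident(String),
    Punct(char),
}

fn ident(s: &str) -> Tok {
    Tok::Ident(s.to_string())
}

/// Copy `template` to the output, replacing `$name` with the tokens bound to `name`.
/// Metavariables are recognized on the fly; there is no separate pre-parsed tree.
/// An unbound `$name` is copied through unchanged.
fn transcribe(template: &[Tok], bindings: &HashMap<String, Vec<Tok>>) -> Vec<Tok> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < template.len() {
        if let (Tok::Dollar, Some(Tok::Ident(name))) = (&template[i], template.get(i + 1)) {
            if let Some(fragment) = bindings.get(name) {
                out.extend(fragment.iter().cloned());
                i += 2; // skip both `$` and the identifier
                continue;
            }
        }
        out.push(template[i].clone());
        i += 1;
    }
    out
}

fn main() {
    let mut bindings = HashMap::new();
    bindings.insert("x".to_string(), vec![ident("a"), Tok::Punct('.'), ident("b")]);
    // Template `$x + y` becomes `a . b + y`.
    let template = [Tok::Dollar, ident("x"), Tok::Punct('+'), ident("y")];
    let expected = vec![ident("a"), Tok::Punct('.'), ident("b"), Tok::Punct('+'), ident("y")];
    assert_eq!(transcribe(&template, &bindings), expected);
}
```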

matklad · 2019-09-16T23:52:30Z

Note there's another place, besides the tt matcher, where we leak jointness: separators in repetitions:

macro_rules! m {
    ($()>>=*) => ()
}

m!(>>=  >>=);
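For reference, a multi-character separator works today when it arrives as the same composed token. The hypothetical count_sep macro below (a sketch, not from the original report) uses >>= as its repetition separator and counts the identifiers it receives; as noted above, the separator is compared as one composed token, so this is another place where the decomposed model must reproduce the current gluing.

```rust
/// Hypothetical macro using the composed token `>>=` as a repetition separator.
/// It counts the identifiers it receives.
macro_rules! count_sep {
    ($($x:ident)>>=*) => {
        0usize $(+ { let _ = stringify!($x); 1usize })*
    };
}

fn main() {
    assert_eq!(count_sep!(), 0);
    assert_eq!(count_sep!(a >>= b >>= c), 3);
}
```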


matklad · 2019-09-17T13:08:40Z

> Maybe it doesn't even add too much convenience and can be removed?

This worked out quite nicely for rust-analyzer: rust-lang/rust-analyzer#1858. I think we should maybe do this for rustc as well, but probably after moving to the disjoint model.

jonas-schievink added A-parser Area: The parsing of Rust source code to an AST C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Aug 18, 2019

This was referenced Aug 19, 2019

Normalize newlines when loading files #62948 (Merged)

Move token gluing to token stream parsing #63709 (Merged)

This was referenced Aug 31, 2019

use TokenStream rather than &[TokenTree] for built-in macros #64041 (Merged)

Lexical specification? rust-lang/wg-grammar#3 (Open)

matklad mentioned this issue Sep 27, 2019

Remove TreeAndJoint in favor of joint field on the Token #64782 (Closed)

chrissimpkins mentioned this issue May 20, 2020

"The parser" rust-lang/rustc-dev-guide#13 (Open)

matklad mentioned this issue Aug 14, 2020

WIP: Move jointness info from TokenStream to Token #75528 (Closed)

matklad added a commit to matklad/rust that referenced this issue Aug 14, 2020

Move jointness info from TokenStream to Token

922ec17

Part of rust-lang#63689.

matklad added the WG-parselib label Aug 29, 2020

matklad added a commit to matklad/rust that referenced this issue Aug 31, 2020

Make StringReader private

30ce15f

After the recent refactorings, we can actually completely hide this type. It should help with rust-lang#63689.

jackh726 removed the WG-parselib label Jan 29, 2022
