Store tokens alongside more AST expressions #70091

Aaron1011 · 2020-03-18T01:52:52Z

See #43081 (comment)
This PR calls `collect_tokens` during the parsing of more AST nodes, and stores the captured tokens in the parsed AST structs. These tokens are then used in `nt_to_tokenstream` to avoid needing to stringify AST nodes.
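Here is a minimal sketch of the idea. It makes simplifying assumptions: every type below (`Token`, `Expr`, `Parser`, `lex`, `to_tokens`) is hypothetical and far simpler than rustc's real definitions. The `collect_tokens` helper records the tokens consumed while a closure parses a node, and the node keeps them in an optional `tokens` field. A `nt_to_tokenstream`-style conversion then prefers those tokens, which carry real spans, over pretty-printing the node. Pretty-printing here produces only dummy `(0, 0)` spans.

```rust
// Hypothetical, heavily simplified model: none of these types are rustc's.

#[derive(Clone, Debug, PartialEq)]
struct Token {
    text: String,
    span: (usize, usize), // byte range in the source; (0, 0) marks a dummy span
}

#[derive(Debug)]
enum ExprKind {
    Ident(String),
    MethodCall(Box<Expr>, String),
}

#[derive(Debug)]
struct Expr {
    kind: ExprKind,
    /// Tokens captured while this node was parsed, if any.
    tokens: Option<Vec<Token>>,
}

/// Splits `src` into identifier tokens and single-character punctuation.
fn lex(src: &str) -> Vec<Token> {
    let mut out = Vec::new();
    let mut chars = src.char_indices().peekable();
    while let Some((i, c)) = chars.next() {
        if c.is_whitespace() {
            continue;
        }
        let mut end = i + c.len_utf8();
        if c.is_alphanumeric() || c == '_' {
            while let Some(&(j, d)) = chars.peek() {
                if d.is_alphanumeric() || d == '_' {
                    end = j + d.len_utf8();
                    chars.next();
                } else {
                    break;
                }
            }
        }
        out.push(Token { text: src[i..end].to_string(), span: (i, end) });
    }
    out
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn bump(&mut self) -> Token {
        let t = self.tokens[self.pos].clone();
        self.pos += 1;
        t
    }

    /// Runs `f` and returns its result together with every token it consumed.
    fn collect_tokens<R>(&mut self, f: impl FnOnce(&mut Self) -> R) -> (R, Vec<Token>) {
        let start = self.pos;
        let result = f(self);
        (result, self.tokens[start..self.pos].to_vec())
    }

    /// Parses `ident` or `ident.method()`, keeping the captured tokens.
    fn parse_expr(&mut self) -> Expr {
        let (kind, toks) = self.collect_tokens(|p| {
            let recv = p.bump().text;
            if p.pos < p.tokens.len() && p.tokens[p.pos].text == "." {
                p.bump(); // `.`
                let method = p.bump().text;
                p.bump(); // `(`
                p.bump(); // `)`
                let recv = Expr { kind: ExprKind::Ident(recv), tokens: None };
                ExprKind::MethodCall(Box::new(recv), method)
            } else {
                ExprKind::Ident(recv)
            }
        });
        Expr { kind, tokens: Some(toks) }
    }
}

/// Prefers captured tokens (real spans); otherwise pretty-prints with dummy spans.
fn to_tokens(e: &Expr) -> Vec<Token> {
    if let Some(t) = &e.tokens {
        return t.clone();
    }
    let dummy = |s: &str| Token { text: s.to_string(), span: (0, 0) };
    match &e.kind {
        ExprKind::Ident(name) => vec![dummy(name.as_str())],
        ExprKind::MethodCall(recv, m) => {
            let mut v = to_tokens(recv);
            v.extend([dummy("."), dummy(m.as_str()), dummy("("), dummy(")")]);
            v
        }
    }
}
```

Keeping the real spans allows diagnostics to point at the original source instead of at the macro invocation.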

Since this implementation completely ignores attributes, it will probably explode when given any kind of complicated input. This PR is intended mainly to estimate the performance impact of collecting and storing more tokens - a correct implementation will most likely have to do more work than this.

I've only implemented token collecting for a few types of expressions - the current expression parsing implementation makes it difficult to get the proper tokens for every expression.

Nevertheless, this is able to bootstrap libstd, and generate better error messages for a simple proc-macro example: (https://github.com/Aaron1011/for-await-test)

```rust
#![feature(stmt_expr_attributes, proc_macro_hygiene)]
use futures::stream::Stream;
use futures_async_stream::for_await;

async fn collect(stream: impl Stream<Item = i32>) -> Vec<i32> {
    let mut vec = Vec::new();
    #[for_await]
    for value in stream.foo() {
        vec.push(value);
    }
    vec
}

fn main() {
    println!("Hello, world!");
}
```

on the latest nightly:

```
error[E0599]: no method named `foo` found for type parameter `impl Stream<Item = i32>` in the current scope
 --> src/main.rs:7:5
  |
7 |     #[for_await]
  |     ^^^^^^^^^^^^ method not found in `impl Stream<Item = i32>`

error: aborting due to previous error
```

with this PR:

```
error[E0599]: no method named `foo` found for type parameter `impl Stream<Item = i32>` in the current scope
 --> src/main.rs:8:25
  |
8 |     for value in stream.foo() {
  |                         ^^^ method not found in `impl Stream<Item = i32>`
```


Aaron1011 · 2020-03-18T01:53:30Z

r? @petrochenkov

rust-highfive · 2020-03-18T02:06:17Z

The job mingw-check of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.


2020-03-18T01:53:34.0708748Z ========================== Starting Command Output ===========================
2020-03-18T01:53:34.0711629Z [command]/bin/bash --noprofile --norc /home/vsts/work/_temp/8234da10-2c46-43e5-9224-84df59cb321e.sh
2020-03-18T01:53:34.0711951Z 
2020-03-18T01:53:34.0716774Z ##[section]Finishing: Disable git automatic line ending conversion
2020-03-18T01:53:34.0737531Z ##[section]Starting: Checkout rust-lang/rust@refs/pull/70091/merge to s
2020-03-18T01:53:34.0741868Z Task         : Get sources
2020-03-18T01:53:34.0742201Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.
2020-03-18T01:53:34.0742515Z Version      : 1.0.0
2020-03-18T01:53:34.0743792Z Author       : Microsoft
---
2020-03-18T01:53:35.2090725Z ##[command]git remote add origin https://github.com/rust-lang/rust
2020-03-18T01:53:35.2099111Z ##[command]git config gc.auto 0
2020-03-18T01:53:35.2114085Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2020-03-18T01:53:35.2117649Z ##[command]git config --get-all http.proxy
2020-03-18T01:53:35.2133976Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/70091/merge:refs/remotes/pull/70091/merge
---
2020-03-18T02:00:27.2129929Z     Checking rustc_parse v0.0.0 (/checkout/src/librustc_parse)
2020-03-18T02:00:28.7219048Z error: unused variable: `tokens`
2020-03-18T02:00:28.7220458Z     --> src/librustc_parse/parser/expr.rs:2134:9
2020-03-18T02:00:28.7221567Z      |
2020-03-18T02:00:28.7222538Z 2134 |         tokens: Option<TokenStream>,
2020-03-18T02:00:28.7224281Z      |         ^^^^^^ help: consider prefixing with an underscore: `_tokens`
2020-03-18T02:00:28.7226264Z      = note: `-D unused-variables` implied by `-D warnings`
2020-03-18T02:00:28.7226793Z 
2020-03-18T02:00:29.5294120Z error: aborting due to previous error
2020-03-18T02:00:29.5294879Z 
2020-03-18T02:00:29.5374184Z error: could not compile `rustc_parse`.
2020-03-18T02:00:29.5378485Z 
2020-03-18T02:00:29.5379181Z To learn more, run the command again with --verbose.
2020-03-18T02:00:29.5396461Z warning: build failed, waiting for other jobs to finish...
2020-03-18T02:00:30.3367502Z error: build failed
2020-03-18T02:00:30.3370440Z command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "check" "--target" "x86_64-unknown-linux-gnu" "-Zbinary-dep-depinfo" "-j" "2" "--release" "--color" "always" "--features" " llvm" "--manifest-path" "/checkout/src/rustc/Cargo.toml" "--message-format" "json-render-diagnostics"
2020-03-18T02:00:30.3372442Z failed to run: /checkout/obj/build/bootstrap/debug/bootstrap check
2020-03-18T02:00:30.3372972Z Build completed unsuccessfully in 0:04:14
2020-03-18T02:00:30.3373389Z == clock drift check ==
2020-03-18T02:00:30.3373779Z   local time: Wed Mar 18 02:00:29 UTC 2020
2020-03-18T02:00:30.3374253Z   network time: Wed, 18 Mar 2020 02:00:30 GMT
2020-03-18T02:00:30.3374686Z == end clock drift check ==
2020-03-18T02:00:30.9943242Z 
2020-03-18T02:00:31.0023438Z ##[error]Bash exited with code '1'.
2020-03-18T02:00:31.0049128Z ##[section]Finishing: Run build
2020-03-18T02:00:31.0107013Z ##[section]Starting: Checkout rust-lang/rust@refs/pull/70091/merge to s
2020-03-18T02:00:31.0112430Z Task         : Get sources
2020-03-18T02:00:31.0112855Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.
2020-03-18T02:00:31.0113231Z Version      : 1.0.0
2020-03-18T02:00:31.0113485Z Author       : Microsoft
2020-03-18T02:00:31.0113909Z Help         : [More Information](https://go.microsoft.com/fwlink/?LinkId=798199)
2020-03-18T02:00:31.0114401Z ==============================================================================
2020-03-18T02:00:31.3848888Z Cleaning any cached credential from repository: rust-lang/rust (GitHub)
2020-03-18T02:00:31.3914358Z ##[section]Finishing: Checkout rust-lang/rust@refs/pull/70091/merge to s
2020-03-18T02:00:31.4042609Z Cleaning up task key
2020-03-18T02:00:31.4044468Z Start cleaning up orphan processes.
2020-03-18T02:00:31.4302879Z Terminate orphan process: pid (3625) (python)
2020-03-18T02:00:31.4533403Z ##[section]Finishing: Finalize Job

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @rust-lang/infra. (Feature Requests)

petrochenkov · 2020-03-19T09:15:36Z

@bors try

bors · 2020-03-19T09:15:48Z

⌛ Trying commit e56b697 with merge e58cfd9...


bors · 2020-03-19T11:53:43Z

☀️ Try build successful - checks-azure
Build commit: e58cfd9 (e58cfd95ea92ee5f46c4ff923ebee0b7ef08f793)

petrochenkov · 2020-03-19T14:12:23Z

@rust-timer build e58cfd9

rust-timer · 2020-03-19T14:12:25Z

Queued e58cfd9 with parent 6724d58, future comparison URL.

rust-timer · 2020-03-19T16:29:26Z

Finished benchmarking try commit e58cfd9, comparison URL.

Aaron1011 · 2020-03-19T20:03:21Z

Those results are pretty disappointing. However, the worst regressions seem to be on 'weird' crates, e.g. `deep-vector` is just an invocation of `vec!` with 8k lines of input. I'll see if there's anything we can do to speed up `collect_tokens` for these kinds of cases.

Aaron1011 · 2020-03-19T20:44:43Z

@petrochenkov: What do you think about modifying `TokenStream` to store a `Range` alongside the `Lrc<Vec<TreeAndJoint>>`? Instead of building up a new `Vec` in `collect_tokens`, we could just clone the existing `Lrc<Vec<TreeAndJoint>>` and store a `Range` indicating the slice of tokens that we actually captured. The tokens captured by `collect_tokens` should always have matching delimiters, so we shouldn't need to worry about capturing only 'part of' a `TokenTree::Delimited`.
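Here is a minimal sketch of the proposed slicing representation. It assumes a toy `Tree` type in place of `TreeAndJoint` and uses `std::rc::Rc` in place of rustc's `Lrc`. The `SlicedStream` name is hypothetical. Capturing a sub-stream clones the reference-counted buffer and records a `Range`, so no trees are copied. Slicing happens at the level of top-level trees, so a `Delimited` group is always captured whole.

```rust
use std::ops::Range;
use std::rc::Rc;

// Toy stand-in for `TreeAndJoint`; a real token tree carries spans, spacing, etc.
#[derive(Debug, Clone, PartialEq)]
enum Tree {
    Token(&'static str),
    Delimited(char, Vec<Tree>),
}

/// A token stream that is a window into a shared, reference-counted buffer.
#[derive(Debug, Clone)]
struct SlicedStream {
    buf: Rc<Vec<Tree>>,
    range: Range<usize>,
}

impl SlicedStream {
    fn new(trees: Vec<Tree>) -> Self {
        let len = trees.len();
        SlicedStream { buf: Rc::new(trees), range: 0..len }
    }

    /// Captures a sub-range (relative to this window) without copying any trees:
    /// only the `Rc` pointer is cloned and a new `Range` is recorded.
    fn slice(&self, r: Range<usize>) -> Self {
        assert!(r.start <= r.end && self.range.start + r.end <= self.range.end);
        SlicedStream {
            buf: Rc::clone(&self.buf),
            range: self.range.start + r.start..self.range.start + r.end,
        }
    }

    /// The trees visible through this window.
    fn trees(&self) -> &[Tree] {
        &self.buf[self.range.clone()]
    }
}
```

The trade-off is that even a small slice keeps the whole underlying buffer alive for as long as the slice exists.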

petrochenkov · 2020-03-21T13:48:37Z

We can reduce (2) to almost zero by collecting tokens only for expressions that actually have attributes (which are a very small percentage of expressions).

So, I looked at the uses of `nt_to_tokenstream`, and besides passing tokens to attribute macros, it's also used for passing nonterminals generated by `macro_rules` to proc macros.

So, we need to collect tokens in two cases:

* `parse_expr` is called from `librustc_expand\mbe\macro_parser.rs` for parsing a nonterminal.
* Parsing the expression after expression attributes.

In other cases the collection shouldn't be required.
Limiting token collection to these two cases may also simplify the parsing part.
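Here is a minimal sketch of the proposed policy. It assumes a hypothetical `Context` flag that tells the expression parser whether it was invoked for a `macro_rules` nonterminal. Token capture is crudely modeled as splitting on whitespace. The capture cost is paid only in the two cases listed above.

```rust
// Hypothetical model of the capture policy; token capture is faked with
// `split_whitespace`, and none of these names are rustc's.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Context {
    /// Parsing an `$e:expr` fragment while matching a `macro_rules` macro.
    Nonterminal,
    /// Parsing an ordinary expression in source code.
    Ordinary,
}

#[derive(Debug)]
struct Expr {
    outer_attrs: Vec<String>,
    tokens: Option<Vec<String>>,
}

/// Tokens are only needed if a proc macro may later receive this expression.
fn needs_tokens(ctx: Context, outer_attrs: &[String]) -> bool {
    ctx == Context::Nonterminal || !outer_attrs.is_empty()
}

fn parse_expr(ctx: Context, outer_attrs: Vec<String>, src: &str) -> Expr {
    let tokens: Option<Vec<String>> = if needs_tokens(ctx, &outer_attrs) {
        Some(src.split_whitespace().map(str::to_string).collect())
    } else {
        None // the common case: no capture cost at all
    };
    Expr { outer_attrs, tokens }
}
```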

petrochenkov · 2020-03-21T13:49:43Z

@Aaron1011
#70091 (comment) may be a good solution if we end up collecting all the tokens.
But I'd like to first try to not collect them at all unless necessary.

Aaron1011 · 2020-03-22T16:35:10Z

> parsing the expression after expression attributes

@petrochenkov: How does this interact with custom inner attributes? We need to already be collecting tokens by the time we parse them.

eddyb · 2020-03-23T00:04:36Z

@Aaron1011 Btw, your slicing idea is one of the original reasons for `TokenStream` using reference-counting and whatnot, we just never got around to implementing it.
Not to mention proc macros have no easy way to access anything like that.

petrochenkov · 2020-03-23T21:29:33Z

> `TokenStream` using reference-counting and whatnot, we just never got around to implementing it.

Some history of TokenStream representation - #57004 (comment).
Are we going on the second cycle? :D

petrochenkov · 2020-03-23T21:58:30Z

@Aaron1011

> How does this interact with custom inner attributes? We need to already be collecting tokens by the time we parse them.

Good catch.
To cover inner attributes we need to collect tokens for blocks, array/tuple literals, and other constructions that can act as "containers" for inner attributes.
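The ordering problem can be illustrated with a hypothetical toy block parser, where `#!` stands in for the start of an inner attribute. Suppose capture only starts once an inner attribute is seen. By then the opening `{` has already been consumed and is missing from the captured tokens. So for containers, capture has to start eagerly at the opening delimiter. This sketch models only the ordering issue, not rustc's parser.

```rust
// Hypothetical toy parser: it only learns about an inner attribute (`#!`)
// after it has already consumed the block's opening `{`.

struct BlockParser<'a> {
    toks: &'a [&'a str],
    pos: usize,
    /// Index where token capture began, if it has begun.
    capture_start: Option<usize>,
}

impl<'a> BlockParser<'a> {
    fn parse_block(&mut self, collect_eagerly: bool) -> Option<Vec<&'a str>> {
        if collect_eagerly {
            self.capture_start = Some(self.pos); // before the `{`
        }
        assert_eq!(self.toks[self.pos], "{");
        self.pos += 1;
        // Inner attributes come first inside the block; only now do we see one.
        if self.toks[self.pos] == "#!" && self.capture_start.is_none() {
            self.capture_start = Some(self.pos); // too late: `{` is already behind us
        }
        while self.toks[self.pos] != "}" {
            self.pos += 1;
        }
        self.pos += 1; // consume the closing `}`
        self.capture_start.map(|s| self.toks[s..self.pos].to_vec())
    }
}
```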

However, the main question is: what are we aiming for here, a practical improvement or a proper holistic solution?
I'd say it's the former.

* If we don't fix inner attributes, it's still an improvement; inner macro attributes are rare and feature-gated.
* Even if we collect tokens for expressions with inner attributes, they will still fail the `probably_equal_for_proc_macro` check (because it will compare token streams before and after the inner attribute macro is removed). Items with inner macro attributes currently go through pretty-printing despite the tokens being collected for items.
* Even if we fix the above problem somehow, fragments with `cfg`s will still fail the `probably_equal_for_proc_macro` check.
* Some other cases, like unnecessary path disambiguators (`type A = Vec::<u8>;`), will fail the check as well because we do not preserve them precisely in the AST.
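Here is a minimal model of why these cases end up going through pretty-printing. It assumes the check compares only token text and ignores spans. `Token`, `toks`, `probably_equal` and `pick_tokens` are hypothetical simplifications, not rustc's `probably_equal_for_proc_macro`. Cached tokens are kept only if they still match the tokens obtained by pretty-printing the AST. Otherwise the pretty-printed tokens are used, and those have only dummy spans. Token indexes stand in for real spans.

```rust
// Hypothetical model of the fallback; not rustc's real implementation.

#[derive(Clone, Debug)]
struct Token {
    text: String,
    span: Option<(usize, usize)>, // None = dummy span from pretty-printing
}

/// Builds tokens from text, using each token's index as a stand-in span.
fn toks(texts: &[&str], with_spans: bool) -> Vec<Token> {
    texts
        .iter()
        .enumerate()
        .map(|(i, t)| Token {
            text: t.to_string(),
            span: if with_spans { Some((i, i + 1)) } else { None },
        })
        .collect()
}

/// Compares token text only, ignoring spans.
fn probably_equal(a: &[Token], b: &[Token]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.text == y.text)
}

/// Keeps the cached tokens (good spans) only if they still describe the AST.
fn pick_tokens(cached: Option<Vec<Token>>, pretty_printed: Vec<Token>) -> Vec<Token> {
    match cached {
        Some(c) if probably_equal(&c, &pretty_printed) => c,
        _ => pretty_printed,
    }
}
```

For `Vec::<u8>`, the AST no longer records the `::`, so the two streams differ. The good spans are lost even though the cached tokens were correct.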

A proper solution is a big design task, which I'm certainly not ready to drive now.
Perhaps something like @matklad's redesign of the AST and parser will address these issues long term somehow.

In the meantime, we can improve the situation for common cases in ways that don't regress compile times. This largely means ignoring inner macro attributes.

petrochenkov · 2020-03-23T22:06:08Z

(To clarify, I'm not against making improvements to treatment of inner attributes in a separate PR with a separate perf run, as one more incremental improvement.)

Dylan-DPC-zz · 2020-04-15T18:05:03Z

r? @petrochenkov

Dylan-DPC-zz · 2020-05-13T17:49:25Z

r? @oli-obk

oli-obk · 2020-05-14T06:50:05Z

This is waiting on the author.

r? @petrochenkov

petrochenkov · 2020-05-14T07:43:11Z

Could close due to inactivity as well, it's been ~1.5 months.

…r=petrochenkov

Store tokens inside `ast::Expr`

This is a smaller version of rust-lang#70091.

We now store captured tokens inside `ast::Expr`, which allows us to avoid some reparsing in `nt_to_tokenstream`. To try to mitigate the performance impact, we only collect tokens when we've seen an outer attribute.

This makes progress towards solving rust-lang#43081. There are still many things left to do:

* Collect tokens for other AST items.
* Come up with a way to handle inner attributes (we need to be collecting tokens by the time we encounter them).
* Avoid re-parsing when a `#[cfg]` attr is used.

However, this is enough to fix spans for a simple example, which I've included as a test case.

Aaron1011 added 5 commits March 17, 2020 16:16

Store tokens in more ast structs

025591f

Don't actually use the attached tokens for now

55982c2

This fails to bootstrap, since we completely ignore `#[cfg]` and other attributes/macros

Handle opening delimiter in collect_tokens

542e2f5

Re-enable usage of saved tokens

c99fd5c

Save tokens for a few kinds of expressions

af3a5a8

rust-highfive assigned estebank Mar 18, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 18, 2020

rust-highfive assigned petrochenkov and unassigned estebank Mar 18, 2020

Aaron1011 added 3 commits March 17, 2020 22:07

Actually use parameter

54e2a65

Run fmt

40bc7a7

Fix test

e56b697

Centril self-assigned this Mar 18, 2020

petrochenkov added S-waiting-on-perf Status: Waiting on a perf run to be completed. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 19, 2020

petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Mar 19, 2020

petrochenkov mentioned this pull request Mar 20, 2020

[experiment] Make ast::Expr larger #70200

Closed

petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 21, 2020

petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 22, 2020

petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 23, 2020

Dylan-DPC-zz marked this pull request as draft April 10, 2020 13:54

Dylan-DPC-zz changed the title ~~[WIP] Store tokens alongside more AST expressions~~ Store tokens alongside more AST expressions Apr 10, 2020

Dylan-DPC-zz unassigned Centril Apr 15, 2020

rust-highfive assigned oli-obk and unassigned petrochenkov May 13, 2020

rust-highfive assigned petrochenkov and unassigned oli-obk May 14, 2020

petrochenkov closed this May 14, 2020

Aaron1011 mentioned this pull request May 17, 2020

Store tokens inside ast::Expr #72287

Merged
