refactor/feat: refactor identifier parsing a bit #109203

Merged
merged 3 commits into from
Mar 23, 2023

Conversation

Ezrashaw
Contributor

+ error recovery for expected_ident_found

Prior art: #108854

@rustbot
Collaborator

rustbot commented Mar 16, 2023

r? @compiler-errors

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 16, 2023
@compiler-errors
Member

I don't have much time to study these parser code changes and validate they're correct/don't introduce other regressions.

maybe nils can take a look, or can re-roll
r? @Nilstrieb

@rustbot rustbot assigned Noratrieb and unassigned compiler-errors Mar 16, 2023
@Ezrashaw Ezrashaw force-pushed the refactor-ident-parsing branch from dd630e2 to 6b65663 Compare March 17, 2023 06:28
@Noratrieb
Member

Can you separate the refactorings from the additional recovery? Either in a separate commit or a separate PR.

@Ezrashaw
Contributor Author

@Nilstrieb Will do soonish. I was going to, but forgot and couldn't be bothered lol 🙄.

@Ezrashaw Ezrashaw force-pushed the refactor-ident-parsing branch from 6b65663 to 9eebc5e Compare March 17, 2023 09:27
@Ezrashaw
Contributor Author

@Nilstrieb Split into three commits; improving spans for HelpIdentifierStartsWithNumber was lumped in there

@@ -395,7 +391,7 @@ impl<'a> Parser<'a> {
} else {
PatKind::Lit(const_expr)
}
-} else if self.can_be_ident_pat() {
+} else if self.can_be_ident_pat() || self.is_lit_bad_ident().is_some() {
Member

This code adds a parser regression. (Strictly speaking it doesn't add it: the regression was already here with the previous early return, which was also a mistake, but now is the second-best moment to fix it.)

macro_rules! pat {
    ($p:pat) => {};
}

fn main() {
    pat!(3meow);
}

This code should compile (as literals are valid patterns) but doesn't. For reference, replacing :pat with :expr makes it compile. It also compiles on stable, where the early return above hasn't landed yet.

This can be fixed by adding a self.may_recover() && before this check. I still don't exactly like it, but it should fix the regression. I would prefer it if this wasn't an eager recovery but instead only started to influence behavior once there truly was an error, but I can accept it if you add the may_recover() and don't want to refactor it further.

For the future, always remember to think about how such changes can influence parser behavior, and make sure to gate them behind a may_recover() if the parser isn't in an error state yet.

Contributor Author

Right, so I was the author of that suggestion. Bear in mind that I don't have years of experience working on rustc, but I don't think this is a regression.

If you compile the following code:

macro_rules! pat {
    ($p:pat) => {
        let $p = 5;
    };
}

fn main() {
    pat!(3meow);
}

(the same code but using the $p metavariable)

Then it emits an error. This has been the case since 1.0.0; the only thing this suggestion does is pick up on always-invalid code and provide a better error message.

Secondly, AFAIK Parser::may_recover isn't "correct" here. Technically speaking, it is only for eager token recovery, which neither the suggestion PR nor this PR introduces. (eager recovery meaning consuming multiple tokens that might be valid?)

if the parser isn't in an error state yet.

We are in an error state though, a numeric literal with an invalid suffix is always invalid.

Maybe I'm completely wrong (*cough* like #107813 *cough*) though.

Member

Secondly, AFAIK Parser::may_recover isn't "correct" here. Technically speaking, it is only for eager token recovery, which neither the suggestion PR nor this PR introduces.

You are absolutely right. Redirecting this to an early error like this is always wrong.

a numeric literal with an invalid suffix is always invalid.

This is not quite correct. A literal with an invalid suffix is semantically invalid. This means that it's not allowed in post-expansion Rust code, as shown in your example, which correctly errors. But an invalid suffix is syntactically valid, so we can't error out, because macros might delete it like in my example.
I am not really sure about the best way to get the nice diagnostic without introducing regressions. Maybe finding out where the example normally errors and then trying to add something there?

But don't worry about making mistakes here, introducing parser regressions like this happens to many others as well, it's hard to spot unless you're already aware of the potential problem^^
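
To make the syntactic/semantic distinction above concrete, here is a minimal sketch. The macro names `discard` and `discard_lit` and the helper `spelled` are made up for illustration and are not part of this PR. A literal token with an unknown suffix such as `3meow` survives lexing and can be handed to a declarative macro, as long as the expansion never uses it as a real literal:

```rust
// Hypothetical macros for illustration; neither is part of this PR.

// Accepts any single token tree and throws it away.
macro_rules! discard {
    ($t:tt) => {};
}

// Accepts a literal token and throws it away.
macro_rules! discard_lit {
    ($l:literal) => {};
}

// Turns the token back into text without ever interpreting it as a number.
fn spelled() -> &'static str {
    stringify!(3meow)
}

fn main() {
    // Both calls only match and delete the token, so the invalid suffix
    // is never checked.
    discard!(3meow);
    discard_lit!(3meow);
    println!("{}", spelled());
}
```

The suffix is only rejected once the literal ends up in expanded code, as in the `let $p = 5;` example earlier in this thread.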

Contributor Author

finding out where the example normally errors and then trying to add something there

Hmm, that'd be difficult because "invalid int lit suffix" errors are emitted while parsing expressions (which obviously can be in pattern position as well) and this diagnostic is only applicable to patterns. Maybe we could just put in self.may_recover and a fixme?

Member

You mean literals instead of expressions? That does sound a little tricky, adding a parameter would fix it but is also probably a little much.
So I guess the alternatives here are also having the diagnostic in other places where literals are allowed or not having it at all. I don't want to just put broken code behind a FIXME as these usually don't get fixed in quite a while.

Maybe you have some other ideas or you could try out adding the parameter if possible and see how bad it is.

Contributor Author

Hmm, so what do we do from here?

Member

I'll leave it up to you whether you would prefer removing the diagnostic or whether you'd be fine with changing it so that it also shows the note inside expressions. I don't think it would hurt, so I'd be fine with either.

In the meantime, you could also split out the first and last commit into a separate PR if you like, I would approve that.

Contributor Author

Are you sure may_recover wouldn't work here? It'll be a bit overreaching (all use in macros won't have the improved diagnostic) but it'll fix the regression.

Member

I think so, but I'm not entirely sure, I would need to check.
But actually, I just changed my mind about this PR. While this doesn't fix the regression, it doesn't introduce a new one either. We should merge this and I'll open an issue about the regression (which you can claim if you want, but of course don't have to).
We can continue this discussion on the issue.

Member

And may_recover is probably better than nothing, so let's do that anyway. It's not like this is a very critical issue.

@Noratrieb Noratrieb added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 18, 2023
@Ezrashaw Ezrashaw force-pushed the refactor-ident-parsing branch 2 times, most recently from b1b182e to d7efda3 Compare March 19, 2023 10:37
@Ezrashaw
Contributor Author

@Nilstrieb In any case, I've pushed my proposed changes with Parser::may_recover.

@rust-log-analyzer

This comment has been minimized.

@Ezrashaw Ezrashaw force-pushed the refactor-ident-parsing branch from d7efda3 to f08d17a Compare March 20, 2023 03:32
@Ezrashaw
Contributor Author

@Nilstrieb whoops, all fixed. Would you like a PR renaming may_recover to may_recover_lookahead as well?

@Noratrieb
Member

Noratrieb commented Mar 20, 2023

I've played around with it and actually, the input tokens to attribute/derive proc macros do have to be semantically valid, so doing the eager check there is okay. So having the self.may_recover() check here does catch all cases. It would be nice if you could add a comment to the may_recover call, roughly like this:

Don't eagerly error on semantically invalid tokens when matching declarative macros, as the input to those doesn't have to be semantically valid.
For attribute/derive proc macros this is not the case, so doing the recovery for them is fine.
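
The gating described in that comment can be sketched with a toy model. Everything here (`ToyParser`, `Pat`, `bad_ident_suffix`, the short list of known suffixes) is a hypothetical stand-in chosen for illustration, not rustc's real parser code; it only assumes that a parser carries a flag saying whether recovery is allowed, and that the flag is off while matching declarative macro input:

```rust
// Toy model only: `ToyParser`, `Pat`, and `bad_ident_suffix` are hypothetical
// stand-ins, not rustc's real parser types.

#[derive(Clone, Copy, PartialEq)]
enum Recovery {
    Allowed,
    Forbidden, // e.g. while matching a declarative macro's input
}

#[derive(Debug, PartialEq)]
enum Pat {
    Lit(String),
    // Carries the diagnostic that would be emitted.
    Error(String),
}

struct ToyParser<'a> {
    token: &'a str,
    recovery: Recovery,
}

impl<'a> ToyParser<'a> {
    fn may_recover(&self) -> bool {
        self.recovery == Recovery::Allowed
    }

    // Returns the suffix if the token is digits followed by an unknown
    // suffix, e.g. `3meow`.
    fn bad_ident_suffix(&self) -> Option<&'a str> {
        let token: &'a str = self.token;
        let split = token.find(|c: char| !c.is_ascii_digit())?;
        if split == 0 {
            return None; // does not start with a digit
        }
        let suffix = &token[split..];
        // A tiny, incomplete list of real suffixes, enough for the sketch.
        let known = ["u8", "i32", "u64"];
        if known.iter().any(|k| *k == suffix) {
            None
        } else {
            Some(suffix)
        }
    }

    fn parse_pat(&self) -> Pat {
        match self.bad_ident_suffix() {
            // Only report eagerly when recovery is allowed.
            Some(suffix) if self.may_recover() => Pat::Error(format!(
                "identifiers cannot start with a number (suffix `{suffix}`)"
            )),
            // Otherwise keep the token as a plain literal pattern.
            _ => Pat::Lit(self.token.to_string()),
        }
    }
}

fn main() {
    let in_macro = ToyParser { token: "3meow", recovery: Recovery::Forbidden };
    let normal = ToyParser { token: "3meow", recovery: Recovery::Allowed };
    println!("{:?}", in_macro.parse_pat());
    println!("{:?}", normal.parse_pat());
}
```

With recovery forbidden the token stays a literal pattern, so a macro that deletes it still matches; with recovery allowed the friendlier diagnostic is produced.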

Would you like a PR renaming may_recover to may_recover_lookahead as well?

Yes, that would be useful (although the exact wording might be subject to some bikeshedding)

After you've added that comment and removed the comment on "can we recover here" (since you do recover there) this should be good to go.

@Ezrashaw
Contributor Author

Ezrashaw commented Mar 20, 2023

@Nilstrieb
After you've added that comment and removed the comment on "can we recover here" (since you do recover there) this should be good to go.

Sorry, I meant: can we recursively recover there? I'm not entirely sure that we shouldn't.

EDIT: On second thought, probably not a good idea to recursively recover there, otherwise everything should be good to go?

Also, with the may_recover -> may_recover_lookahead, should I just create a PR and bikeshed it on the PR?

@Ezrashaw Ezrashaw force-pushed the refactor-ident-parsing branch from f08d17a to 05b5046 Compare March 20, 2023 07:54
Member

@Noratrieb Noratrieb left a comment

@bors r+

@Noratrieb
Member

yes, just create a PR for that

@Noratrieb
Member

@bors r+

@bors
Contributor

bors commented Mar 20, 2023

📌 Commit 05b5046 has been approved by Nilstrieb

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 20, 2023
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Mar 21, 2023
…=Nilstrieb

refactor/feat: refactor identifier parsing a bit

+ error recovery for `expected_ident_found`

Prior art: rust-lang#108854
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 23, 2023
…iaskrgr

Rollup of 9 pull requests

Successful merges:

 - rust-lang#108954 (rustdoc: handle generics better when matching notable traits)
 - rust-lang#109203 (refactor/feat: refactor identifier parsing a bit)
 - rust-lang#109213 (Eagerly intern and check CrateNum/StableCrateId collisions)
 - rust-lang#109358 (rustc: Remove unused `Session` argument from some attribute functions)
 - rust-lang#109359 (Update stdarch)
 - rust-lang#109378 (Remove Ty::is_region_ptr)
 - rust-lang#109423 (Use region-erased self type during IAT selection)
 - rust-lang#109447 (new solver cleanup + implement coherence)
 - rust-lang#109501 (make link clickable)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 34fa6da into rust-lang:master Mar 23, 2023
@rustbot rustbot added this to the 1.70.0 milestone Mar 23, 2023
@Ezrashaw Ezrashaw deleted the refactor-ident-parsing branch March 26, 2023 03:33
Comment on lines +426 to +430
suffix,
}) = self.token.kind
&& rustc_ast::MetaItemLit::from_token(&self.token).is_none()
{
Some((symbol.as_str().len(), suffix.unwrap()))
Member

@compiler-errors compiler-errors Apr 6, 2023

re: #110014

-             suffix,
+             suffix: Some(suffix),
         }) = self.token.kind
             && rustc_ast::MetaItemLit::from_token(&self.token).is_none()
         {
-            Some((symbol.as_str().len(), suffix.unwrap()))
+            Some((symbol.as_str().len(), suffix))
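
The suggestion moves the `Some` check into the pattern, so a literal without a suffix simply fails to match instead of reaching `unwrap()`. A minimal sketch of the same idea follows; the `TokenKind` enum and both function names are simplified, hypothetical stand-ins rather than rustc's real types:

```rust
// Hypothetical stand-in for the token kind; not rustc's real `TokenKind`.
#[derive(Clone, Copy)]
enum TokenKind<'a> {
    Literal { symbol: &'a str, suffix: Option<&'a str> },
    Other,
}

// Before: assumes every literal that gets here has a suffix, and panics
// when that assumption is wrong.
fn split_unwrap<'a>(kind: TokenKind<'a>) -> Option<(usize, &'a str)> {
    match kind {
        TokenKind::Literal { symbol, suffix } => Some((symbol.len(), suffix.unwrap())),
        TokenKind::Other => None,
    }
}

// After: a literal without a suffix falls through to `None`.
fn split_checked<'a>(kind: TokenKind<'a>) -> Option<(usize, &'a str)> {
    match kind {
        TokenKind::Literal { symbol, suffix: Some(suffix) } => Some((symbol.len(), suffix)),
        _ => None,
    }
}

fn main() {
    let with_suffix = TokenKind::Literal { symbol: "3", suffix: Some("meow") };
    let without_suffix = TokenKind::Literal { symbol: "3", suffix: None };

    // Both versions agree when a suffix is present.
    println!("{:?}", split_unwrap(with_suffix));
    println!("{:?}", split_checked(with_suffix));

    // Only the checked version is safe to call without a suffix.
    println!("{:?}", split_checked(without_suffix));
    println!("{:?}", split_checked(TokenKind::Other));
}
```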

Contributor Author

@Ezrashaw Ezrashaw Apr 6, 2023

Would you like me to PR this? Done.

Member

would be nice if you could put up the fix, yes :)

Ezrashaw added a commit to Ezrashaw/rust that referenced this pull request Apr 6, 2023
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Apr 6, 2023
…on, r=compiler-errors

fix: fix regression in rust-lang#109203

Fixes rust-lang#110014

r? `@compiler-errors`
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 7, 2023
…iaskrgr

Rollup of 6 pull requests

Successful merges:

 - rust-lang#109806 (Workaround rust-lang#109797 on windows-gnu)
 - rust-lang#109957 (diagnostics: account for self type when looking for source of unsolved type variable)
 - rust-lang#109960 (Fix buffer overrun in bootstrap and (test-only) symlink_junction)
 - rust-lang#110013 (Label `non_exhaustive` attribute on privacy errors from non-local items)
 - rust-lang#110016 (Run collapsed GUI test in mobile mode as well)
 - rust-lang#110022 (fix: fix regression in rust-lang#109203)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup