Parser recovery rework #972

sbillig · 2023-12-14T17:24:19Z

What was wrong?

How was it fixed?

To-Do

change "Inherited/Override" recovery method stuff
Clean up commit history

sbillig · 2024-03-28T05:25:53Z

@Y-Nak I guess I'm done with this. It's not perfect, but I think it's better.

Y-Nak

LGTM, overall. I think this PR generally improves error messages a lot. Thanks!

I think we still need Override or a similar way to prohibit the parser from using the parent scope's recovery tokens, so that error recovery doesn't change the semantics of the user's code.

e.g.,

impl Foo {
    struct S {}
    fn foo(x: i32) { 
    1 
    }
}

↓

Root@0..67
  ItemList@0..67
    Item@0..15
      Impl@0..15
        ...
    Item@15..26
      Struct@15..26
        ...
    Item@31..65
      Func@31..65
        ...

In this example, struct S and fn foo are parsed in the outer item list scope, which will probably cause a negative impact on the quality of the diagnostics from the later analysis phases.

crates/hir-analysis/src/ty/def_analysis.rs

crates/parser2/src/ast/item.rs

crates/parser2/src/parser/mod.rs

crates/parser2/src/parser/param.rs

Y-Nak · 2024-04-05T22:08:17Z

This is probably related to how to handle opening/closing brackets thing since these bracket kinds make a semantic scope that should break the inheritance of the recovery token generally, I guess.

sbillig · 2024-04-05T22:39:12Z

I think we still need Override or a similar way to prohibit the parser from using the parent scope's recovery tokens, so that error recovery doesn't change the semantics of the user's code.

Preventing the parser from using the parent scope's recovery tokens would change the semantics of this code:

impl Foo {

struct S {}
fn foo(x: i32) {
1
}

(ie the user didn't close the impl block, or is in the process of typing it).

I think it's impossible to avoid misinterpreting the semantics of the user's code in all cases. I think the "best" solution is to use the indentation level as a hint about the user's intention. In your example, the indentation implies that the struct is being defined inside the impl block, and could thus be recognized as disallowed. That said, I don't currently have the energy to implement such a thing.

Lacking indentation hints, when the parser finds an unexpected token which is an explicit recovery token in a parent scope, I think the most reasonable thing to do is to stop trying to parse the current scope and defer to the parent. If we allow the current scope to continue in the face of unexpected tokens, a single unclosed block could conceivably swallow the entire remainder of the file (unless we do something like treat some available closing brace as the end of the block, which can also introduce changes to semantics).

I chose to prioritize the case where the user is currently editing the file, or made a simple error (like a missing closing brace), over more complicated errors (like defining something where it's not allowed). The current implementation should handle the former reasonably well, but will change the semantics in the latter. Maybe a better error message could explain to the user that in your example the parser has decided that the struct keyword implies that the impl block is closed.

Y-Nak · 2024-04-06T10:20:21Z

I'm sorry for the explanation that was too vague. I do understand that it's essentially a heuristic issue, regardless of how much we delve into it. What has particularly caught my attention is the aggressiveness of an error recovery. Of course, there's a degree of subjectivity in determining which recovery are considered "too" aggressive, though.

The following example concerns me especially:

trait Trait {
    type Type;
    fn foo(x: i32);
}

I suppose Rust users would often make this mistake in Fe. However, if the parser interprets type Type and `fn foo' as being defined at the parent module scope, the error message could be quite misleading.
This problem could be resolved by simplifying the BNF that the parser refers to as a specification, as you previously mentioned. Simplifying the BNF would likely allow each parsing scope to define more syntax kinds for error recovery. Semantic errors could be addressed by the later analysis phase.

As another problem, regarding unclosed scopes, I'm starting to feel that special handling will be necessary for this case. Essentially, I think it'd be better to (conceptually) give up parsing when the code is in unclosed scopes. This is because guessing the user's original intent becomes particularly challenging in this case, and aggressive error recovery could lead to misleading error messages.
This problem might be related to the error refinement idea because it brings another complexity if we try to let the parser determine whether a scope is properly closed while parsing the internal syntax elements of the scope.
This is related to the ADT recursion problem as well. That is, the more we try to componentize the compiler, the more difficult it is to emit better error messages when an error is related to another error emitted from another compiler component.
So, before outputting accumulated error messages, we might need a phase that refines the accumulated errors. For example, if the spans of two errors overlap, only one of the errors is selected to be output(at least, the output of one error should be suppressed if it is encompassed by another).
In the parser case, each parsing scope could just accumulate errors without caring about whether the scope is closing or not; then when the parser notices that the scope is not closed, the child node can be wrapped up by the error node. After that, the error refinement phase would suppress the errors that were accumulated inside the unclosed scope.

So, basically, I feel that we probably need to

make BNF simpler
wrap up an entire unclosed scope by an error syntax node
add an error refinement phase

But I'm unsure if this is a proper direction. I'd really appreciate your opinion.
NOTE: I do not think these problems should be resolved in this PR either way. This PR has already resolved many other problems and significantly improved the quality of the parser implementation. I just want to deepen our understanding of these issues for the future.

sbillig · 2024-04-07T20:38:19Z

The following example concerns me especially:
trait Trait {
    type Type;
    fn foo(x: i32);
}
I suppose Rust users would often make this mistake in Fe. However, if the parser interprets type Type and `fn foo' as being defined at the parent module scope, the error message could be quite misleading. This problem could be resolved by simplifying the BNF that the parser refers to as a specification, as you previously mentioned. Simplifying the BNF would likely allow each parsing scope to define more syntax kinds for error recovery. Semantic errors could be addressed by the later analysis phase.

One easy option is to explicitly add parser support for cases that we feel might be common errors, and/or for syntax that we intend to support someday. (Like the fn-def-inside-struct case that you implemented.) In this situation, I think it would be maybe be ideal to recover to the parent scope on the type keyword if it's not indented.

In the parser case, each parsing scope could just accumulate errors without caring about whether the scope is closing or not; then when the parser notices that the scope is not closed, the child node can be wrapped up by the error node. After that, the error refinement phase would suppress the errors that were accumulated inside the unclosed scope.

How does the parser notice that the scope is not closed? The only way it can be certain is if it reaches the end of the file and failed to find a matching closing bracket, which of course means that the scope has consumed the rest of the file. This seems like a pretty poor outcome to me.

The alternative of course is to guess the user's intent, and I think we can do pretty well by considering indentation and looking for keywords that definitely aren't allowed (or likely to be mistakenly used) in a given scope. We could make this nicer by explicitly telling the user when the parser has considered an unclosed scope to be closed. (I'm just repeating myself, because I can't think of any better ideas :)

I think it's actually ok to allow for the possibility of misleading errors (like every compiler I've ever used), in favor of nice behavior most of the time. Perhaps the parser could behave differently in LSP mode vs CLI mode, with the latter being more strict?

Perhaps we could mark nodes that are in a questionably-parsed region (not sure how to define this) as "tainted", and suppress all errors that involve them? (Eg in the previous crappy parser/analyzer, we suppressed "item not found" errors relating to files that didn't parse successfully, assuming that perhaps the missing item wasn't parsed.)

Y-Nak

LGTM!

Y-Nak changed the base branch from master to fe-v2 December 14, 2023 17:25

Y-Nak mentioned this pull request Jan 5, 2024

v2: modest parser improvements #934

Merged

4 tasks

sbillig force-pushed the parser-recovery-rework branch from 5101f80 to 1aa215d Compare January 29, 2024 02:59

sbillig force-pushed the parser-recovery-rework branch 2 times, most recently from 833ddc3 to 42202eb Compare March 1, 2024 21:12

sbillig force-pushed the parser-recovery-rework branch from d110e74 to a54889c Compare March 13, 2024 23:10

sbillig added 2 commits March 13, 2024 16:16

Parser recovery rework

14a990a

Better parser error message headers (ExpectedKind enum)

ff27aae

sbillig force-pushed the parser-recovery-rework branch from a54889c to ff27aae Compare March 13, 2024 23:19

Remove RecoveryMethod in favor of an optional list of match tokens

b130adb

sbillig marked this pull request as ready for review March 28, 2024 05:21

sbillig requested a review from Y-Nak March 28, 2024 05:21

Y-Nak reviewed Apr 5, 2024

View reviewed changes

sbillig added 2 commits April 5, 2024 15:54

Fix build warning (unused field in def_analysis::DefKind::Impl)

d3917cc

Remove dead code

e8ff942

sbillig force-pushed the parser-recovery-rework branch from e3cb533 to 9286abf Compare April 5, 2024 22:57

Fix review nits

273ce09

sbillig force-pushed the parser-recovery-rework branch from 9286abf to 273ce09 Compare April 6, 2024 02:27

Y-Nak approved these changes Apr 12, 2024

View reviewed changes

sbillig merged commit 57335f1 into ethereum:fe-v2 Apr 12, 2024
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parser recovery rework #972

Parser recovery rework #972

sbillig commented Dec 14, 2023 •

edited

Loading

sbillig commented Mar 28, 2024

Y-Nak left a comment •

edited

Loading

Y-Nak commented Apr 5, 2024

sbillig commented Apr 5, 2024 •

edited

Loading

Y-Nak commented Apr 6, 2024

sbillig commented Apr 7, 2024

Y-Nak left a comment

Parser recovery rework #972

Parser recovery rework #972

Conversation

sbillig commented Dec 14, 2023 • edited Loading

What was wrong?

How was it fixed?

To-Do

sbillig commented Mar 28, 2024

Y-Nak left a comment • edited Loading

Choose a reason for hiding this comment

Y-Nak commented Apr 5, 2024

sbillig commented Apr 5, 2024 • edited Loading

Y-Nak commented Apr 6, 2024

sbillig commented Apr 7, 2024

Y-Nak left a comment

Choose a reason for hiding this comment

sbillig commented Dec 14, 2023 •

edited

Loading

Y-Nak left a comment •

edited

Loading

sbillig commented Apr 5, 2024 •

edited

Loading