Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser recovery rework #972

Merged
merged 6 commits into from
Apr 12, 2024
Merged

Conversation

sbillig
Copy link
Collaborator

@sbillig sbillig commented Dec 14, 2023

What was wrong?

How was it fixed?

To-Do

  • change "Inherited/Override" recovery method stuff
  • Clean up commit history

@Y-Nak Y-Nak changed the base branch from master to fe-v2 December 14, 2023 17:25
@Y-Nak Y-Nak mentioned this pull request Jan 5, 2024
4 tasks
@sbillig sbillig force-pushed the parser-recovery-rework branch 2 times, most recently from 833ddc3 to 42202eb Compare March 1, 2024 21:12
@sbillig sbillig marked this pull request as ready for review March 28, 2024 05:21
@sbillig sbillig requested a review from Y-Nak March 28, 2024 05:21
@sbillig
Copy link
Collaborator Author

sbillig commented Mar 28, 2024

@Y-Nak I guess I'm done with this. It's not perfect, but I think it's better.

Copy link
Member

@Y-Nak Y-Nak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, overall. I think this PR generally improves error messages a lot. Thanks!

I think we still need Override or a similar way to prohibit the parser from using the parent scope's recovery tokens, so that error recovery doesn't change the semantics of the user's code.

e.g.,

impl Foo {
    struct S {}
    fn foo(x: i32) { 
    1 
    }
}

Root@0..67
  ItemList@0..67
    Item@0..15
      Impl@0..15
        ...
    Item@15..26
      Struct@15..26
        ...
    Item@31..65
      Func@31..65
        ...

In this example, struct S and fn foo are parsed in the outer item list scope, which will probably cause a negative impact on the quality of the diagnostics from the later analysis phases.

crates/hir-analysis/src/ty/def_analysis.rs Outdated Show resolved Hide resolved
crates/parser2/src/ast/item.rs Outdated Show resolved Hide resolved
crates/parser2/src/parser/mod.rs Outdated Show resolved Hide resolved
crates/parser2/src/parser/mod.rs Outdated Show resolved Hide resolved
crates/parser2/src/parser/mod.rs Outdated Show resolved Hide resolved
crates/parser2/src/parser/param.rs Outdated Show resolved Hide resolved
crates/parser2/src/parser/param.rs Outdated Show resolved Hide resolved
@Y-Nak
Copy link
Member

Y-Nak commented Apr 5, 2024

This is probably related to how to handle opening/closing brackets thing since these bracket kinds make a semantic scope that should break the inheritance of the recovery token generally, I guess.

@sbillig
Copy link
Collaborator Author

sbillig commented Apr 5, 2024

I think we still need Override or a similar way to prohibit the parser from using the parent scope's recovery tokens, so that error recovery doesn't change the semantics of the user's code.

Preventing the parser from using the parent scope's recovery tokens would change the semantics of this code:

impl Foo {

struct S {}
fn foo(x: i32) {
1
}

(ie the user didn't close the impl block, or is in the process of typing it).

I think it's impossible to avoid misinterpreting the semantics of the user's code in all cases. I think the "best" solution is to use the indentation level as a hint about the user's intention. In your example, the indentation implies that the struct is being defined inside the impl block, and could thus be recognized as disallowed. That said, I don't currently have the energy to implement such a thing.

Lacking indentation hints, when the parser finds an unexpected token which is an explicit recovery token in a parent scope, I think the most reasonable thing to do is to stop trying to parse the current scope and defer to the parent. If we allow the current scope to continue in the face of unexpected tokens, a single unclosed block could conceivably swallow the entire remainder of the file (unless we do something like treat some available closing brace as the end of the block, which can also introduce changes to semantics).

I chose to prioritize the case where the user is currently editing the file, or made a simple error (like a missing closing brace), over more complicated errors (like defining something where it's not allowed). The current implementation should handle the former reasonably well, but will change the semantics in the latter. Maybe a better error message could explain to the user that in your example the parser has decided that the struct keyword implies that the impl block is closed.

@Y-Nak
Copy link
Member

Y-Nak commented Apr 6, 2024

I'm sorry for the explanation that was too vague. I do understand that it's essentially a heuristic issue, regardless of how much we delve into it. What has particularly caught my attention is the aggressiveness of an error recovery. Of course, there's a degree of subjectivity in determining which recovery are considered "too" aggressive, though.

The following example concerns me especially:

trait Trait {
    type Type;
    fn foo(x: i32);
}

I suppose Rust users would often make this mistake in Fe. However, if the parser interprets type Type and `fn foo' as being defined at the parent module scope, the error message could be quite misleading.
This problem could be resolved by simplifying the BNF that the parser refers to as a specification, as you previously mentioned. Simplifying the BNF would likely allow each parsing scope to define more syntax kinds for error recovery. Semantic errors could be addressed by the later analysis phase.

As another problem, regarding unclosed scopes, I'm starting to feel that special handling will be necessary for this case. Essentially, I think it'd be better to (conceptually) give up parsing when the code is in unclosed scopes. This is because guessing the user's original intent becomes particularly challenging in this case, and aggressive error recovery could lead to misleading error messages.
This problem might be related to the error refinement idea because it brings another complexity if we try to let the parser determine whether a scope is properly closed while parsing the internal syntax elements of the scope.
This is related to the ADT recursion problem as well. That is, the more we try to componentize the compiler, the more difficult it is to emit better error messages when an error is related to another error emitted from another compiler component.
So, before outputting accumulated error messages, we might need a phase that refines the accumulated errors. For example, if the spans of two errors overlap, only one of the errors is selected to be output(at least, the output of one error should be suppressed if it is encompassed by another).
In the parser case, each parsing scope could just accumulate errors without caring about whether the scope is closing or not; then when the parser notices that the scope is not closed, the child node can be wrapped up by the error node. After that, the error refinement phase would suppress the errors that were accumulated inside the unclosed scope.

So, basically, I feel that we probably need to

  1. make BNF simpler
  2. wrap up an entire unclosed scope by an error syntax node
  3. add an error refinement phase

But I'm unsure if this is a proper direction. I'd really appreciate your opinion.
NOTE: I do not think these problems should be resolved in this PR either way. This PR has already resolved many other problems and significantly improved the quality of the parser implementation. I just want to deepen our understanding of these issues for the future.

@sbillig
Copy link
Collaborator Author

sbillig commented Apr 7, 2024

The following example concerns me especially:

trait Trait {
    type Type;
    fn foo(x: i32);
}

I suppose Rust users would often make this mistake in Fe. However, if the parser interprets type Type and `fn foo' as being defined at the parent module scope, the error message could be quite misleading. This problem could be resolved by simplifying the BNF that the parser refers to as a specification, as you previously mentioned. Simplifying the BNF would likely allow each parsing scope to define more syntax kinds for error recovery. Semantic errors could be addressed by the later analysis phase.

One easy option is to explicitly add parser support for cases that we feel might be common errors, and/or for syntax that we intend to support someday. (Like the fn-def-inside-struct case that you implemented.) In this situation, I think it would be maybe be ideal to recover to the parent scope on the type keyword if it's not indented.

In the parser case, each parsing scope could just accumulate errors without caring about whether the scope is closing or not; then when the parser notices that the scope is not closed, the child node can be wrapped up by the error node. After that, the error refinement phase would suppress the errors that were accumulated inside the unclosed scope.

How does the parser notice that the scope is not closed? The only way it can be certain is if it reaches the end of the file and failed to find a matching closing bracket, which of course means that the scope has consumed the rest of the file. This seems like a pretty poor outcome to me.

The alternative of course is to guess the user's intent, and I think we can do pretty well by considering indentation and looking for keywords that definitely aren't allowed (or likely to be mistakenly used) in a given scope. We could make this nicer by explicitly telling the user when the parser has considered an unclosed scope to be closed. (I'm just repeating myself, because I can't think of any better ideas :)

I think it's actually ok to allow for the possibility of misleading errors (like every compiler I've ever used), in favor of nice behavior most of the time. Perhaps the parser could behave differently in LSP mode vs CLI mode, with the latter being more strict?

Perhaps we could mark nodes that are in a questionably-parsed region (not sure how to define this) as "tainted", and suppress all errors that involve them? (Eg in the previous crappy parser/analyzer, we suppressed "item not found" errors relating to files that didn't parse successfully, assuming that perhaps the missing item wasn't parsed.)

Copy link
Member

@Y-Nak Y-Nak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@sbillig sbillig merged commit 57335f1 into ethereum:fe-v2 Apr 12, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants