Better parser error recovery #1086

Open
4 of 29 tasks
degory opened this issue Feb 23, 2024 · 0 comments

degory commented Feb 23, 2024

Lots of common syntax errors cause huge cascades of errors due to the parser losing its place in the subsequent input.

When we detect a syntax error, we can take advantage of the variety of block terminating keywords to try to re-synchronize the parser with the block structure of the input. We can also use indentation as a hint in error recovery.
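As a rough illustration of re-synchronizing on block terminators, here is a minimal Python sketch (not the compiler's actual code). The token representation, the `resync` helper and the opener/terminator table are all assumptions made for the example; in particular, treating `is`, `if`, `do` and `try` as the only block openers is a simplification.

```python
# Simplified opener -> terminator table. This is an assumption for the
# sketch, not the real grammar.
BLOCK_PAIRS = {"is": "si", "if": "fi", "do": "od", "try": "yrt"}
CLOSERS = set(BLOCK_PAIRS.values())


def resync(tokens: list[str], pos: int, expected_closer: str) -> int:
    """Skip tokens from pos until the enclosing block appears to end.

    Returns the index just past expected_closer if it is found at nesting
    depth 0. If a different terminator is found at depth 0, it probably
    closes an outer block, so its own index is returned and the caller can
    report the missing terminator. Returns len(tokens) at end of input.
    """
    pending = []  # terminators expected for blocks opened while skipping
    i = pos
    while i < len(tokens):
        tok = tokens[i]
        if tok in BLOCK_PAIRS:
            pending.append(BLOCK_PAIRS[tok])
        elif tok in CLOSERS:
            if pending:
                # Only pop on a matching terminator; mismatches are ignored.
                if pending[-1] == tok:
                    pending.pop()
            elif tok == expected_closer:
                return i + 1
            else:
                return i
        i += 1
    return len(tokens)
```

For example, after an error inside a loop body the parser could call `resync(tokens, error_pos, "od")` and resume parsing statements at the returned index.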

  • Global function not terminated with si
  • Global property body not terminated with si
  • Global property with empty type
  • Class, trait or struct not terminated with si
  • Method not terminated with si
  • Method signature incomplete arguments
  • Method signature incomplete type
  • Property body not terminated with si
  • Attempt to nest a named function inside a global function
  • Attempt to nest a named function inside a global property
  • Attempt to nest a named function inside a method
  • Attempt to nest a named function inside a property
  • Missing colon between argument name and type
  • Variable definition with : but no let
  • if with no closing fi
  • elif with no expression and/or then
  • Orphaned else ... fi block
  • Orphaned fi
  • for with no od
  • while with no od
  • do with no od
  • Orphaned od
  • try with no yrt
  • Orphaned catch
  • Orphaned finally
  • Orphaned yrt
  • Missing comma between actual arguments in function call
  • Unclosed generic type argument brackets [ ]
  • use inside class

Recovery strategies:

Apparently nested functions

When we encounter an apparent attempt to nest a named function inside another definition, the actual error may be (and probably more often is) a missing si on the preceding definition. We can use a simple indentation heuristic to decide which:

If the apparently nested function is indented the same amount as the preceding global definition, it's very likely intended to be a global definition, and the actual error was a missing si.
In this case we need to:

  • Report error for the missing si
  • Roll back to before the nested function definition
  • Unwind to the global definition parser
  • Restart the parser

If the apparently nested function is indented the same amount as the preceding class, trait or struct member, it's very likely intended to be a method, and the actual error was a missing si on that member. In this case we need to:

  • Report error for the missing si
  • Roll back to before the nested function definition
  • Unwind to the class/trait/struct parser
  • Restart the parser

If the apparently nested function is indented more than the enclosing class member, it may be a genuine attempt to nest a named function. In this case we:

  • Report an error for the wrongly nested named function
  • Carry on parsing it
  • But throw away the resulting parse tree
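
The three indentation cases above could be sketched roughly as follows. This is a hypothetical Python illustration, not the compiler's implementation: the `Recovery` enum, `indent_of` and `classify_nested_function` are names invented for the example, indentation is measured in leading spaces only, and the `UNDECIDED` fallback for indentation that matches none of the cases is an assumption.

```python
from enum import Enum
from typing import Optional


class Recovery(Enum):
    MISSING_SI_GLOBAL = "report missing si, unwind to global definition parser"
    MISSING_SI_MEMBER = "report missing si, unwind to class/trait/struct parser"
    GENUINE_NESTING = "report wrongly nested function, parse it, discard tree"
    UNDECIDED = "no indentation hint, fall back to default recovery"


def indent_of(line: str) -> int:
    """Count leading spaces (tabs are not handled in this sketch)."""
    return len(line) - len(line.lstrip(" "))


def classify_nested_function(
    nested_indent: int,
    global_indent: int,
    member_indent: Optional[int] = None,
) -> Recovery:
    """Pick a recovery strategy for an apparently nested named function.

    member_indent is the indentation of the enclosing class, trait or
    struct member, or None when we are inside a global definition.
    """
    if nested_indent == global_indent:
        return Recovery.MISSING_SI_GLOBAL
    if member_indent is not None:
        if nested_indent == member_indent:
            return Recovery.MISSING_SI_MEMBER
        if nested_indent > member_indent:
            return Recovery.GENUINE_NESTING
    return Recovery.UNDECIDED
```

In the two missing si cases the caller would then roll back to the token before the nested definition and restart parsing at the chosen level.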

Incomplete function signatures

If we detect the beginning of a second function signature while we're parsing the formal arguments or type of a function, then the first function signature is probably incomplete, especially if the first signature is all on one line and the second signature starts on a different line.
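
A minimal Python sketch of this check, restricted to the formal argument list, might look like the following. Everything here is an assumption for illustration: the `Token` type, the rule that "identifier followed by `(`" marks the start of a signature, and the "likely" / "possible" labels used to express the extra confidence when the second signature is on a different line.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    text: str
    line: int


def looks_like_signature_start(tokens: list[Token], i: int) -> bool:
    """Assumed shape of a signature start: an identifier followed by '('."""
    return (
        i + 1 < len(tokens)
        and tokens[i].text.isidentifier()
        and tokens[i + 1].text == "("
    )


def incomplete_signature_hint(
    tokens: list[Token], open_paren: int
) -> Optional[str]:
    """Scan the argument list that starts at tokens[open_paren] == '('.

    Returns "likely" if a second signature starts on a different line,
    "possible" if it starts on the same line, and None if the argument
    list closes normally or the input ends first.
    """
    first_line = tokens[open_paren].line
    depth = 1  # we are just inside the first signature's '('
    for i in range(open_paren + 1, len(tokens)):
        if looks_like_signature_start(tokens, i):
            return "likely" if tokens[i].line != first_line else "possible"
        if tokens[i].text == "(":
            depth += 1
        elif tokens[i].text == ")":
            depth -= 1
            if depth == 0:
                return None
    return None
```

On a "likely" result the parser could report the first signature as incomplete and restart parsing at the second signature, rather than reporting an error for every following token.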

degory converted this from a draft issue Feb 23, 2024
degory added a commit that referenced this issue Feb 26, 2024
Enhancements:
- Better parser error recovery for incomplete and garbled class, method and property definitions (see #1086)

Technical:
- Rewrite the tokenizer lookahead mechanism so it reliably supports multiple levels of speculation
- Replace use of string concatenation with interpolation throughout the compiler source
degory added a commit that referenced this issue Mar 4, 2024
Enhancements:
- Improved error reporting and parser recovery for incomplete function signatures (see #1086)
degory added a commit that referenced this issue Mar 11, 2024
Enhancements:
- If a cascade of errors is detected and syntax errors are present, show only the first 15 syntax errors per file (see #1086)