Update Error handling #116

Merged
merged 10 commits into from
May 17, 2021
6 changes: 6 additions & 0 deletions CHANGELOG.md
Expand Up @@ -31,6 +31,12 @@ Released: TBD
from the `options.grammarSource` property. That property can contain arbitrary
data, for example, the path to the currently parsed file.
[@Mingun](https://github.com/peggyjs/peggy/pull/95)
- Made usage of `GrammarError` and `peg$SyntaxError` more consistent. Use the
`format` method to get pretty string outputs. Updated the `peggy` binary to
make pretty errors. Slight breaking change: the format of a few error
messages has changed; use the `toString()` method on `GrammarError` to get
something close to the old text.
[@hildjj](https://github.com/peggyjs/peggy/pull/116)

### Bug fixes

73 changes: 67 additions & 6 deletions README.md
Expand Up @@ -541,8 +541,10 @@ expression matches, consider the match failed.
As described above, you can annotate your grammar rules with human-readable
names that will be used in error messages. For example, this production:

integer "integer"
= digits:[0-9]+
```peggy
integer "integer"
= digits:[0-9]+
```

will produce an error message like:

Expand All @@ -562,23 +564,82 @@ subexpressions.

For example, for this rule matching a comma-separated list of integers:

seq
= integer ("," integer)*
```peggy
seq
= integer ("," integer)*
```

an input like `1,2,a` produces this error message:

> Expected integer but "a" found.

But if we add a human-readable name to the `seq` production:

seq "list of numbers"
= integer ("," integer)*
```peggy
seq "list of numbers"
= integer ("," integer)*
```

then Peggy prefers an error message that implies a smaller attempted parse
tree:

> Expected end of input but "," found.

There are two classes of errors in Peggy:

- `SyntaxError`: Syntax errors, found while parsing the input. This kind of
error can be thrown both during _grammar_ parsing and during _input_ parsing.
Although the name is the same, each generated parser (including the Peggy
parser itself) has its own unique error class.
- `GrammarError`: Grammar errors, found during construction of the parser.
These errors can be thrown only in the parser generation phase. This error
signals a logical mistake in the grammar, such as having rules with
the same name in one grammar.

Whichever error is caught, it has a `format()` method that takes
an array of mappings from a source to the text of that source:

```javascript
let source = ...;
try {
  PEG.generate(input, { grammarSource: source, ...}); // throws SyntaxError or GrammarError
  parser.parse(input, { grammarSource: source, ...}); // throws SyntaxError
} catch (e) {
  if (typeof e.format === "function") {
    console.log(e.format([
      { source, text: input },
      { source: source2, text: input2 },
      ...
    ]));
  }
}
```

The generated message looks like:

```console
Error: Possible infinite loop when parsing (left recursion: start -> proxy -> end -> start)
--> .\recursion.pegjs:1:1
|
1 | start = proxy;
| ^^^^^
note: Step 1: call of the rule "proxy" without input consumption
--> .\recursion.pegjs:1:9
|
1 | start = proxy;
| ^^^^^
note: Step 2: call of the rule "end" without input consumption
--> .\recursion.pegjs:2:11
|
2 | proxy = a:end { return a; };
| ^^^
note: Step 3: call itself without input consumption - left recursion
--> .\recursion.pegjs:3:8
|
3 | end = !start
| ^^^^^
```
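
To show how the parts of such a message fit together, the following self-contained sketch mirrors the logic of the generated `peg$SyntaxError.prototype.format` added in this pull request. The function name `formatError` and the sample error object are illustrative assumptions, not part of Peggy's API; a real error object comes from a parser and already carries its own `format()` method.

```javascript
// Sketch only: formatError follows the logic of the generated
// peg$SyntaxError.prototype.format shown in this pull request.
// The name formatError and the sample error below are hypothetical.
function formatError(err, sources) {
  const str = "Error: " + err.message;
  if (!err.location) {
    return str;
  }
  // Find the text registered for the source the error points at.
  const entry = sources.find(s => s.source === err.location.source);
  const s = err.location.start;
  const loc = err.location.source + ":" + s.line + ":" + s.column;
  if (!entry) {
    // No text available: fall back to a plain position.
    return str + "\n at " + loc;
  }
  const e = err.location.end;
  const lines = entry.text.split(/\r\n|\n|\r/g);
  const line = lines[s.line - 1];
  // Underline to the end of the line when the span covers several lines.
  const last = s.line === e.line ? e.column : line.length + 1;
  const filler = " ".repeat(String(s.line).length);
  return str
    + "\n --> " + loc + "\n"
    + filler + " |\n"
    + s.line + " | " + line + "\n"
    + filler + " | " + " ".repeat(s.column - 1)
    + "^".repeat(last - s.column);
}

// Hand-built error shaped like the ones a generated parser throws.
const sample = {
  message: "Expected integer but \"a\" found.",
  location: {
    source: "input.txt",
    start: { offset: 4, line: 1, column: 5 },
    end: { offset: 5, line: 1, column: 6 },
  },
};
const formatted = formatError(sample, [{ source: "input.txt", text: "1,2,a" }]);
console.log(formatted);
```

The `source` value is compared with `===`, so the entry passed in the array must be the same value that was given as `grammarSource` when the error was produced.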

## Compatibility

Both the parser generator and generated parsers should run well in the following
10 changes: 7 additions & 3 deletions bin/peggy
Expand Up @@ -289,8 +289,9 @@ if (inputFile === "-") {
process.stdin.resume();
inputStream = process.stdin;
inputStream.on("error", () => {
abort("Can't read from file \"" + inputFile + "\".");
abort("Can't read from stdin.");
});
options.grammarSource = "stdin";
} else {
options.grammarSource = inputFile;
inputStream = fs.createReadStream(inputFile);
Expand All @@ -311,8 +312,11 @@ readStream(inputStream, input => {
try {
source = peg.generate(input, options);
} catch (e) {
if (e.location !== undefined) {
abort(e.location.start.line + ":" + e.location.start.column + ": " + e.message);
if (typeof e.format === "function") {
abort(e.format([{
source: options.grammarSource,
text: input
}]));
} else {
abort(e.message);
}
87 changes: 84 additions & 3 deletions docs/documentation.html
Expand Up @@ -65,6 +65,7 @@ <h2 id="table-of-contents">Table of Contents</h2>
<li><a href="#parsing-lists">Parsing Lists</a></li>
</ul>
</li>
<li><a href="#error-messages">Error Messages</a></li>
<li><a href="#compatibility">Compatibility</a></li>
</ul>

Expand Down Expand Up @@ -252,7 +253,10 @@ <h2 id="using-the-parser">Using the Parser</h2>
result (the exact value depends on the grammar used to generate the parser) or
throw an exception if the input is invalid. The exception will contain
<code>location</code>, <code>expected</code>, <code>found</code> and
<code>message</code> properties with more details about the error.</p>
<code>message</code> properties with more details about the error. The error
will have a <code>format(SourceText[])</code> function, to which you pass an array
of objects that look like <code>{source: grammarSource, text: string}</code>; this
will return a nicely-formatted error suitable for human consumption.</p>

<pre><code>parser.parse("abba"); // returns ["a", "b", "b", "a"]

Expand Down Expand Up @@ -689,14 +693,91 @@ <h3 id="parsing-lists">Parsing Lists</h3>
<p>Note that the <code>@</code> in the tail section plucks the word out of the
parentheses, NOT out of the rule itself.</p>

<h2 id="error-messages">Error Messages</h2>
<p>As described above, you can annotate your grammar rules with human-readable names that will be used in error messages. For example, this production:</p>

<pre><code>integer "integer"
= digits:[0-9]+</code></pre>
<p>will produce an error message like:</p>

Expected integer but "a" found.

<p>when parsing a non-number, referencing the human-readable name "integer." Without the human-readable name, Peggy instead uses a description of the character class that failed to match:</p>

Expected [0-9] but "a" found.

<p>Aside from the text content of messages, human-readable names also have a subtler effect on where errors are reported. Peggy prefers to match named rules completely or not at all, but not partially. Unnamed rules, on the other hand, can produce an error in the middle of their subexpressions.</p>

<p>For example, for this rule matching a comma-separated list of integers:</p>

<pre><code>seq
= integer ("," integer)*</code></pre>
<p>an input like 1,2,a produces this error message:</p>

<blockquote>Expected integer but "a" found.</blockquote>

<p>But if we add a human-readable name to the seq production:</p>

<pre><code>seq "list of numbers"
= integer ("," integer)*</code></pre>
<p>then Peggy prefers an error message that implies a smaller attempted parse tree:</p>

<blockquote>Expected end of input but "," found.</blockquote>

<p>There are two classes of errors in Peggy:</p>

<ul>
<li><code>SyntaxError</code>: Syntax errors, found while parsing the input. This kind of error can be thrown both during <em>grammar</em> parsing and during <em>input</em> parsing. Although the name is the same, each generated parser (including the Peggy parser itself) has its own unique error class.</li>
<li><code>GrammarError</code>: Grammar errors, found during construction of the parser. These errors can be thrown only in the parser generation phase. This error signals a logical mistake in the grammar, such as having rules with the same name in one grammar.</li>
</ul>

<p>Whichever error is caught, it has a <code>format()</code> method that takes an array of mappings from a source to the text of that source:</p>

<pre><code>let source = ...;
try {
  PEG.generate(input, { grammarSource: source, ...}); // throws SyntaxError or GrammarError
  parser.parse(input, { grammarSource: source, ...}); // throws SyntaxError
} catch (e) {
  if (typeof e.format === "function") {
    console.log(e.format([
      { source, text: input },
      { source: source2, text: input2 },
      ...
    ]));
  }
}</code></pre>

<p>The generated message looks like:</p>

<pre><code>Error: Possible infinite loop when parsing (left recursion: start -> proxy -> end -> start)
--> .\recursion.pegjs:1:1
|
1 | start = proxy;
| ^^^^^
note: Step 1: call of the rule "proxy" without input consumption
--> .\recursion.pegjs:1:9
|
1 | start = proxy;
| ^^^^^
note: Step 2: call of the rule "end" without input consumption
--> .\recursion.pegjs:2:11
|
2 | proxy = a:end { return a; };
| ^^^
note: Step 3: call itself without input consumption - left recursion
--> .\recursion.pegjs:3:8
|
3 | end = !start
| ^^^^^</code></pre>

<h2 id="compatibility">Compatibility</h2>

<p>Both the parser generator and generated parsers should run well in the
following environments:</p>

<ul>
<li>Node.js 0.10.0+</li>
<li>Internet Explorer 8+</li>
<li>Node.js 4+</li>
<li>Internet Explorer 9+</li>
<li>Edge</li>
<li>Firefox</li>
<li>Chrome</li>
2 changes: 1 addition & 1 deletion docs/js/benchmark-bundle.min.js

Large diffs are not rendered by default.

60 changes: 30 additions & 30 deletions docs/js/test-bundle.min.js

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/vendor/peggy/peggy.min.js

Large diffs are not rendered by default.

54 changes: 46 additions & 8 deletions lib/compiler/passes/generate-js.js
Expand Up @@ -477,19 +477,57 @@ function generateJS(ast, options) {
"}",
"",
"function peg$SyntaxError(message, expected, found, location) {",
" this.message = message;",
" this.expected = expected;",
" this.found = found;",
" this.location = location;",
" this.name = \"SyntaxError\";",
"",
" if (typeof Error.captureStackTrace === \"function\") {",
" Error.captureStackTrace(this, peg$SyntaxError);",
" var self = Error.call(this, message);",
" if (Object.setPrototypeOf) {",
" Object.setPrototypeOf(self, peg$SyntaxError.prototype);",
" }",
" self.expected = expected;",
" self.found = found;",
" self.location = location;",
" self.name = \"SyntaxError\";",
" return self;",
"}",
"",
"peg$subclass(peg$SyntaxError, Error);",
"",
"function peg$padEnd(str, targetLength, padString) {",
" padString = padString || \" \";",
" if (str.length > targetLength) { return str; }",
" targetLength -= str.length;",
" padString += padString.repeat(targetLength);",
" return str + padString.slice(0, targetLength);",
"}",
"",
"peg$SyntaxError.prototype.format = function(sources) {",
" var str = \"Error: \" + this.message;",
" if (this.location) {",
" var src = null;",
" var k;",
" for (k = 0; k < sources.length; k++) {",
" if (sources[k].source === this.location.source) {",
" src = sources[k].text.split(/\\r\\n|\\n|\\r/g);",
" break;",
" }",
" }",
" var s = this.location.start;",
" var loc = this.location.source + \":\" + s.line + \":\" + s.column;",
" if (src) {",
" var e = this.location.end;",
" var filler = peg$padEnd(\"\", s.line.toString().length);",
" var line = src[s.line - 1];",
" var last = s.line === e.line ? e.column : line.length + 1;",
" str += \"\\n --> \" + loc + \"\\n\"",
" + filler + \" |\\n\"",
" + s.line + \" | \" + line + \"\\n\"",
" + filler + \" | \" + peg$padEnd(\"\", s.column - 1)",
" + peg$padEnd(\"\", last - s.column, \"^\");",
" } else {",
" str += \"\\n at \" + loc;",
" }",
" }",
" return str;",
"};",
"",
"peg$SyntaxError.buildMessage = function(expected, found) {",
" var DESCRIBE_EXPECTATION_FNS = {",
" literal: function(expectation) {",
12 changes: 7 additions & 5 deletions lib/compiler/passes/report-duplicate-labels.js
Expand Up @@ -36,16 +36,18 @@ function reportDuplicateLabels(ast) {
const label = node.label;
if (label && Object.prototype.hasOwnProperty.call(env, label)) {
throw new GrammarError(
"Label \"" + node.label + "\" is already defined "
+ "at line " + env[label].start.line + ", "
+ "column " + env[label].start.column + ".",
Comment on lines -39 to -41
Member

This is actually useful information that should be kept. Some other messages just include their own location, which is useless.

Ideally, this information should also be presented in the error object as a separate field, but right now I can't think of which format would be best

Contributor

I agree that this should be reintroduced. However, I also prefer the basic sentiment that the underlying message should be short.

I propose:

  1. A "shortMessage" field be introduced
  2. The full name be constructed by a function from the various fields
  3. The following arguments be added to the constructor, then promoted as fields in the downstream product:
    1. start (object)
      1. line (number)
      2. column (number)
      3. offset (number)
    2. end (object)
      1. line (number)
      2. column (number)
      3. offset (number)
    3. length (number)

Contributor

This would imply that the string being passed to GrammarError is actually the shortMessage

Contributor

I'd be happy to do the work if we land this without this piece

Contributor Author

In a later commit, I added a toString(), the codeLocation bits that @Mingun wanted, and added a referenceLocation field for the errors that say "you have a problem here that references a thing there", e.g. duplicate labels. I'm open to length, but not sure what value it adds. From pegjs-util, I really like found, and wouldn't mind the prolog, token, and epilog fields if they end up being helpful.
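
To make the idea of an error that points at one place while referencing another more concrete, here is a minimal sketch. The class `SketchGrammarError`, its `describe()` method, and the sample locations are hypothetical; the constructor shape (message, location, then an array of `{ message, location }` entries) is modeled on the calls that appear in this pull request's diff, not on a documented API.

```javascript
// Hypothetical sketch, not Peggy's actual GrammarError.
class SketchGrammarError extends Error {
  constructor(message, location, diagnostics = []) {
    super(message);
    this.name = "GrammarError";
    this.location = location;       // where the problem was detected
    this.diagnostics = diagnostics; // [{ message, location }] for related places
  }

  // Short summary that needs no source text: one line per location.
  describe() {
    const at = l => l.start.line + ":" + l.start.column;
    let str = this.name + ": " + this.message;
    if (this.location) {
      str += "\n at " + at(this.location);
    }
    for (const d of this.diagnostics) {
      str += "\n note: " + d.message + " at " + at(d.location);
    }
    return str;
  }
}

// Sample locations for a label "a" defined twice on one line (made up).
const first = { start: { line: 1, column: 9 }, end: { line: 1, column: 10 } };
const second = { start: { line: 1, column: 13 }, end: { line: 1, column: 14 } };
const dup = new SketchGrammarError(
  'Label "a" is already defined',
  second,
  [{ message: "Original label location", location: first }]
);
const summary = dup.describe();
console.log(summary);
```

Keeping the base message short and moving the second location into a separate entry lets a formatter decide how much detail to print.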

Member

> The following arguments be added to the constructor, then promoted as fields in the downstream product:

GrammarError already has a location that carries more complete information.

> From pegjs-util, I really like found, and wouldn't mind the prolog, token, and epilog fields if they end up being helpful.

found is already a part of SyntaxError, and the mentioned method processes that kind of error. I don't think that the prolog, token, and epilog fields will be useful, but we can provide a method in SyntaxError that extracts any token from the input stream at the error position just by executing a specified "errorTokenRule" rule:

class SyntaxError extends Error {
  /// @param errorTokenRule?
  ///        Rule to extract logical token with error.
  ///        If not specified, grammar's default will be used
  ///        That default can be specified or in options,
  ///        or via annotations when annotations landed
  errorToken(input, errorTokenRule) {
    // Something like that
    return parser.parse(
      input.substr(this.location.start.offset),
      { startRule: errorTokenRule }
    );
  }
}

node.location
`Label "${node.label}" is already defined`,
node.labelLocation,
[{
message: "Original label location",
location: env[label]
}]
);
}

check(node.expression, env);

env[node.label] = node.location;
env[node.label] = node.labelLocation;
},

text: checkExpressionWithClonedEnv,
12 changes: 7 additions & 5 deletions lib/compiler/passes/report-duplicate-rules.js
Expand Up @@ -11,14 +11,16 @@ function reportDuplicateRules(ast) {
rule(node) {
if (Object.prototype.hasOwnProperty.call(rules, node.name)) {
throw new GrammarError(
"Rule \"" + node.name + "\" is already defined "
+ "at line " + rules[node.name].start.line + ", "
+ "column " + rules[node.name].start.column + ".",
Comment on lines -14 to -16
Member

Same as above

node.location
`Rule "${node.name}" is already defined`,
node.nameLocation,
[{
message: "Original rule location",
location: rules[node.name]
}]
);
}

rules[node.name] = node.location;
rules[node.name] = node.nameLocation;
}
});

12 changes: 7 additions & 5 deletions lib/compiler/passes/report-incorrect-plucking.js
Expand Up @@ -11,17 +11,19 @@ const visitor = require("../visitor");
function reportIncorrectPlucking(ast) {
const check = visitor.build({
action(node) {
check(node.expression, true);
check(node.expression, node);
},

labeled(node, action) {
if (node.pick) {
if (action) {
throw new GrammarError(
"\"@\" cannot be used with an action block "
+ "at line " + node.location.start.line + ", "
+ "column " + node.location.start.column + ".",
Comment on lines -21 to -23
Member

You should replace the node's own location with the action node's location, but there is a subtle aspect. The action node's location covers the expression and the associated code block, so its start position is visually unrelated to the code block. We have two options for solving that:

  • change the location of the action node. I think that is bad, because:
    • it is a breaking change
    • it breaks the assumption that the locations of all child nodes lie within the location of their parent node
  • introduce a new codeLocation property that holds the CodeBlock location, i.e. the span from the opening { to the closing }. That property should be added to all nodes that have a CodeBlock. For me, that is the best choice

After introducing that property, just use action.codeLocation.start there

Member

Also, codeLocation would be used for source map support

Contributor Author

@hildjj hildjj Apr 27, 2021

why doesn't the code block have its own AST node?

Member

Actually, it does: initializer, action, semantic_and, and semantic_not

Contributor Author

@hildjj hildjj Apr 27, 2021

I'm suggesting that the ast should be:

{
  type: 'semantic_and',
  expression: {
    type: 'code',
    code: ' return true; ',
    location: ...
  },
  location: ...
}

instead of:

{
  type: 'semantic_and',
  code: ' return true; ',
  location: ...
}

Member

I think it would be hard to process them with the visitor pattern the way we process other kinds of nodes. You always need information from the parent node to process a code node. So you have to:

  • either give each code node a different name, but then the situation wouldn't differ from the present one
  • or not use the visitor pattern when processing such nodes, in which case the extra entity gives nothing useful
  • or pass a reference to the owner node down to the visitor so that code nodes know which data they represent

I think none of these alternatives gives us much, and a new property with an extra location will be enough:

{
  type: 'semantic_and',
  code: ' return true; ',
  codeLocation: ..., // for initializer just the same reference, as location
  location: ...
}
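
As a rough illustration of how a separate `codeLocation` could be consumed, the sketch below passes the enclosing action node down to labeled children and reports both the pluck location and the code block location. The function `findPluckInAction`, the hand-built AST, and all location values are hypothetical; this is not Peggy's visitor module, only a small recursive check in the same spirit.

```javascript
// Hypothetical sketch: a tiny recursive check, not Peggy's visitor module.
function findPluckInAction(node, action = null) {
  switch (node.type) {
    case "action":
      // Hand the action node itself to the subtree so children can
      // report where the code block lives.
      return findPluckInAction(node.expression, node);
    case "sequence":
      for (const el of node.elements) {
        const hit = findPluckInAction(el, action);
        if (hit) {
          return hit;
        }
      }
      return null;
    case "labeled":
      if (node.pick && action) {
        return {
          message: '"@" cannot be used with an action block',
          location: node.labelLocation,
          diagnostics: [{
            message: "Action block location",
            location: action.codeLocation,
          }],
        };
      }
      // Nested expressions start without an action in this sketch.
      return findPluckInAction(node.expression);
    default:
      return null;
  }
}

// Hand-built AST for something like `r = @a:x { return a; }` (locations made up).
const ast = {
  type: "action",
  code: " return a; ",
  codeLocation: { start: { line: 1, column: 12 }, end: { line: 1, column: 25 } },
  expression: {
    type: "sequence",
    elements: [{
      type: "labeled",
      pick: true,
      label: "a",
      labelLocation: { start: { line: 1, column: 5 }, end: { line: 1, column: 7 } },
      expression: { type: "rule_ref", name: "x" },
    }],
  },
};
const problem = findPluckInAction(ast);
console.log(problem.message, problem.diagnostics[0].location.start);
```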

node.location
"\"@\" cannot be used with an action block",
node.labelLocation,
[{
message: "Action block location",
location: action.codeLocation
}]
);
}
}