-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update Error handling #116
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may break a few people who are relying on error formatting. I want to make sure we've discussed that adequately. Do we need to take a major version number bump for this?
lib/compiler/passes/generate-js.js
Outdated
" Error.captureStackTrace(this, peg$SyntaxError);", | ||
" }", | ||
" const self = Error.call(this, message);", | ||
" Object.setPrototypeOf(self, peg$SyntaxError.prototype);", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See https://webreflection.co.uk/blog/2016/08/30/js-super-problem
What I want is for peg$syntaxError
to be a real subclass of Error, with the constructor having been called correctly. Below this you see a call to peg$subclass
which only does part of that.
@@ -39,7 +39,7 @@ function reportInfiniteRecursion(ast) { | |||
throw new GrammarError( | |||
"Possible infinite loop when parsing (left recursion: " | |||
+ visitedRules.join(" -> ") | |||
+ ").", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've also changed the error texts to be consistent in punctuation. I flipped a coin and got "no period at the end"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. That points looks alien
@@ -1,15 +1,11 @@ | |||
"use strict"; | |||
|
|||
// Thrown when the grammar contains an error. | |||
class GrammarError { | |||
class GrammarError extends Error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The modern version of the change to peg$syntaxError
"Label \"" + node.label + "\" is already defined " | ||
+ "at line " + env[label].start.line + ", " | ||
+ "column " + env[label].start.column + ".", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually a useful information that should be kept. Some other messages just put there their own location, that is useless.
Ideally, this information also should be presented in the error object as a separate field, but right now I can't imagine which format would be better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this should be reintroduced. However, I also prefer the basic sentiment that the underlying message should be short.
I propose:
- A "shortMessage" field be introduced
- The full name be constructed by a function from the various fields
- The following arguments be added to the constructor, then promoted as fields in the downstream product:
- start (object)
- line (number)
- column (number)
- offset (number)
- end (object)
- line (number)
- column (number)
- offset (number)
- length (number)
- start (object)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would imply that the string being passed to GrammarError
is actually the shortMessage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be happy to do the work if we land this without this piece
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a later commit, I added a toString()
, the codeLocation
bits that @Mingun wanted, and added a referenceLocation
field for the errors that say "you have a problem here that references a thing there", e.g. duplicate labels. I'm open to length, but not sure what value it adds. From pegjs-util, I really like found
, and wouldn't mind the prolog
, token
, and epilog
fields if they end up being helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following arguments be added to the constructor, then promoted as fields in the downstream product:
GrammarError
already have a location
that have more full information.
From pegjs-util, I really like
found
, and wouldn't mind theprolog
,token
, andepilog
fields if they end up being helpful.
found
is a part of SyntaxError
already and the mentioned method processes that kind of errors. I don't think that prolog
, token
, and epilog
fields will be useful, but we can provide method in the SyntaxError
that extracts any token from input stream and error position just by executing specified "errorTokenRule" rule:
class SyntaxError extends Error {
/// @param errorTokenRule?
/// Rule to extract logical token with error.
/// If not specified, grammar's default will be used
/// That default can be specified or in options,
/// or via annotations when annotations landed
errorToken(input, errorTokenRule) {
// Something like that
return parser.parse(
input.substr(this.location.start.offset),
{ startRule: errorTokenRule }
);
}
}
"Rule \"" + node.name + "\" is already defined " | ||
+ "at line " + rules[node.name].start.line + ", " | ||
+ "column " + rules[node.name].start.column + ".", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
"\"@\" cannot be used with an action block " | ||
+ "at line " + node.location.start.line + ", " | ||
+ "column " + node.location.start.column + ".", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should replace self location with action node location, but there is a subtle aspect. action
node location
points to the expression and associated code block and it's start
position visually not related to the code block at all. We have two options here, how to solve that:
- change
location
of the action node. I think that is bad, because:- that is breaking change
- that breaks assumption that locations of all children nodes lives in the location of their parent node
- introduce a new
codeLocation
property, which should hold theCodeBlock
location, i.e. space from opening{
till closing}
. That properties should be added to all nodes which have aCodeBlock
. For me, that is the best choice
After introducing that property just use action.codeLocation.start
there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, codeLocation
would used for source map support
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why doesn't the code block have its own AST node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it have -- that is initializer
, action
, semantic_and
, and semantic_not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm suggesting that the ast should be:
{
type: 'semantic_and',
expression: {
type: 'code',
code: ' return true; ',
location: ...
},
location: ...
}
instead of:
{
type: 'semantic_and',
code: ' return true; ',
location: ...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, it would be hard to process them if use the visitor pattern as with other kinds of nodes. You always need information from their parent node to process it. So you have to:
- or give each
code
node different name -- but then this situation wouldn't differ from present - or not use visitor pattern when process such nodes -- then an extra entity will give nothing useful
- or give a reference to the owner node down to the visitor so
code
nodes known which data their represents
I think, that all alternatives not gives us so much and just a new property with an extra location will be enough:
{
type: 'semantic_and',
code: ' return true; ',
codeLocation: ..., // for initializer just the same reference, as location
location: ...
}
lib/compiler/passes/generate-js.js
Outdated
" const self = Error.call(this, message);", | ||
" Object.setPrototypeOf(self, peg$SyntaxError.prototype);", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now generated parser is still a ES5 compatible code. As I remember, const
available only from ES6, the same for Object.setPrototypeOf
.
So this is a breaking change. I do not mind, because I myself plan to do some PRs that are likely to be breaking.
Just do not forget to update also a list of supported targets in the readme.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const->var is an easy fix that was just carelessness on my part. Object.setPrototypeOf
is supported in IE11+, and I think we're still trying to support IE9 for the moment. We can add a polyfill, or just accept slightly-incorrect behavior if Object.setPrototypeOf
doesn't exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"slightly-incorrect" here looks like this:
Uncaught [Error [SyntaxError]: ... ]
instead of this:
[peg$SyntaxError: ... ]
which i think i can live with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this polyfill isn't actually possible
I've made React 15 work in IE7
This is a much uglier thing than you might expect
As a point, does it really matter? If they weren't getting a nice class inheritance before, there's no sense establishing a backwards compatibility they never had in the first place
As long as it doesn't catch fire in new and unique ways, "compatability" can in their case be relegated to "eh, good enough" IMO. If it was already broken, we don't have to fix it
@@ -39,7 +39,7 @@ function reportInfiniteRecursion(ast) { | |||
throw new GrammarError( | |||
"Possible infinite loop when parsing (left recursion: " | |||
+ visitedRules.join(" -> ") | |||
+ ").", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. That points looks alien
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some things can be done better. I've leave inline comments
@Mingun look again please. If this works for you, I'll rebase into better chunks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, but you can make it even better. See the inline comments. Not strictly required although
lib/grammar-error.js
Outdated
@@ -2,10 +2,29 @@ | |||
|
|||
// Thrown when the grammar contains an error. | |||
class GrammarError extends Error { | |||
constructor(message, location) { | |||
constructor(message, location, referenceLocation) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even better to pass an array with message and location for additional notes to the main error, like in the Rust Diagnostic API. This opens door for the great errors reports, for example, the report-left-recursion
pass would be to point to all locations involved in the left recursion. The error reporting code would look like:
throw new GrammarError(
"Possible infinite loop when parsing",// message
error_location, // location
visitedRules.map(rule => { // notes[]
return {
message: `left recursion via rule "${rule.name}"`,
location: rule.location
};
})
);
This error could be rendered, for example, like a Rust error:
error: Possible infinite loop when parsing
--> grammar.pegjs:2:1
|
2 | left_recursion = ...;
| ^^^^^^^^^^^^^^
|
note: left recursion via rule "one_rule"
--> grammar.pegjs:4:1
|
4 | one_rule = ...;
| ^^^^^^^^
note: left recursion via rule "another_rule"
--> grammar.pegjs:6:1
|
6 | another_rule = ...;
| ^^^^^^^^^^^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love this, and am going to make the change. What do you think of ...locations
as the second parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean
constructor(message: string, locations: LocationRange[], notes: DiagnosticNote[])
type DiagnosticNote{
message: string;
location: LocationRange;
}
?
No, I'm thinking about
constructor(message: string, location?: LocationRange, notes?: DiagnosticNote[])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of just constructor(message, ...locations)
, and locations could have diagnostic notes if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will not cover all possible variants.
Main message should be always present. location
could be missing, for example, check, that allowedStartRules
contains only declared rules (pegjs/pegjs#532, I should port that stuff). notes
available not for all diagnostics.
If you thinking about adding message
field into LocationRange
itself, I think this wrong intention
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about adding an optional message to LocationRange, yes. But I agree, that's a separate change. Let me finish off what I have, then let's take a look at it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spreading a container requires that to be varargs forever. if you decide you want to add something else afterwards, you can't.
better to just pass locations as a container imo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, you remove the GrammarError.location
property, this is breaking change.
I thinking about this a bit. We can unify my and you approaches -- make the GrammarError
accept just an array of DiagnosticNote
objects. In order to keep backward compatibility I'm suggest the following:
- Currently implement
GrammarError
as I suggested:That keeps the backward compatibilityclass GrammarError { constructor(message: string, location?: LocationRange, diagnostics?: ...DiagnosticNote) {} } // usage: throw new GrammarError("main message"); throw new GrammarError("main message", main_location); throw new GrammarError( "main message", main_location, { message: "additional note message", location: note_location } ); throw new GrammarError( "main message", null, { message: "additional note message 1", location: note_location }, { message: "additional note message 2" } );
- In version 2.0 replace this to
class GrammarError { constructor(diagnostics?: ...DiagnosticNote) {} } // usage: throw new GrammarError({ message: "main message" }); throw new GrammarError({ message: "main message", location: main_location }); throw new GrammarError( { message: "main message", location: main_location }, { message: "additional note message", location: note_location } ); throw new GrammarError( { message: "main message" }, { message: "additional note message 1", location: note_location }, { message: "additional note message 2" } );
- Release 1.2.0 in the coming month, then release 2.0 soon
Maybe even keep both variants, because passing main message and their location are common case.
Anyway, the end-user is unlikely will deal with the GrammarError
constructor. I'll make a new PR in couple of days where I introduce the new way of reporting errors, that do not require you to throw exceptions. That will allow to emit several errors, warnings and information messages during compilation run.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, i just found the breaking change when I went to look at the .d.ts file. I've added a getter that should paper over the issue, just returning the first location. I'm willing to try implementing your API above and see how I feel about it. Let's see what I come up with.
If you're going to not be throwing errors, is this PR all moot, or are you planning to reuse any of it in your approach? I'd rather have multiple errors if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I think it still useful, because in the end I'll throw the GrammarError
. Actual support of multiply errors in one compilation run will be possible only in 2.0, unfortunately, because that introduces a breaking change which I'm not see how to avoid :( .
lib/grammar-error.js
Outdated
const filename = (typeof this.location.source === "string") | ||
? `"${this.location.source}" ` | ||
: ""; | ||
str += `\nat ${filename}line ${this.location.start.line}, column ${this.location.start.column}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to use <filename>:<line>:<column>
syntax -- VSCode understand it and can create hyperlinks to corresponding locations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer that too. Willfix.
lib/grammar-error.js
Outdated
if (this.referenceLocation) { | ||
const filename = (typeof this.referenceLocation.source === "string") | ||
? `"${this.referenceLocation.source}" ` | ||
: ""; | ||
str += `\nfrom ${filename}line ${this.referenceLocation.start.line}, column ${this.referenceLocation.start.column}`; | ||
} | ||
|
||
return str; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary to follow my general suggestion above, because it will require access to the original string. Just enumerate errors with "note: <message><location>"
will be enough. But if you wish, you can implement the Rust format if source grammar string was given 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll make that a separate method.
The biggest issue remaining is that the peg$syntaxError formats are now their old ugly selves. I can translate all of this code to es5 I guess. Opinions? |
Because in several issues we already faced with necessity to introduce a small breaking changes, I think, this can be done in the 2.0 release and we can get it relatively quickly. Release 1.2.0 within a month (I plan to make a several compatible PRs for it), and then change that in the 2.0. |
I'm fixing a few coverage misses and a bug that they pointed out. |
I think I've got the API the way we've discussed above. It's backward-compatible. I've added a |
Sample output:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow! Just couple of small comments.
Also, I think it is better to place the note: ...
text before their location:
$ echo "foo = 'bar'\nfoo= ." | bin/peggy -
Error: Rule "foo" is already defined
--> stdin:2:1
|
2 | foo= .
| ^^^^^^
note: Original rule location
--> stdin:1:1
|
1 | foo = 'bar'
| ^^^^^^^^^^^
lib/grammar-error.js
Outdated
if (typeof this.location.source === "string") { | ||
str += `${this.location.source}:`; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, we could allow any defined object here. No need to forbid this feature for objects that implements toString
protocol
if (typeof this.location.source === "string") { | |
str += `${this.location.source}:`; | |
} | |
if (this.location.source !== undefined && this.location.source !== null) { | |
str += `${this.location.source}:`; | |
} |
lib/grammar-error.js
Outdated
if (typeof diag.location.source === "string") { | ||
str += `${diag.location.source}:`; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Paired with the previous
if (typeof diag.location.source === "string") { | |
str += `${diag.location.source}:`; | |
} | |
if (diag.location.source !== undefined && diag.location.source !== null) { | |
str += `${diag.location.source}:`; | |
} |
lib/grammar-error.js
Outdated
null, | ||
this.locations.map(loc => loc.start.line) | ||
).toString().length; | ||
const srcLines = {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this correctly handle toString
, etc. keys? Maybe replace with Map
or Object.create(null)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this needs to change now. How about we just make it entry pairs. [[source, text], [source, text]]
. That way we won't need a Map shim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably more clear and easier to document to do [{source: "stdin", text: "input"}]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. There is not so much different sources, so linear search in the array won't a problem. Both, array of arrays or array of structures are acceptable
lib/peg.d.ts
Outdated
found?: any; | ||
expected?: ExpectedItem[]; | ||
stack?: any; | ||
toString(): string; | ||
format(sources:Sources): string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
format(sources:Sources): string; | |
format(sources: Sources): string; |
@@ -25,7 +25,7 @@ describe("compiler pass |reportInfiniteRecursion|", function() { | |||
"start = stop", | |||
"stop = start" | |||
].join("\n"), { | |||
message: "Possible infinite loop when parsing (left recursion: start -> stop -> start).", | |||
message: "Possible infinite loop when parsing (left recursion: start -> stop -> start)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we can put each rule to the diagnostics
array (stop
in that case)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave that for a follow-up, but I agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
src/parser.pegjs
Outdated
= "{" @BareCodeBlock "}" | ||
|
||
BareCodeBlock | ||
= expression:Code { return { type: "code", expression, location: location() }; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, type
is not used
= expression:Code { return { type: "code", expression, location: location() }; } | |
= expression:Code { return { expression, location: location() }; } |
a69730a
to
635c939
Compare
Fixed above nits, updated docs + CHANGELOG, squashed. Ready to land? |
5a8521e
to
3e87945
Compare
That is easy, just generate that error yourself: // Grammar
start = '' {
let end = { offset: input.length, line: 1, column: 1 };
let p = 0;
while (p < input.length) {
if (input.charCodeAt(p) === 10) {
end.line++;
end.column = 1;
} else {
end.column++;
}
p++;
}
error("error", {
source: "test",
start: { offset: 0, line: 1, column: 1 },
end
});
}; |
I wasn't specific enough. It needs to be a |
lib/parser.js
Outdated
str += peg$padEnd("", this.location.end.column - this.location.start.column, "^"); | ||
} | ||
else { | ||
str += peg$padEnd("", line.length - this.location.start.column + 1, "^"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the line I can't reach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
start = @&{
multiline
};
triggers that. If my fork would be used, I've changed that so it points only to @
, therefore it will not work.
I believe, that squashing everything into one big commit is not a good practice. There could be difficulties in the incidents investigation, if their happens in the future. I've split your commit into small fine-grained commits and also implement a better messages for left-recursion detector. You can found that in my fork. If you wish, I can polish and finish you work. Also I slightly extend idea about Some examplesLeft recursion
Duplicated rules
Duplicated labels
Plucking together with an action
Plucking of a semantic predicate
|
lib/parser.js
Outdated
peg$SyntaxError.prototype.format = function(sources) { | ||
var str = "Error: " + this.message; | ||
if (this.location) { | ||
var srcLines = []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable actually not used. I fix that in my fork
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that is leftover from my port. good catch.
I think I'd like to take a crack at pulling my single commit into pieces, then we can see what is left from your fork that still needs to be included, if that works for you? |
Yes |
@Mingun after looking through your patch set in more detail, let's go ahead and take yours. |
You can just reset you branch to my and push it. Also you can replace commit author - still, this is your code, I just organized it. |
…s to the `GrammarError`
…d to it like the same in the GrammarError
- `codeLocation`: for spans of the user code - `nameLocation`: for rule names (useful for check duplicate rules pass) - `labelLocation`: for label names (useful for check duplicate labels pass)
OK, I finish in my fork. You can just reset you branch to my, push it and then merge that PR. Although I want that someone native speaker review my comments in the I have a strong feeling that the |
I finally got around to the merge, @Mingun. I think this is ready to merge. |
Yes, it is ready. Do it. |
Change-Id: I1e164456794980c0e2d95de8c8b8e8ec3ea7021e
Fixes #102
I'd like a couple of people to review this please. I'll put some notes inline.