-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate SourceNodes for bytecode #240
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You change the source code generator. You should regenerate parser.js
after that (please do that by separate commit). Just run
npm run parser
That also allows you to verify that you do not broke pretty printing.
Also what about tests? We should be sure that this won't broke in the future, so you need to add tests, similarly to existing ones
lib/compiler/passes/generate-js.js
Outdated
function helper(code) { | ||
if (Array.isArray(code)) { | ||
return code.map(helper); | ||
} | ||
if (code instanceof SourceNode) { | ||
inSourceNode++; | ||
code.children = helper(code.children); | ||
inSourceNode--; | ||
return code; | ||
} | ||
if (sawEol) { | ||
code = code.replace(/^(.+)$/gm, " $1"); | ||
} else { | ||
code = code.replace(/\n(\s*\S)/g, "\n $1"); | ||
} | ||
sawEol = !inSourceNode || code.endsWith("\n"); | ||
return code; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code not so clear. It is better to explain it with a few lines of comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
I actually made sure that the generated parser didn't change. That's what some of the acrobatics in indent2 are about. This only changes the generated source map, which currently isn't checked in (or generated by default). |
I posted an update which I hope addresses most of your concerns. For the tests - can you give me some pointers wrt what a test for this would look like? |
test/api/pegjs-api.spec.js
Outdated
it("literal expression", () => check("'a'", source, null, "input.charCodeAt(peg$currPos) === 97")); | ||
it("chars expression", () => check("[abc]", source, null, "peg$r0.test(input.charAt(peg$currPos))")); | ||
it("rule expression", () => check("RULE_2", source, null, "= peg$parseRULE_2();")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I mean that tests.
Mapping for "rule expression"
is not nice, though. Will it possible to reference just to the peg$parseRULE_2
in call?
Also need to check "literal expression"
with several symbols, because generated code for that case is different from one-char literals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I posted this yesterday, but I can't find my comment so...
peg$parseRULE_2
would actually work right now, but relies on the generated code for the use of the rule coming after the generated code for the definition. We don't need the =
, but I think the ();
should be left to differentiate it from the definition, peg$parseRULE_2() {
, which you used above. It might even be a good idea to add a check that the generated chunk only occurs once.
Also, I ended up using a complicated regex to test a choice expression - not sure if that falls into the "not nice" category, but I couldn't see a better way of doing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Mingun is this resolved to your satisfaction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, you can rebase code for now to create a clear history without fixup commits (I always would prefer such history, it is easy for understanding, in my opinion).
But I see that in many cases mapping is bad -- you can check it on the https://sokra.github.io/source-map-visualization/#custom. Just try to generate mapping for the grammar.peggy
using
node bin/peggy.js -o lib/parser.js --format commonjs -m lib/parser.js.map src/parser.pegjs",
(For example, I locally change the "parser" rule in package.json
)
Label names are mapped to random locations. I think, it can be mapped to the location of a variable where corresponding assignment is done or to the location where them passed to the action / final result:
Initializer
= code:CodeBlock EOS { ... }
function peg$parseInitializer() {
var s0, s1, s2;
s0 = peg$currPos;
s1 = peg$parseCodeBlock();
^^ ~~~~~~~~~~~~~~~~~~~~
// \- map `code` here \- map `CodeBlock` here
if (s1 !== peg$FAILED) {
s2 = peg$parseEOS();
if (s2 !== peg$FAILED) {
peg$savedPos = s0;
s0 = peg$f2(s1);
// ^^
// \ or map `code` here
} else {
peg$currPos = s0;
s0 = peg$FAILED;
}
} else {
peg$currPos = s0;
s0 = peg$FAILED;
}
return s0;
}
The ?
, *
and +
operators could be mapped to if (...!== peg$FAILED)
/ while (... !== peg$FAILED)
, or just to corresponding expressions in that if
/ while
.
The $
should be mapped to the input.substring(..., peg$currPos)
part.
The @
probably should be mapped to the RHS part of the assignment expression in
sX = [sY, sZ, ...]; // when two or more @ symbols
// ^^ ^^
// here
// or
sX = sY; // when only one @ symbols
// ^^
// here
I think, that literals should be mapped only to compare expression:
"literal"
"a"
should be mapped to:
if (input.substr(peg$currPos, 2) === peg$c21) {
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - "literal"
if (input.charCodeAt(peg$currPos) === 93) {
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - "a"
, I think
It is also totally fine if you'll completely ignore mapping for operators and implement it only for rule functions calling (seems to be what you most expect from this PR)
Sorry, do you mean flatten into a single commit, or just merge the fixes into their corresponding original commits? If the latter, then presumably, there would be 3 commits: the code to generate the source map, the api.spec tests, and the generate-bytecode.spec tests. |
I have been looking at the Peggy grammar with https://sokra.github.io/source-map-visualization/#custom , as well as stepping through it with the VSCode debugger, and for the most part felt that it was ok. Most of the changes I wanted involved changing the parser itself, and I felt it was better to leave that until after this pull request was done. But I'll see what I can do with the issues you mentioned. |
1fd40f1
to
911b4ee
Compare
I cleaned up the commits, and addressed everything except the update to peg.d.ts, and the accuracy of the source mapping. Let me know what to do about peg.d.ts, and I'll continue to look at exactly what gets mapped where... |
Yes, that I was mean (actually, tests could be squashed into single commit. For example, I usually do one commit with implementation, one for tests and one for update What about mapping, I think, we need to create a table:
which includes every peggy construction and piece of corresponding generated code, where we can mark to which it should be mapped (or just say, that mapped to the whole snippet).
After thinking, I think, I was wrong, we need to map |
Right. But I think the same argument goes for everything, leaf or not. I'm sorry, I've not had time to really look into this for a couple of days, but will hopefully get back to it soon. The one thing that has occurred to me is that I think where the mapping ends in the source is currently broken. Suppose we have generated code "(a (b) a)" that is supposed to map to "(A (B) A)" where parentheses represent SourceNodes. ie we have an outer SourceNode mapping (a b a) to (A B A) and an inner one mapping (b) to (B). The problem is that there is no "end of source" marker. So what actually happens is that we end up with a map that looks like (a (b) a) => (A (B A)), with the second "a" mapping to the empty string at the end of the second "A". I made an attempt to put end-of-source markers in, but didn't really think it through; all it does is insert an empty node saying "the end of the code generated for b maps to the source corresponding to the end of B" - but that doesn't really change anything because it will still consider everything up until the next source node as belonging to B; we need something telling it that after B, it pops back out to A. I think that should be straightforward, and hopefully, the results will look much better. |
911b4ee
to
6600ff8
Compare
The changes I pushed include a commit to add type info for the various bytecode arrays. I've put this at the bottom of the stack, since I think it should remain separate from the rest of the changes. I've added a new commit which seems to fix the end-of-source issue I mentioned above. The idea is that when you pop an inner scope, you now also pop, and then repush the containing scope, but push it with the source location of the end of the inner scope. After this, the mapping seems much more consistent. I also now see some of the issues you mentioned. The worst one is that labels don't get mapped at all. The reason is that a A possible fix for that would be to change the contained node's location so it does include the label. That would make it look a little better in the source map viewer, but wouldn't really change the debugging experience. A better option might be to map the actual uses of the generated variables corresponding to the label back to the label - especially if we could also let the debugger know the correct name for the generated variable. I made a stab at that (not included in the pull request), and while it sort-of works (I can see that the arguments to the peg$fN functions are mapped to the labels), I don't see any change to the debug experience. I've pushed the change to https://github.com/markw65/peggy/tree/label-experiment, in case you want to take a look (it breaks the byte code tests as it stands, because I didn't get around to updating the tests; but otherwise it should be fine). Yet-another option would be to ensure that the assignment to sN corresponding to the label gets mapped back to the label. Since the inner expression could be of any type, its not clear to me that there's a straightforward way to make that happen - but I don't really know the code well enough yet... |
@markw65 I apologize that this got closed by mistake in the runup to the 2.0.0 release. We've changed our process so that shouldn't happen again. |
Yes, sorry, I've been working on other things. But I would like to get this to a point where its acceptable. I'll do the rebase now. |
6600ff8
to
febd2e9
Compare
@markw65 is this ready for review? |
Well, I think this produces source maps that are enough of an improvement to be worth doing as is, but @Mingun has pointed out various places where the results could be better - and I agree it could be improved on. I've made a stab at a couple of those, but so far haven't really got anywhere. So - I would like to make some improvements, but its more up to you guys whether that has to be done before this can be landed, or whether it could happen as a follow up. |
Everything could always be better. :) I'll review this tomorrow and see where we're at. I'll look at Mingun's comments to see if any of them feel like blockers, or things we can refine once this lands. |
See the current mapping for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only changes I need are the CHANGELOG.md and AUTHORS. We'll give @Mingun a day to see if there's anything else he wants changed before this drops.
FANTASTIC work. Thank you.
// Source Mapping | ||
// -------------- | ||
// | ||
// [37] SOURCE_MAP_PUSH n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to notify downstream folks that have their own generators that there are new opcodes. At the very least, this needs to be a major change in the changelog. @markw65 can you add an entry to CHANGELOG.md, and add yourself to AUTHORS while you're at it, please?
test/api/pegjs-api.spec.js
Outdated
@@ -239,8 +241,9 @@ describe("Peggy API", () => { | |||
const SOURCE = ` | |||
{{${GLOBAL_INITIALIZER}}} | |||
{${PER_PARSE_INITIALIZER}} | |||
RULE_1 = !{${NOT_BLOCK}} 'a' rule:RULE_2 {${ACTION_BLOCK}}; | |||
RULE_2 'named' = &{${AND_BLOCK}} @'b'; | |||
RULE_1 = !{${NOT_BLOCK}} 'a' rule:RULE_2 {${ACTION_BLOCK}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was the semi-colon change intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't actually remember. I guess it works either way. I'll put them back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not high priority, i was just wondering.
test/api/pegjs-api.spec.js
Outdated
it("literal expression", () => check("'a'", source, null, "input.charCodeAt(peg$currPos) === 97")); | ||
it("chars expression", () => check("[abc]", source, null, "peg$r0.test(input.charAt(peg$currPos))")); | ||
it("rule expression", () => check("RULE_2", source, null, "= peg$parseRULE_2();")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Mingun is this resolved to your satisfaction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I see a couple of things that definitely needs a more precise mapping, but because I not use a source-map feature by myself, and if actual users is think that it is good enough for debugging, I will not obstruct to merge it now as-is and make additional PRs in the future. In the end, partially working feature is better than missing feature.
I think, that in any case we need to build a table that I mention before, at least for documentation purposes.
For now I see the following bad mapings in the example:
:
in labels mapped with corresponding label name. Only label name should be mapped:top = c:count* { return c.filter(fb => fb) } // ^^ - this is mapped as a whole thing
- only call function expression should be mapped to a rule reference:
top = c:count* { return c.filter(fb => fb) } // ^^^^^ - this is mapped also to a `s1 = [];`, for example
- mapping to literals mistakenly can capture the part after the literal (point 1):
count = end_comment "\n" { return } // ^^^~~ - 1) that thing is mapped as a whole. The ~~ part is superfluous / comment? fb:line (comment / end_comment)? "\n" { // ^^^^^ - 2) this is mapped as a whole // ~^~ - 3) the ~ parts are superfluous
- sometimes quantificators merged with labels (see above, point 2). This should be fixed before merge, I think
- sometimes spaces around operators also included in mapping (see above, point 3)
Creating a replace table should help with tricky situations, so I strongly recommend to try to build it to see the underwater rocks that otherwise could be not obvious
Actually, source map is a mapping from generated code to original code, so the table should be
Generated | Original |
---|
Then try to imagine, to which locations you would like to map each piece of the generated code and fill the table.
In the end we could get something like that (for demonstration purposes only, I do not insist that mapping should be that way in every detail):
function peg$parsetop() {
// top = c:count* { return c.filter(fb => fb) }
// ^^^^^^^^^^^^---------------^^^ ^ ^^^^^~ ^^^^^^^^^^^^^^^^^^^^^^^^^^^
var s0, s1, s2; // | \|/ | \|/
s0 = peg$currPos; // | | | |
// ^^^^^^^^^^^^^^^^-----------------------------+ |
s1 = []; // | | | |
// ^^--------------------------------------+ |
// ^^------------------------------------+ | | |
s2 = peg$parsecount(); // | | | |
// ^^^^^^^^^^^^^^-----------------------+ | |
while (s2 !== peg$FAILED) { // | | | |
// ^^^^^^^^^^^^^^^^^^^^^^^^^--------------------+ |
s1.push(s2); // | | | |
// ^^----------------------------------+ | | |
s2 = peg$parsecount(); // | | | |
// ^^^^^^^^^^^^^^---------------------' | |
} // | | |
peg$savedPos = s0; // | | |
// ^^^^^^^^^^^^^^^^^----------------------------' |
s1 = peg$f0(s1); // | |
// ^^------------------------' |
// ^^^^^^-------------------------------------------------'
s0 = s1;
return s0;
}
I think the "? fb" case is the only one that is urgent. Where do we think this table would go? It's very specific to JS output as you've explained it. |
@hildjj I don't mind either way. I'm a bit swamped with other things right now, but should be able to finish this off within a few days. Not worried about "full credit" in any way... |
@markw65 :) Let's wait a few days for you to get to it then. Please say something if you want help or other opinions. |
I just had a thought. We're not using |
febd2e9
to
464f138
Compare
Hmm. I just rebased, and after resolving one minor merge conflict, the tests seem to pass. I've updated the branch (but no changes yet, other than rebasing).
Can you explain that? Sorry, it's not really clear what you're referring to.
I'm going to have to page this all back in, but I think the answer is no. The problem is that we call labeled, which doesn't actually generate any code for itself; it just modifies the context, and generates its sub-expression. So although it generates a SourceNode with the correct source range, the generated-range is identical to the SourceNode that its expression produces. So the outer SourceNode has nothing to attach to, and is essentially discarded. I think if I could persuade the sub-expression to wrap everything except the final assignment in its sourcenode, that assignment would end up mapped to the label. But thats where I got stuck last time I was working on it. Im not sure if its actually hard, or if I just don't understand the inner workings well enough. |
That's a reasonable request, because it was completely without context. :) If you go to the fizzbuzz example, and look for the original line |
One other question, when I built, I found these changes. But if I go back to main and build, I get a similar set of changes (but not the same). Am I supposed to include those in the commit? Did someone forget last time? |
Those files are all generated. Check in whatever you have at the end of your process, and don't worry about diffs or whether we merged perfectly beforehand. |
So the issue with that label is that source maps aren't really hierarchical structures. They simply map a source location to a generated location, and its assumed that the mapping continues until something else happens. In this case, there is no generated code associated with the label (as explained above). So the next thing that happens after the ? is the "line". So thats what we get. I mentioned a while back that I'd tried an experiment to map the label to the argument of the call that consumes it. This is what you get from that. I don't think thats the right thing to map the label to. But it certainly solves the appearance. I think what we really want is to map the label to the definition of the variable that eventually gets passed to the call. I have an idea for that, and I'll see if I can make that work tomorrow. |
464f138
to
fc4b42e
Compare
I've made some changes to map the label to the assignment(s) to the variable that will hold the label. Here's the results for fizzbuzz This ended up being more invasive than I had hoped. Let me know if you think this is a reasonable approach, or if you see a better way of doing it. |
Also, meant to add, I didn't write tests yet for the new bytecodes, because I wanted to make sure you like the approach (and are happy with the results) first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Mingun I think this is close enough to merge now. I think we should take your idea of documenting the mapping and think about how it ties in with code generation in multiple languages.
@@ -13,6 +13,8 @@ Released: TBD | |||
- [#280](https://github.com/peggyjs/peggy/issues/280) Add inline examples to | |||
the documentation, from @hildjj | |||
|
|||
- [#240](https://github.com/peggyjs/peggy/issues/240) Generate SourceNodes for bytecode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That needs to go up into a new section in the changelog, but I'll fix this up myself when I take #300.
@hildjj There is only one way to check if source map is correct: if you're stepping though the program in devtools, and it steps properly, then the mapping is correct. Unfortunately, it will never be possible to do for every language and debugger at hand. |
Sorry, not sure of procedure here... is anything waiting on me? |
I was waiting on Mingun's opinion. I'll be on the road all day today, but if I haven't heard anything by tonight, I'll probably go ahead and merge it. |
Add SourceNodes for all code generated via bytecode. This means that you can step through rules and expressions, in addition to actions and predicates.
fc4b42e
to
13e8c8f
Compare
Sorry, I forgot that I hadn't added tests, or flattened the stack. I just did that. |
Unfortunately, I have no time to review this, so please go ahead when it will suitable for you. |
We'll deal with edge cases going forward. I'd like the granularity while I'm playing with peggy-test. |
Add SourceNodes for all code generated via bytecode. This means
that you can step through rules and expressions, in addition to
actions and predicates.