-
Notifications
You must be signed in to change notification settings - Fork 66
Semantic Actions
Semantic actions are introduced with the {ActionName}
syntax, just behind the expression you want to give the action to. When the expression returns from parsing, ActionName
is called on the expression parse tree. An action can be a named callable or a function/delegate literal (anonymous function), as long as it accepts a ParseTree
as argument and returns a ParseTree
.
Actions:
Rule1 <- Expr1 Expr2 {ActionA} Expr3 {ActionB)
Rule2 <- (Expr1 / Expr2) {ActionA} Expr3
Rule3 <{ActionB} Expr2 / Expr3 (!Expr1 Expr2)*
Rule4 <- (Expr1 {ActionA} Expr2 {ActionB}) {ActionC}
The previous example demonstrates some ways an action can be declared:
-
for
Rule1
,ActionA
is called onceExpr2
has finished parsing and thenActionB
forExpr3
. -
for
Rule2
,ActionA
is called once the 'or' expression returns, be it with anExpr1
or anExpr2
.Expr3
has no associated action. -
Rule3
is an example of rule-level action: once the right-hand side expression returns,ActionB
is called on its output. -
Rule4
is an example of nested actions:ActionA
is called afterExpr1
,ActionB
afterExpr2
and thanActionC
on the global sequence output.
Actions can be declared as anonymous functions, examples of which include:
Actions:
Rule1 <- Expr1 { function ParseTree(ParseTree p) {return p;} } # Verbose form
Rule2 <- Expr2 { delegate ParseTree(ParseTree p) {return p;} } # Verbose form
Rule3 <- Expr3 { (p) {return p;} } # Parameter and return-type inferred
Rule4 <- Expr4 { p => p.children[0] } # Lambda syntax
These actions can be freely mixed with normal callable actions, e.g.:
Actions:
Rule1 <- Expr1 { myAction1, p => p.children[0], myAction2 } # Mixed callable/lambda action list
When using anonymous functions as actions inside ordered choice rules, it may be necessary to explicitly declare the parameter type to avoid compilation errors involving delegates and nested templates, e.g.:
Actions:
Rule1 <-
/ Expr1 { (ParseTree a) => doStuff(a) } # instead of { (a) => doStuff(a) }
/ Expr2 { (ParseTree a) => doOtherStuff(a) } # ditto
Note: avoid using delimited strings with identifier delimiters (http://dlang.org/lex.html#DelimitedString) inside anonymous function actions, as they may not be correctly parsed (in order to keep the Pegged grammar relatively simple). However delimited strings using nested delimiters ({} [] <> ()
) are OK.
What are actions good for? They offer the user an opportunity to act on the parse tree during the parsing process. Since they are passed the complete output of an expression, they can modify the output, or construct some value based on the passed argument. Let's demonstrate the two uses:
PT cutChildren(PT)(PT p)
{
p.children = null;
return p;
}
cutChildren
is a parse-tree-pruning action: it just nullifies the children
array in the parse tree. Put it after expressions where you don't care for children expressions and just want to keep the captures.
Note that cutChildren
is a function template. It's because it must be able to accept any kind of parse tree. (TODO: more explanations. For now, just make your semantic actions templates and everything will be alright).
Now, this action keeps only the first match of a rule:
PT first(PT)(PT p)
{
if (p.matches.length > 1)
p.matches.length = 1;
return p;
}
Now, let's use actions to validate some XML nodes. First, a grammar:
mixin(grammar(`
XML:
Node <- OpeningTag (Node / Text)* ClosingTag
OpeningTag <- "<" identifier ">"
Closingtag <- "</" identifier ">"
Text <~ (!OpeningTag !ClosingTag .)* # Any char, as long as it's not a tag
`));
Suppose we want to validate the tags: any opening tag must be closed by an equivalent closing tag. We will use actions to push the tag identifier on a stack, which will be popped by closing tags.
import std.array;
string[] nameStack;
PT opening(PT)(PT p)
{
nameStack ~= p.matches[0];
return p;
}
PT closing(PT)(PT p)
{
if (nameStack.back != p.matches[0])
p.successful = false; // Make the rule fail
else
nameStack.popBack;
return p;
}
And, adding actions to the XML grammar:
mixin(grammar(`
XML:
Node <- OpeningTag{opening} (Node / Text)* ClosingTag{closing}
OpeningTag <- "<" identifier ">"
Closingtag <- "</" identifier ">"
Text <~ (!OpeningTag !ClosingTag .)* # Any char, as long as it's not a tag
`));
Now, let's test it:
assert( XML("<a> Hello <b> World </b> ! </a>").successful);
assert(!XML("<a> Hello <b> World </c> ! </a>").successful); // <b> closed by a </c>
assert(!XML("<a> Hello <b> World </a> ! </b>").successful); // <a> and <b> incorrectly nested
As you can see, correctly nested nodes get parsed, but not incorrectly closed and nested nodes. This means actions do validation while parsing and, if the parsing is successful, you can be sure the input is a correctly nested collection of nodes and that the parse tree is also correct for any following function to act upon.
There is no real difference between
Rule1 <- Expr1 {Action}
and
Rule1 <{Action} Expr1
But there is a difference between
Rule1 <- A {ActionA} B {ActionB}
Rule2 <- B A
A <- ...
B <- ...
and
Rule1 <- A B
Rule2 <- B A
A <{ActionA} ...
B <{ActionB} ...
The latter means that ActionA
gets called every time A
is used, in Rule1
as well as in Rule2
. Whereas for the former grammar, the actions are activated only for Rule1
. Just decide what you want for your grammar.
Actions are called when a rule finishes parsing, successfully or not. Note that sequences stop once a rule fail. Like this:
Rule1 <- A {foo} B {bar} C {baz}
If A
succeeds, foo
is called on its result. If B
then fails, bar
is still called on its resulting parse tree. Unless bar
changes B.successful
to true
(a legal, but dangerous move), B {bar}
will fail and C {baz}
is not tested by Pegged.
A previous version of Pegged called actions only on successful parses. This forbade 'recovery rules', that'd accept a failed parse attempt and correct it to make it OK. So I'm testing letting rules be called after any parse result.
A consequence of the previous section is that side-effects (like storing a value in a variable) can be activated inside a failed rule. In the previous example, foo
was called. If foo
acts like opening
, then something somewhere was changed. Even though Rule1
failed, some partial modification was made. Maybe a cleaner solution would be to test each rule in a sandbox and undo any changes that were made by a failed rule. For actions acting on a grammar inner fields (something I plan to add), it's feasible: the current state can be duplicated, passed to a rule as a context variable and returned with the parse tree. If the rule fails, then the returned context is discarded. When the rule succeeds, the caller context is assigned the returned context.
Of course, all this copying and passing around'll probably slow the parsing process down and I cannot protect against any global variable modification.
You can have as many actions as you wish between the curly braces (min. 1, {}
is not a valid action). Separate the actions names by a comma:
Rule1 <- A {foo, bar, baz} B {foo, baz}
Once A
returns, foo
is called on its result, then bar
on foo
's result and then baz
. For B
, it'll be foo
and then baz
.
For now, Pegged has no predefined actions. Here is the minimum list I want to add:
- discardChildren (nullifies the
children
field of a parse tree) - discardMatches (nullifies the
matches
field of a parse tree) - fuseMatches (concatenates a parse tree's matches together)
- failure (parsetree.successful = false)
- success (parsetree.successful = true)
- discard (discards anything the parse tree achieved: reset
.end
to be equal to.begin
, discards the matches and the children)
I'm playing with the following ideas concerning actions:
-
Permitting other arguments, like this:
{dropChild(1)}
, which would calldropChild(ruleOutput, 1)
. Heck, in DMD with UFCS, it's justruleOutput.dropChild(1)
which is quite easy to code. -
For now, actions are what I'd call internal actions: they act on the parse tree and any external action is a side-effect (assigning to external variables, for example). I could introduce 'external actions', for example with
<ActionName>
. These would return any D type and plug into one another during parsing. This would allow Pegged to have the same run-of-the-mill example of calculus on arithmetic expressions as other parser generators. We'll see...
Next lesson Generating Code