Syntax Guide
This is a guide to the basic syntax of Pegasus. For more advanced topics, see the "How Do I... ?" article.
A Pegasus grammar consists of a text file with two sections, in order:
- The "Settings" section.
- The "Rules" section.
Settings are specified in one of three ways:
- @setting value
  For simple values, just write the setting value out. This is parsed as a type name.
- @setting { value }
  For more complex values, wrap the setting value in curly braces. This is parsed as a code section.
- @setting "value"
  An alternative to using curly braces is to use a string.
- @namespace
  Specifies the namespace in which the parser class will be placed.
- @accessibility
  Specifies the accessibility of the generated class.
- @classname
  Specifies the name of the generated class.
- @ignorecase
  Specifies the default behavior of the parser with regard to case sensitivity.
- @resources
  Specifies the resources class to be used for resource-based strings.
- @start
  Specifies the starting rule. Defaults to the first rule in the grammar.
- @trace
  Enables or disables tracing. Defaults to false.
- @using
  Adds a using directive to the generated class file. (Multiple allowed.)
- @members
  Allows for the definition of additional class members.
@namespace PegExamples.Foo
@accessibility internal
@classname MyParser
@ignorecase true
@resources MyProject.Properties.Resources
@start startingRule
@trace true
@using System.Linq
@using { Foo = System.String }
@members
{
private static bool HelperFunction()
{
return true; // placeholder body so that the example compiles
}
}
The basic syntax of a rule is:
name = expression
By default, rules infer their return type. For sequence expressions this is string, but this can be modified by specifying a type for the rule, like so:
name <type> = expression { ... }
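As a rough illustration of the difference, the sketch below defines one untyped rule and one typed rule. The rule names (version, decimal) are invented for this example, and the typed rule relies on the labeled sequence yielding the matched text as a string, as described above.

```peg
// No type given: this sequence rule returns the matched text as a string.
version
    = [0-9]+ "." [0-9]+

// Explicit type: the code expression converts the matched text to a double.
decimal <double>
    = value:([0-9]+ ("." [0-9]+)?) { double.Parse(value) }
```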
Rule flags are Boolean settings that are enabled on a per-rule basis. Flags come after the rule type, if there is one:
rule -flag = expression
rule <type> -flag = expression
- -memoize
  Enables memoization for the rule.
- -lexical
  Specifies that the rule should be included in the lexicalElements collection whenever it is successfully parsed.
- -export
  Specifies that this rule will be included in this grammar's exported rules. Use this to make the rule available to other parsers in a convenient format. This is primarily used for #parse{} expressions.
- -public
  Specifies that a public entry point will be made for this rule. Use this if it makes sense to parse an entire string using this rule. This could be used to provide user-input validation for primitive values supported by your parser.
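The flag placement described above might look like the following in practice. This is only a sketch: the rule names are hypothetical, and each rule carries a single flag because that is the only form shown in this guide.

```peg
// Typed rule with a flag: the flag follows the rule type.
number <double> -memoize
    = value:([0-9]+ ("." [0-9]+)?) { double.Parse(value) }

// Untyped rule with a flag: the flag follows the rule name directly.
identifier -lexical
    = [a-zA-Z_] [a-zA-Z0-9_]*

// A rule that is meaningful on its own, so a public entry point is requested.
boolean <bool> -public
    = "true" { true }
    / "false" { false }
```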
- String: 'foo' or "bar"
  String expressions match a string literally.
- Character Class: [a-z] or [a-z.,0-9] or [\x1f-\xfe\u0100-\u1fff]
  Matches a single character that is within the character class.
- Negative Character Class: [^a-z] or [^a-z.,0-9] or [^\x1f-\xfe\u0100-\u1fff]
  Matches a single character that is not within the character class.
- Wildcard: .
  Wildcard expressions match any single character.
Strings and character classes can be marked as case-insensitive by suffixing the string or class with the letter i. For example: "foo"i 'bar'i [baz]i
Or, they can be marked as case-sensitive by suffixing the string or class with the letter s.
Strings can be read from resources by suffixing the string with the letter r. The string to be parsed is then read from the grammar's resources, specified via the @resources setting described above.
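To make the i and s suffixes concrete, here is a small sketch. The rule names are made up for illustration; the third rule assumes the grammar has set @ignorecase true and uses the s suffix to opt back into case-sensitive matching.

```peg
// Matches "select", "SELECT", "SeLeCt", and so on.
selectKeyword = "select"i

// Matches a digit, or a hexadecimal letter in either case.
hexDigit = [0-9a-f]i

// Stays case-sensitive even when the grammar sets @ignorecase true.
kilobyte = "kB"s
```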
- Name: a
  Name expressions refer to a rule by name.
- Labeled: foo:a
  Labeled expressions store a parse result for use in code assertions and expressions.
- Sequence: a b c
  Sequence expressions match each component consecutively.
- Choice: a / b / c
  Choice expressions provide options for parsing. They are evaluated consecutively.
- Assertions: !a &b
  Assertion expressions act as look-aheads. They peek at the parsing subject, and do not logically advance the cursor (although internally they do use a cursor).
- Code Assertions: !{foo} &{bar}
  Code assertions are similar to regular assertions, except they represent C# code that returns a Boolean value, rather than performing a look-ahead.
- Repetition: a? b+ c* d<3> e<2,> f<1,5>
  Repetition expressions allow another expression to be repeated.
  - expr<3> matches an expression exactly three times.
  - expr<2,> matches an expression two or more times. Greedy.
  - expr<1,5> matches an expression one to five times. Greedy.
  - expr? matches an expression one or zero times. Equivalent to expr<0,1>.
  - expr+ matches an expression one or more times. Equivalent to expr<1,>.
  - expr* matches an expression zero or more times. Equivalent to expr<0,>.
- Delimited Repetition: a<0,,",">
  Repetition expressions also support a delimiter that will match (and consume) in between each repeated match.
- Parentheses: ( ... )
  Parentheses are used to group expressions.
- Type: (<type> ... )
  Type expressions allow part of a rule to have a certain return type. This has the same meaning as having a type for a rule, except it is constrained to the expression wrapped by the parentheses.
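Several of the expression kinds above can be combined in one small grammar fragment. The following is a sketch rather than a tested grammar: the rule names are hypothetical, and the IList<string> rule type assumes that a repetition of string-valued matches produces a list of strings with the delimiters left out.

```peg
// Sequence, label, and delimited repetition:
// a bracketed, comma-separated list of identifiers such as "[a,b,c]".
list <IList<string>>
    = "[" items:identifier<0,,","> "]" { items }

// Negative assertion followed by a sequence: an identifier must not be a keyword.
identifier
    = !keyword [a-z] [a-z0-9]*

// Choice inside parentheses, then a negative assertion so that
// "iffy" is not mistaken for the keyword "if".
keyword
    = ("if" / "else") ![a-z0-9]
```

The look-ahead in keyword consumes nothing, so it only rules out a match when the keyword is immediately followed by another identifier character.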
- Code: { code }
  Code expressions contain C# code that specifies the result of an expression. Code expressions must come at the end of a sequence.
- Error: #error{ code }
  Error-type code expressions throw a System.FormatException with the error message specified by the code section. The exception that is thrown will also have the Data["cursor"] property set, so that the location of the error can be determined.
- State: #{ code; }
  State-type code expressions allow for stateful parsing. The code in a state-type code expression is allowed to modify the state object in a way that supports backtracking and memoization. State expressions may appear anywhere in a rule definition.
- Parse: #parse{ code }
  Parse-type code expressions not only allow mutation of the cursor like state expressions, but also return a ParseResult<T>, allowing the integration of more complex parsing logic. The canonical example of this would be using an exported rule from another Pegasus parser.
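The first three kinds of code expression could be used along the following lines. Treat this as a sketch under assumptions: the rule names are invented, and the state["depth"] indexer syntax is an assumption about how the state object is accessed, which this guide does not spell out.

```peg
// Code expression: computes the result from the labeled parts.
sum <int>
    = a:digit "+" b:digit { a + b }

digit <int>
    = d:[0-9] { int.Parse(d) }

// Error expression: fail with a message when input remains.
EOF
    = !.
    / unexpected:. #error{ "Unexpected character '" + unexpected + "'." }

// State expressions (assumed indexer syntax): track nesting depth
// and use a code assertion to reject an unmatched ")".
document
    = #{ state["depth"] = 0; } item* EOF

item
    = "(" #{ state["depth"] += 1; }
    / &{ state["depth"] > 0 } ")" #{ state["depth"] -= 1; }
    / [^()]
```

Because state changes are meant to be undone on backtracking, the depth counter should stay consistent even when an alternative fails partway through.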
- /* ... */
  Multi-line comment
- // ...
  Single-line comment