document attributes #11

Open
fabianfreyer opened this issue Apr 30, 2019 · 3 comments


@fabianfreyer

I think I understand what {ws=implicit} and {ws=explicit} do, but what do the following attributes mean?

  • pin
  • recoverUntil
  • fragment
  • simplifyWhenOneChildren

I'd be happy to open a PR to document these somewhere as soon as I understand what they do.

@menduz
Member

menduz commented May 1, 2019

Hello,

Constructions (or rules) are defined as ConstructionName := Term Term Term {pin=1}. Within a construction, the terms are numbered, starting at 0.

  • pin=1 means that the parser will never backtrack past the pinned term. The value is the number of that term in the construction. If what follows the pinned term doesn't match, the parser raises a parse error immediately or tries to recover.
  • recoverUntil applies after "pinning": if the input is invalid, the parser keeps reading characters until it can recover from the error. The value of recoverUntil is the terminal construction at which recovery starts.
  • fragment means the parser will not treat the construction as a term of the parent construction. Instead, it is a fragment of a bigger construction, and all of its children are injected into the parent construction rather than kept under the current one.

Example:

FunctionDecl := FunKeyword Name Parameters
FunKeyword := 'function'
Name := [A-Z][a-z]*
Parameters := Parameter+
Parameter := Name ' ' {fragment=true}

is the same as

FunctionDecl := FunKeyword Name Parameters
FunKeyword := 'function'
Name := [A-Z][a-z]*
Parameters := (Name ' ')+
  • simplifyWhenOneChildren simplifies a construction when it matches only one of its terms: the parser returns the matched term instead of the construction.
    Example:

    Expression := AddExpression
    AddExpression := MulExpression (('+' | '-') MulExpression)* {simplifyWhenOneChildren=true}
    MulExpression := NominalExpression (('*' | '/') NominalExpression)* {simplifyWhenOneChildren=true}
    NominalExpression := Number | VariableName {fragment=true}

    Here are some parsing examples:

    1
    ^ Number
    
    1 + 4
    ^     Number
        ^ Number
    ^^^^^ AddExpression
    
    1 + Abc * 5 + 1
        ^^^          VariableName
              ^      Number
    ^                Number
        ^^^^^^^      MulExpression
                  ^  Number
    ^^^^^^^^^^^^^^^  AddExpression
    

    See how a lot of nodes are simplified away because they have only one child?
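
    To make the two tree-shaping attributes concrete, here is a minimal TypeScript sketch of the effect described above, written as a post-processing pass over a tree. It is not the library's implementation: the AstNode shape, the normalize function, the flag fields, and the builder helpers are hypothetical names chosen for illustration, and the raw trees are an assumption about what an unsimplified parse of 1 and 1 + 4 might look like.

```typescript
// Hypothetical node shape for illustration; not the library's real AST type.
interface AstNode {
  type: string;
  text: string;
  children: AstNode[];
  fragment?: boolean;                // rule declared with {fragment=true}
  simplifyWhenOneChildren?: boolean; // rule declared with {simplifyWhenOneChildren=true}
}

// Returns the node(s) that take the place of `node` in its parent.
function normalize(node: AstNode): AstNode[] {
  const children: AstNode[] = [];
  for (const child of node.children) {
    for (const kept of normalize(child)) children.push(kept);
  }
  // fragment: drop the node itself and inject its children into the parent.
  if (node.fragment) return children;
  // simplifyWhenOneChildren: a node with exactly one child is replaced by that child.
  if (node.simplifyWhenOneChildren && children.length === 1) return children;
  return [{ type: node.type, text: node.text, children }];
}

// Small builders for the assumed raw (unsimplified) tree.
const num = (text: string): AstNode => ({
  type: "NominalExpression", text, fragment: true,
  children: [{ type: "Number", text, children: [] }],
});
const mul = (operand: AstNode): AstNode => ({
  type: "MulExpression", text: operand.text, simplifyWhenOneChildren: true,
  children: [operand],
});
const add = (text: string, operands: AstNode[]): AstNode => ({
  type: "AddExpression", text, simplifyWhenOneChildren: true, children: operands,
});

const single = normalize(add("1", [mul(num("1"))]));                 // input: 1
const sum = normalize(add("1 + 4", [mul(num("1")), mul(num("4"))])); // input: 1 + 4

console.log(single.map(n => n.type));
console.log(sum[0].type, sum[0].children.map(n => n.type));
```

    In this sketch, single collapses to a lone Number node, while sum stays an AddExpression with two Number children, which matches the first two parsing examples above.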

I hope I made myself clear, thanks for your interest in this library :)

@fabianfreyer
Author

Thank you very much for your explanations! I'm not yet sure how to understand the pin and recoverUntil attributes, though. Do you have an example where they could be used, similar to the one you showed for simplifyWhenOneChildren?

Also, are there any other incompatibilities between attributes? I'm having trouble adding ws=implicit and simplifyWhenOneChildren=true, in a case like the following:

Statement ::= Expression ';' {ws=implicit,simplifyWhenOneChildren=true}
Expression ::= ...

@menduz
Member

menduz commented May 1, 2019

This example parses a JSON file with error recovery and pinning:

{ ws=implicit }
/* JSON WITH ERROR RECOVERY https://www.ietf.org/rfc/rfc4627.txt */
value                ::= false | null | true | object | number | string | array
BEGIN_ARRAY          ::= #x5B /* [ left square bracket */
BEGIN_OBJECT         ::= #x7B /* { left curly bracket */
END_ARRAY            ::= #x5D /* ] right square bracket */
END_OBJECT           ::= #x7D /* } right curly bracket */
NAME_SEPARATOR       ::= #x3A /* : colon */
VALUE_SEPARATOR      ::= #x2C /* , comma */
WS                   ::= [#x20#x09#x0A#x0D]+
false                ::= "false"
null                 ::= "null"
true                 ::= "true"
object               ::= BEGIN_OBJECT object_content? END_OBJECT { pin=1 }
object_content       ::= (member (object_n)*) { recoverUntil=OBJECT_RECOVERY }
object_n             ::= VALUE_SEPARATOR member { recoverUntil=OBJECT_RECOVERY,fragment=true, pin=1 }
Key                  ::= &'"' string { recoverUntil=VALUE_SEPARATOR, pin=1 }
OBJECT_RECOVERY      ::= END_OBJECT | VALUE_SEPARATOR
ARRAY_RECOVERY       ::= END_ARRAY | VALUE_SEPARATOR
MEMBER_RECOVERY      ::= '"' | NAME_SEPARATOR | OBJECT_RECOVERY | VALUE_SEPARATOR
member               ::= Key NAME_SEPARATOR value { recoverUntil=MEMBER_RECOVERY, pin=1 }
array                ::= BEGIN_ARRAY array_content? END_ARRAY { pin=1 }
array_content        ::= array_value (VALUE_SEPARATOR array_value)* { recoverUntil=ARRAY_RECOVERY,fragment=true }
array_value          ::= value { recoverUntil=ARRAY_RECOVERY, fragment=true }

number               ::= "-"? ("0" | [1-9] [0-9]*) ("." [0-9]+)? (("e" | "E") ( "-" | "+" )? ("0" | [1-9] [0-9]*))? { pin=2, ws=explicit }

/* STRINGS */

string                ::= ~'"' (([#x20-#x21] | [#x23-#x5B] | [#x5D-#xFFFF]) | #x5C (#x22 | #x5C | #x2F | #x62 | #x66 | #x6E | #x72 | #x74 | #x75 HEXDIG HEXDIG HEXDIG HEXDIG))* '"' { ws=explicit }
HEXDIG                ::= [a-fA-F0-9] { ws=explicit }

This one is the same but without error recovery:


/* https://www.ietf.org/rfc/rfc4627.txt */
value                ::= false | null | true | object | array | number | string
BEGIN_ARRAY          ::= WS* #x5B WS*  /* [ left square bracket */
BEGIN_OBJECT         ::= WS* #x7B WS*  /* { left curly bracket */
END_ARRAY            ::= WS* #x5D WS*  /* ] right square bracket */
END_OBJECT           ::= WS* #x7D WS*  /* } right curly bracket */
NAME_SEPARATOR       ::= WS* #x3A WS*  /* : colon */
VALUE_SEPARATOR      ::= WS* #x2C WS*  /* , comma */
WS                   ::= [#x20#x09#x0A#x0D]+   /* Space | Tab | \n | \r */
false                ::= "false"
null                 ::= "null"
true                 ::= "true"
object               ::= BEGIN_OBJECT (member (VALUE_SEPARATOR member)*)? END_OBJECT
member               ::= string NAME_SEPARATOR value
array                ::= BEGIN_ARRAY (value (VALUE_SEPARATOR value)*)? END_ARRAY

number                ::= "-"? ("0" | [1-9] [0-9]*) ("." [0-9]+)? (("e" | "E") ( "-" | "+" )? ("0" | [1-9] [0-9]*))?

/* STRINGS */

string                ::= '"' (([#x20-#x21] | [#x23-#x5B] | [#x5D-#xFFFF]) | #x5C (#x22 | #x5C | #x2F | #x62 | #x66 | #x6E | #x72 | #x74 | #x75 HEXDIG HEXDIG HEXDIG HEXDIG))* '"'
HEXDIG                ::= [a-fA-F0-9]

You can test those examples online here: https://menduz.com/ebnf-highlighter/

The pin attribute is used several times here because the grammar is strict enough to support it. For example, only objects start with {, so if we need to read a value and we see a {, we know that what follows cannot be anything but an object. If a later term fails, the parser will not backtrack past that pin.
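
As a rough illustration of how pinning and recovery interact, here is a hand-written TypeScript sketch for a much smaller grammar than JSON. It is not how the library works internally: the grammar in the comment, the parseList function, the ParseResult type, and the ITEM_RECOVERY name are all hypothetical and exist only for this example.

```typescript
// Hypothetical grammar simulated by hand below:
//   list          ::= '[' item (',' item)* ']'   { pin=1 }
//   item          ::= [0-9]+                     { recoverUntil=ITEM_RECOVERY }
//   ITEM_RECOVERY ::= ',' | ']'
interface ParseResult {
  items: string[];
  errors: string[];
}

function isDigit(ch: string): boolean {
  return ch >= "0" && ch <= "9";
}

function parseList(input: string): ParseResult | null {
  let pos = 0;
  const items: string[] = [];
  const errors: string[] = [];

  // Before the pin: a failure is an ordinary "no match", so a caller could try another rule.
  if (input.charAt(pos) !== "[") return null;
  pos++; // '[' matched: the rule is now pinned, so failures become errors, not backtracking.

  while (pos < input.length && input.charAt(pos) !== "]") {
    const start = pos;
    while (isDigit(input.charAt(pos))) pos++;
    const next = input.charAt(pos); // "" at end of input
    const atBoundary = next === "" || next === "," || next === "]";
    if (pos > start && atBoundary) {
      items.push(input.slice(start, pos));
    } else {
      // Recovery: skip characters until a recovery terminal (',' or ']') comes next.
      while (pos < input.length && input.charAt(pos) !== "," && input.charAt(pos) !== "]") pos++;
      errors.push(`unexpected "${input.slice(start, pos)}" at ${start}`);
    }
    if (input.charAt(pos) === ",") pos++;
  }
  if (input.charAt(pos) !== "]") errors.push("missing closing ]");
  return { items, errors };
}

console.log(parseList("[1,x,3]")); // pinned: one error is recorded, parsing continues
console.log(parseList("1,2"));     // not pinned: plain "no match"
```

With the input [1,x,3] the sketch still collects 1 and 3 and records a single error for x, whereas 1,2 never reaches the pin and simply does not match. That is the trade-off pinning buys: once the opening token is seen, the parser commits to the rule and reports problems inside it instead of silently trying something else.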


> Also, are there any other incompatibilities between attributes? I'm having trouble adding ws=implicit and simplifyWhenOneChildren=true, in a case like the following

It is possible: simplifyWhenOneChildren can produce inconsistent results when implicit WS children are present. As a rule of thumb, avoid ws=implicit when possible; it makes parsing slower and makes the grammar harder to write.

If you want to take a look at a real-world grammar built with this package, you can refer to the Lys grammar.
