[DO NOT MERGE YET] Added generic parser extensions #58

Draft: wants to merge 46 commits into base: master

@Unknown6656 Unknown6656 commented Jun 6, 2020

Disclaimer: This pull request is not yet ready for merging, for the following reasons:

  • Not enough testing
  • Changes (and new code) have not been documented

I will notify you (@Dervall) when this branch is ready for merging (this could take a few months ... depending on how much time I have).


I changed a couple of things on my fork during the past 2 to 3 years:

  • Made heavy use of C# 7 and C# 8 features, such as ValueTuples and nullable reference types. This greatly reduced the likelihood of NullReferenceExceptions.
  • Improved tracing/debugging of parser construction errors, e.g. it now prints in greater detail why certain grammar rules fail
  • Added/improved the naming of non-terminal symbols for debugging purposes
  • Added a generic extension to the parser (see https://github.com/Unknown6656/Piglet/blob/master/Piglet/Parser/Configuration/Generic/ParsingUitilities.cs)
  • Moved to .NET 5 (that point is not so important)
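
To illustrate the first bullet, here is a minimal sketch of how nullable reference types and ValueTuples work together. It is not taken from the Piglet code base: `SymbolTable`, `Lookup` and the table contents are hypothetical names chosen for this example only.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Piglet.
public static class SymbolTable
{
    private static readonly Dictionary<string, int> Ids = new Dictionary<string, int>
    {
        ["point"] = 1,
        ["size"] = 2,
    };

    // 'string?' states in the signature that callers may pass null, so the
    // compiler warns if 'name' is dereferenced without a null check.
    // The ValueTuple return type avoids both an 'out' parameter and a null result.
    public static (bool Found, int Id) Lookup(string? name)
    {
        if (name is null)
            return (false, -1);

        return Ids.TryGetValue(name, out int id) ? (true, id) : (false, -1);
    }
}
```

A caller can deconstruct the result directly, e.g. `var (found, id) = SymbolTable.Lookup("size");`, and never has to handle a null return value.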

EDIT: To-do list of all completed and open points:

  • document code (XML docs)
  • document code changes and create documented code samples (C# and F#)
  • move from T4-templates to a code gen project
  • fix all nullability annotations
  • incorporate C# 9 features (mainly pattern matching and generic nullability annotations). The current Piglet code base could benefit greatly from these features
  • Analyse parser performance. Maybe switch to Span<T> and ref features to improve the parser.
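
As a rough idea of what the Span<T> item could mean in practice, the sketch below scans a signed integer directly on a `ReadOnlySpan<char>` without allocating substrings. It is an assumption-laden illustration, not Piglet's lexer: `NumberScanner`, `MatchNumber` and `ParseAt` are hypothetical names, and only ASCII digits are handled (the regex class `\d` also matches other Unicode digits in .NET).

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of Piglet.
public static class NumberScanner
{
    // Returns the length of the longest prefix of 'input' that looks like
    // an optional sign followed by one or more ASCII digits, or 0 if none.
    public static int MatchNumber(ReadOnlySpan<char> input)
    {
        int i = 0;
        if (i < input.Length && (input[i] == '+' || input[i] == '-'))
            i++;

        int digitsStart = i;
        while (i < input.Length && input[i] >= '0' && input[i] <= '9')
            i++;

        return i > digitsStart ? i : 0;
    }

    // Parses the number starting at 'offset' without creating a substring.
    public static int ParseAt(string text, int offset)
    {
        ReadOnlySpan<char> rest = text.AsSpan(offset);
        int length = MatchNumber(rest);
        if (length == 0)
            throw new FormatException($"No number at offset {offset}.");

        return int.Parse(rest.Slice(0, length), NumberStyles.AllowLeadingSign, CultureInfo.InvariantCulture);
    }
}
```

Whether this actually improves parser performance would have to be measured; the sketch only shows the allocation-free slicing pattern.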

@Unknown6656
Author

.... And by the way: Thank you SOOOOOO much for creating this wonderful project!
I have used it already in half a dozen projects (all of them compilers or interpreters).
This library is wonderful and IMHO way easier to use than lexer/parser generators.

@Unknown6656
Author

Unknown6656 commented Jun 7, 2020

Note to self: I should move away from T4 templates, as they are not supported on non-Windows operating systems.

EDIT: Done.

@Unknown6656 Unknown6656 marked this pull request as draft June 7, 2020 19:02

@Dervall
Owner

Dervall commented Jun 8, 2020

Thank you for your contributions! I haven't exactly been active in developing this library, and I'm super happy that you like it and that you are using it.

Could you write a little text about what sort of changes you are proposing? I'm not exactly sure what a generic extension is to be honest :)

Also, do you remember what sort of bugs you found and fixed?

I've taken a quick look now, and will look through it in more detail when you feel that you are ready with your changes. Thanks again!

@Unknown6656
Author

Unknown6656 commented Jun 8, 2020

@Dervall I have to admit that I do not quite remember the bugs from the 2017/2018 commits (I mainly recall NullReferenceExceptions and improvements to grammar debugging). However, I can give you a small example of my generic extension (I should find a fancier name for that feature):


Imagine having the following grammar:

rectangle := "(" point "," size ")"                                 // (1)
           | "(" number "," number "," number "," number ")"        // (2)
    
point := "(" number "," number ")"                                  // (3)

size := "(" number "," number ")"                                   // (4)

number := [ "+" | "-" ] \d+

You could of course use object as the type to store all the data inside the different symbols (terminals and non-terminals). However, it is wiser to use a type-safe approach, such as generics.
Therefore, one creates a parser constructor by inheriting from the abstract class Piglet.Parser.Configuration.Generic.ParserConstructor<T>:

public class RectangleParserConstructor
    : ParserConstructor<Rectangle> // one must inherit 'ParserConstructor<T>'
{
    // implement the abstract method 'void Construct(NonTerminalWrapper<T>)'.
    protected override void Construct(NonTerminalWrapper<Rectangle> start_symbol)
    {
        // this is my naming convention for this example:
        //  t_xxx := terminal symbol
        // nt_xxx := non-terminal symbol

        // create all the terminal and non-terminal symbols:
        NonTerminalWrapper<Point> nt_point = CreateNonTerminal<Point>();
        NonTerminalWrapper<Size> nt_size = CreateNonTerminal<Size>();
        TerminalWrapper<string> t_comma = CreateTerminal(@",");
        TerminalWrapper<string> t_open_parenthesis = CreateTerminal(@"\(");
        TerminalWrapper<string> t_close_parenthesis = CreateTerminal(@"\)");
        TerminalWrapper<int> t_number = CreateTerminal<int>(@"[+\-]?\d+", int.Parse);

        // rule (1)
        start_symbol.AddProduction(t_open_parenthesis, nt_point, t_comma, nt_size, t_close_parenthesis)
                    .SetReduceFunction((_, point, _, size, _) => new Rectangle(point, size));

        // rule (2)
        start_symbol.AddProduction(t_open_parenthesis, t_number, t_comma, t_number, t_comma, t_number, t_comma, t_number, t_close_parenthesis)
                    .SetReduceFunction((_, x, _, y, _, width, _, height, _) => new Rectangle(x, y, width, height));

        // rule (3)
        nt_point.AddProduction(t_open_parenthesis, t_number, t_comma, t_number, t_close_parenthesis)
                .SetReduceFunction((_, x, _, y, _) => new Point(x, y));

        // rule (4)
        nt_size.AddProduction(t_open_parenthesis, t_number, t_comma, t_number, t_close_parenthesis)
                .SetReduceFunction((_, width, _, height, _) => new Size(width, height));

        // I could change operator precedence, associativity, etc. here
        // I could also configure the parser to be case insensitive
    }
}
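
The example above does not show where `Rectangle`, `Point` and `Size` come from. The `System.Drawing` structs of the same names happen to offer matching constructors, but whether those were meant is an assumption. A minimal stand-in using C# 9 records (hypothetical definitions, written only to make the example self-contained) could look like this:

```csharp
// Hypothetical data types; the original example does not define them.
public sealed record Point(int X, int Y);

public sealed record Size(int Width, int Height);

public sealed record Rectangle(Point Location, Size Extent)
{
    // Matches the four-number form used by rule (2).
    public Rectangle(int x, int y, int width, int height)
        : this(new Point(x, y), new Size(width, height))
    {
    }
}
```

Because records compare by value, a rectangle built by rule (1) and one built by rule (2) from the same numbers compare as equal, which is convenient in tests.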

Using this parser takes only a few lines:

static void Main()
{
    var constructor = new RectangleParserConstructor();
    var parser = constructor.CreateParser();


    ParserResult<Rectangle> result = parser.Parse("((-10, 20), (100, 300))");

    Console.WriteLine(result.ParsedValue); // the parsed result (this has the type 'Rectangle'!!)
    Console.WriteLine(result.LexedTokens); // a list of lexed tokens
}

A couple of points worth mentioning:
  • The generic API needs some documentation and maybe a little clean-up. I hope that I can do this soon.
  • This implementation is pretty neat when using F#.
    (I could give you an example of that if you want one)
  • Disadvantage: I have to generate these generic implementations (see the generated file). I used T4 templates for this in the past, but have since moved to a separate generator project. Maybe one could switch to Roslyn source generators in the future.

[The rectangle example above is rather boring and not very creative, but you get the idea of generic parsers.]
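
To make the code generation point more concrete: a typed reduce function needs one overload per production length (rule (2) above has nine symbols), which is tedious to write by hand. The sketch below shows the general idea of emitting such overloads as text. It is not the generator used in the fork; `ReduceOverloadGenerator` and the emitted signature shape are assumptions made for this illustration.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical generator sketch, not the one used in the fork.
public static class ReduceOverloadGenerator
{
    // Emits the signature of one overload that accepts 'arity' typed arguments.
    public static string EmitOverload(int arity)
    {
        string typeParams = string.Join(", ", Enumerable.Range(1, arity).Select(i => $"T{i}"));
        return $"public void SetReduceFunction<{typeParams}, TResult>(Func<{typeParams}, TResult> reduce);";
    }

    // Emits overload signatures for every arity from 1 to 'maxArity', one per line.
    public static string EmitAll(int maxArity)
    {
        var builder = new StringBuilder();
        for (int arity = 1; arity <= maxArity; arity++)
            builder.AppendLine(EmitOverload(arity));

        return builder.ToString();
    }
}
```

Note that the built-in `Func` delegates stop at 16 input parameters, so a real generator would need its own delegate types for longer productions.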
