Switch Parsing Algorithm From Pest(PEG) to LALRPOP(LALR(1)) #303

InsertCreativityHere · 2022-10-17T20:31:24Z

Issues Fixed

Fixes #277

Pest eagerly consumed whitespace after a doc comment, giving bogus spans.
LALRPOP ignores the whitespace and only includes the comment's text in its spans.
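
A minimal sketch of the span behavior described above. The `Span` type and the `lex_doc_comment` helper are hypothetical stand-ins (not slicec's actual API), and only `///` line comments are handled. The point it shows: the span is computed before any trailing whitespace is skipped, so it covers the comment's text and nothing after it.

```rust
/// Hypothetical span type: a half-open byte range `[start, end)` into the source text.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Lexes a `///` doc comment beginning at byte offset `start`.
/// Returns the comment's span and the offset where lexing should resume,
/// or `None` if the text at `start` is not a doc comment.
pub fn lex_doc_comment(source: &str, start: usize) -> Option<(Span, usize)> {
    let rest = source.get(start..)?;
    if !rest.starts_with("///") {
        return None;
    }

    // The comment runs up to (but not including) the end of the line.
    let line_len = rest.find('\n').unwrap_or(rest.len());
    let span = Span { start, end: start + line_len };

    // Whitespace after the comment is skipped only after the span is fixed,
    // so it never leaks into the span.
    let after = &rest[line_len..];
    let skipped = after.len() - after.trim_start().len();
    Some((span, start + line_len + skipped))
}
```

Given `"/// Hello\n\n    module Foo"`, the span covers only `/// Hello`, while the resume offset points at `module`.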

Fixes #233

The new parser no longer has a grammar rule for block doc comments.
All places where we used/tested them have been removed.

Implements & Closes #131

Attributes can be placed on both members and their types:
op([cs::identifier("foo")] myParam: [cs::generic("bar")] sequence<MyType>);
Type attributes must be placed on the type, and entity attributes must be placed on the identifier.
IMPORTANT: This PR doesn't add validation/testing for this. We should add validation/testing for this.

Closes #53

This issue was Pest-specific and is now pointless.

Explanation

This PR switches the parsing library we use from Pest to LALRPOP.
We still use Pest specifically for parsing doc comments, but this will be removed in the future. One step at a time!

This adds 2 'folder' modules: one for preprocessing and one for slice. Each folder contains:

grammar.lalrpop: Defines the grammar rules used by LALRPOP.
grammar.rs: Pulls in the generated LALRPOP code and defines helper functions.
lexer.rs: Converts a String of text into a stream (iterator) of tokens. This stream is used by the parser.
parser.rs: Parses a token stream into that module's output (source blocks for the preprocessor, a SliceFile for slice).
tokens.rs: Defines/lists the tokens/errors that the lexer and parser can return/work with.

Control Flow:

Read the contents of a file into a String.
Create a preprocessor-lexer from the string. This tokenizes the string into preprocessor tokens ('#', 'undef', '&&') and source blocks (Slice source code that isn't part of a preprocessor directive).
The lexer is just an iterator, so calling 'lexer.next()' returns the next token/sourceblock.
Pass this lexer into a Preprocessor and call parse_slice_file.
This evaluates the preprocessor directives and returns an iterator of source blocks.
Any blocks that are conditionally compiled out are not in this iterator.
Pass the source blocks to a slice-lexer. This tokenizes everything into slice tokens ('module', '?', 'identifier').
Pass this lexer into a Parser and call parse_slice_file.
This parses the slice and returns a SliceFile.

This is the source code that does this:
https://github.com/InsertCreativityHere/icerpc/blob/9b188ce56567a10d15eacbb6f13b608db2a329aa/src/parser/slice.rs#L38-L46

// Run the raw text through the preprocessor.
let mut defined_symbols = HashSet::new();
let mut preprocessor = crate::parsers::Preprocessor::new(file, &mut defined_symbols, self.diagnostic_reporter);
let preprocessed_text = preprocessor.parse_slice_file(raw_text)?;

// Run the preprocessed text through the parser.
let mut parser = crate::parsers::Parser::new(file, ast, self.diagnostic_reporter);
let (file_encoding, file_attributes, modules) = parser.parse_slice_file(preprocessed_text)?;
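
The control flow above can be pictured with a toy pipeline. This is only a sketch under loud assumptions: the `Block` enum and the `lex_blocks`, `preprocess`, and `tokenize` functions are invented here, only `#if SYMBOL` / `#endif` are handled, and none of it is slicec's real lexer or preprocessor. It shows that the first stage is just an iterator, and that source blocks which are conditionally compiled out never reach the second lexer.

```rust
use std::collections::HashSet;

/// Output of the toy preprocessor-lexer: a directive or a block (here, a line) of source.
#[derive(Debug, PartialEq)]
pub enum Block<'a> {
    If(&'a str),
    EndIf,
    Source(&'a str),
}

/// Stage 1: lazily split the raw text into directives and source blocks.
pub fn lex_blocks(raw: &str) -> impl Iterator<Item = Block<'_>> {
    raw.lines().map(|line| {
        let trimmed = line.trim();
        if let Some(symbol) = trimmed.strip_prefix("#if ") {
            Block::If(symbol.trim())
        } else if trimmed == "#endif" {
            Block::EndIf
        } else {
            Block::Source(line)
        }
    })
}

/// Stage 2: evaluate the directives and keep only the source blocks that are compiled in.
pub fn preprocess<'a>(
    blocks: impl Iterator<Item = Block<'a>>,
    defined: &HashSet<&str>,
) -> Vec<&'a str> {
    let mut conditions: Vec<bool> = Vec::new(); // One entry per open `#if`.
    let mut kept = Vec::new();
    for block in blocks {
        match block {
            Block::If(symbol) => conditions.push(defined.contains(symbol)),
            Block::EndIf => {
                conditions.pop();
            }
            Block::Source(text) => {
                // A block survives only if every enclosing `#if` is true.
                if conditions.iter().all(|&enabled| enabled) {
                    kept.push(text);
                }
            }
        }
    }
    kept
}

/// Stage 3: a stand-in for the slice-lexer; it just splits on whitespace.
pub fn tokenize<'a>(blocks: &[&'a str]) -> Vec<&'a str> {
    blocks
        .iter()
        .copied()
        .flat_map(|block| block.split_whitespace())
        .collect()
}
```

Calling `preprocess(lex_blocks(raw), &defined)` and passing the result to `tokenize` mirrors the preprocessor-then-parser hand-off in the snippet above, minus diagnostics and error handling.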

Other Changes

Adds LALRPOP as a dependency and a build script for the grammar files. They're compiled, not interpreted like Pest's.
Removes the parent patcher. The parsing algorithm that LALRPOP uses makes it possible to set parents during parsing, instead of doing it later as a separate step.
Removes the new and add_x methods from grammar elements. Also no longer necessary.
Drops the raw_value field from Identifier. Literally wasn't used anywhere.
Removes the old Pest parsing code. Except for doc comment parsing. Another PR.
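
To illustrate why a separate parent-patching pass is no longer needed: a bottom-up (LALR) parser builds child nodes before it reduces their container, so the container's reduce action can fill in each child's parent pointer on the spot. Below is a minimal sketch using std `Rc`/`Weak`. The `Module` and `Struct` types and the `reduce_*` functions are simplified stand-ins invented for this example, not slicec's `OwnedPtr`/`WeakPtr` or its real grammar actions.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

pub struct Struct {
    pub identifier: String,
    /// Starts out empty; patched when the containing module is reduced.
    pub parent: RefCell<Weak<Module>>,
}

pub struct Module {
    pub identifier: String,
    pub contents: Vec<Rc<Struct>>,
}

/// Models the action run when a child rule is reduced: its parent doesn't exist yet.
pub fn reduce_struct(identifier: &str) -> Rc<Struct> {
    Rc::new(Struct {
        identifier: identifier.to_owned(),
        parent: RefCell::new(Weak::new()),
    })
}

/// Models the action run when the container rule is reduced: every child is already
/// built, so the parent pointers can be set here instead of in a later pass.
pub fn reduce_module(identifier: &str, children: Vec<Rc<Struct>>) -> Rc<Module> {
    Rc::new_cyclic(|module_weak| {
        for child in &children {
            *child.parent.borrow_mut() = module_weak.clone();
        }
        Module {
            identifier: identifier.to_owned(),
            contents: children,
        }
    })
}
```

After `reduce_module` returns, upgrading a child's `parent` yields the module that contains it.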

ReeceHumphreys · 2022-10-18T13:36:00Z

Just ran and pushed a commit that runs rustfmt over the repo. All of its formatting suggestions are now in one commit so it should be nice and easy to find any bad suggestions!

src/parser/slice.rs

ReeceHumphreys · 2022-10-18T13:51:07Z

Looks like icerpc-csharp is not building; got the following errors:

Cannot implicitly convert type 'System.Collections.Generic.Dictionary<int, int>' to 'IceRpc.Tests.Slice.CustomDictionary<int, int>'
Cannot implicitly convert type 'IceRpc.Tests.Slice.CustomDictionary<int, int>' to 'System.Collections.Generic.Dictionary<int, int>'
Cannot implicitly convert type 'int[]' to 'IceRpc.Tests.Slice.CustomSequence<int>'
Cannot implicitly convert type 'IceRpc.Tests.Slice.CustomSequence<int>' to 'int[]'
Cannot convert from 'IceRpc.Tests.Slice.CustomSequence<int>' to 'System.ReadOnlyMemory<int>'

This occurs in DictionaryMappingTests.cs and SequenceMappingTests.cs

InsertCreativityHere · 2022-10-18T14:09:07Z

Looks like icerpc-csharp is not building; got the following errors:

The new parser unintentionally fixed some bugs in the repo. I had a local patch that fixed them, and opened a PR for them here: icerpc/icerpc-csharp#1923

externl

First batch of comments.

build.rs

src/diagnostics/errors.rs

src/parsers/common.rs

src/parsers/preprocessor/lexer.rs

src/parsers/preprocessor/mod.rs

src/parsers/slice/grammar.lalrpop

InsertCreativityHere · 2022-10-20T20:11:03Z

#303 (comment) > I would just fix them to expect a syntax error.

Alright! Sounds good to me!
Going to have to start including links in my quotes, it's getting so hard to find anything!

Keep 'em coming though! The more comments the more chances of catching things : v)

ReeceHumphreys · 2022-10-24T19:37:53Z

Everything's looking good. I'll approve once the format action passes 😉

InsertCreativityHere · 2022-10-24T19:50:27Z

I would just fix them to expect a syntax error.

The tests were changed to expect a syntax error. I included the location information even though I'm under the impression we don't actually check it. But I hope that's fine.

InsertCreativityHere · 2022-10-24T19:58:30Z

I'll approve once the format action passes

I kept a majority of the changes that rustfmt proposed, but the remaining ones seem incorrect to me.
Some examples:

-    ($child:expr, Module, $module_ptr:expr, $parser:expr) => {{

+    ($child:expr,Module, $module_ptr:expr, $parser:expr) => {{

It removes the space between Module and the comma before it. There should always be a space after a comma,
and this is how we actually call the macro too (with the space I mean).
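
For context, a macro arm like this matches a literal `Module` token between two comma-separated arguments. The sketch below uses a made-up `describe_child!` macro (not the macro from this PR) to show the same shape, with the conventional space after each comma in both the pattern and the call sites.

```rust
/// Hypothetical macro that dispatches on a literal container-kind token.
macro_rules! describe_child {
    ($child:expr, Module, $container:expr) => {{
        format!("{} belongs to module {}", $child, $container)
    }};
    ($child:expr, Struct, $container:expr) => {{
        format!("{} is a member of struct {}", $child, $container)
    }};
}

/// Calls the macro the same way the arms are written: a space after every comma.
pub fn demo() -> (String, String) {
    (
        describe_child!("MyStruct", Module, "MyModule"),
        describe_child!("myField", Struct, "MyStruct"),
    )
}
```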

-                // Ensure the next character is also a '|' (since the whole token should be "||").

+                                       // Ensure the next character is also a '|' (since the whole token should be "||").
                 if matches!(self.buffer.peek(), Some('|')) {

It shifts these comments way to the right to try and align them with the line above,
but that's bogus here, the comment is describing the line beneath it.

-        Some(crate::parser::comments::CommentParser::parse_doc_comment(&combined, dummy_span))

+        Some(crate::parser::comments::CommentParser::parse_doc_comment(
+            &combined, dummy_span,
+        ))

I don't think this syntax is correct. Either the function goes on one line, or each parameter gets its own line.
I couldn't find anything in the style examples or guidelines that suggests this is correct.

ReeceHumphreys · 2022-10-24T20:11:21Z

It removes the space between Module and the comma before it. There should always be a space after a comma,
and this is how we actually call the macro too (with the space I mean).

Yes this is a bug, we should open an issue on the rustfmt repo.

Either the function goes on one line, or each parameter gets its own line.

There is an option to configure this behavior for rustfmt. We can change that rule if we think this is bad code.

InsertCreativityHere · 2022-10-24T22:50:02Z

Yes this is a bug, we should open an issue on the rustfmt repo.

I've narrowed down the problem to their macro-argument-parser, when it recursively sub-parses the match arms:
https://github.com/rust-lang/rustfmt/blob/ef91154250977b3b5d05448dafbca524a1168b47/src/macros.rs#L904-L911
For some reason they only insert whitespace in between successive punctuation or successive identifiers:
https://github.com/rust-lang/rustfmt/blob/ef91154250977b3b5d05448dafbca524a1168b47/src/macros.rs#L829-L845
So:

ident,     // No whitespace, good!
,ident     // No whitespace, weird.
,,        // Inserts a space between them
ii        // Technically would insert a space between them, but I think this state is impossible, 'ii' would be treated as a single identifier.
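
The four cases above can be modeled with a tiny function. This is a simplified model of the behavior as described in this comment, not rustfmt's actual implementation; the `Kind` enum and the `needs_space` name are invented for illustration.

```rust
/// The two token classes the described rule distinguishes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Ident,
    Punct,
}

/// Returns true when the described rule would insert a space between two adjacent tokens:
/// only between successive punctuation or successive identifiers.
pub fn needs_space(prev: Kind, next: Kind) -> bool {
    prev == next
}
```

Under this model `needs_space(Kind::Punct, Kind::Ident)` is false, which matches the missing space after the comma in `,Module`.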

I'll open a bug in the morning. You guys should copyedit my words for professionalism.

InsertCreativityHere · 2022-10-24T22:55:25Z

(https://github.com/rust-lang/rustfmt/blob/ef91154250977b3b5d05448dafbca524a1168b47/src/macros.rs#L992)

// This is a bit sketchy. The token rules probably need tweaking, but it works
// for some common cases. I hope the basic logic is sufficient. Note that the
// meaning of some tokens is a bit different here from usual Rust, e.g., `*`
// and `(`/`)` have special meaning.

InsertCreativityHere · 2022-10-25T14:41:22Z

I opened a bug for the whitespace issue: rust-lang/rustfmt#5573

InsertCreativityHere · 2022-10-25T14:46:18Z

There is an option to configure this behavior for rustfmt. We can change that rule if we think this is bad code.

Yeah, I think this is pretty weird looking for Rust. What do you have in mind?

InsertCreativityHere · 2022-10-25T15:38:12Z

I think the comment alignment might be buggy too, but that's less clear to me.
It leaves this completely alone:

    let x = 45; // Something else?
    let y = 574943834; // Thing1
    println!("{x} {y}"); // Thing2

But here it insists they must be visually aligned:

    let i = Identifier {
        value: "something".to_owned(), // Identifier
-        span: span.clone(), // Cool span
+        span: span.clone(),           // Cool span
    };

The only reason I think it's a bug is because usually rustfmt dislikes visual indentation like this.
The actual place this happens in the PR:

Diff in \\?\C:\Users\austin\Desktop\lalrpop-real\icerpc\src\parsers\slice\grammar.rs at line 388:
         data_type,
         tag,
         is_streamed,
-        is_returned: false, // Patched by its operation.
+        is_returned: false,                      // Patched by its operation.
         parent: WeakPtr::create_uninitialized(), // Patched by its container.

These comments are unrelated, so it feels weird to align them. They're only next to each other by coincidence here.
There are many other places where they aren't touching and rustfmt leaves them alone:

    let mut struct_ptr = OwnedPtr::new(Struct {
        identifier,
        members: Vec::new(),
        is_compact,
        parent: WeakPtr::create_uninitialized(), // Patched by its container.
        scope: parser.current_scope.clone(),
        attributes,
        comment,
        span,
        supported_encodings: None, // Patched by the encoding patcher.
    });

ReeceHumphreys · 2022-10-25T15:46:40Z

I think the comment alignment might be buggy too, but that's less clear to me.

I think this isn't a big deal for this PR. We can just add the newline and everything is fine.

                self.advance_buffer(); // Consume the '&' character.

                 // Ensure the next character is also an '&' (since the whole token should be "&&").
                 if matches!(self.buffer.peek(), Some('&')) {
                ...

The above works and rustfmt is fine with it. For issues like the one above, let's just make those minor changes (in this case a line) and open an issue with rustfmt.
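
For readers unfamiliar with the pattern in that snippet, here is a self-contained sketch of lexing a two-character token with a peekable character buffer. The `lex_and` function and the `Token` type are hypothetical; only the peek-then-consume idea is taken from the snippet.

```rust
use std::iter::Peekable;
use std::str::Chars;

#[derive(Debug, PartialEq, Eq)]
pub enum Token {
    /// The "&&" operator.
    And,
}

/// Lexes "&&" from the front of `buffer`, assuming the caller has already peeked a '&'.
/// Returns an error message if the second '&' is missing.
pub fn lex_and(buffer: &mut Peekable<Chars<'_>>) -> Result<Token, String> {
    buffer.next(); // Consume the first '&' character.

    // Ensure the next character is also an '&' (since the whole token should be "&&").
    if matches!(buffer.peek(), Some('&')) {
        buffer.next(); // Consume the second '&' character.
        Ok(Token::And)
    } else {
        Err("expected '&&' but found a single '&'".to_owned())
    }
}
```

On failure the unexpected character is left in the buffer, so the caller can report it or continue lexing from there.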

Yeah, I think this is pretty weird looking for Rust. What do you have in mind?

Let's let rustfmt do its thing for now. We can open an issue to change the setting once this PR is merged, since I'm guessing that, depending on what we decide, it could change a lot of code.

InsertCreativityHere · 2022-10-25T15:54:40Z

Seems the visual alignment is a known issue: rust-lang/rustfmt#4108
Priority high even!

InsertCreativityHere · 2022-10-25T16:02:10Z

let's just make those minor changes

I am of the opinion that we shouldn't intentionally mis-format our code.
Especially after the sheer amount of time and effort I've spent making sure that everything is formatted correctly.

Let's let rustfmt do its thing for now. We can open an issue to change the setting once this PR is merged, since I'm guessing that, depending on what we decide, it could change a lot of code.

I'm not convinced the setting exists.

InsertCreativityHere · 2022-10-25T19:35:05Z

@externl, @ReeceHumphreys is this okay to merge?
It passes the builds, tests, and linter.

I understand that rustfmt has issues with these changes,
and so you both do as well, but IMO these are bugs in rustfmt.

If you both feel that strongly about this, I'm fine with cargo +nightly fmting main after I merge...
Because at least, I would strongly prefer to have those changes isolated to their own commit.
That way, it's easy to go back and fix them once the bugs are fixed in rustfmt,
instead of having them lumped in with this massive PR.

externl · 2022-10-25T19:43:25Z

@externl, @ReeceHumphreys is this okay to merge? It passes the builds, tests, and linter.

I understand that rustfmt has issues with these changes, and so you both do as well, but IMO these are bugs in rustfmt.

If you both feel that strongly about this, I'm fine with cargo +nightly fmting main after I merge... Because at least, I would strongly prefer to have those changes isolated to their own commit. That way, it's easy to go back and fix them once the bugs are fixed in rustfmt, instead of having them lumped in with this massive PR.

Sure, this time we can run the formatter in a separate commit.

This is a companion PR for icerpc/slicec#303

InsertCreativityHere added 8 commits October 17, 2022 15:00

Added lalrpop dependency and build script.

a98b4dc

Added constructor function for 'Span'.

f0df17f

Removed constructor and add_x functions from grammar elements.

05d6389

Removed Pest parser, Pest preprocessor, and the parent patcher.

68306af

Passing integers as attribute arguments is a syntax error now.

3b84a1b

Updated the cSpell dictionary.

8d3b206

Don't need to specify default build script location apparently.

0c757f3

Everything else.

9b188ce

InsertCreativityHere requested review from pepone, externl, bernardnormier and ReeceHumphreys October 17, 2022 20:31

InsertCreativityHere and others added 4 commits October 17, 2022 16:56

Fixed linter warnings.

757b038

Fixed doc comment link.

e3048e2

More linter/formatting fixes.

31c62e3

Ran nightly rustfmt

59f1e91

ReeceHumphreys reviewed Oct 18, 2022

src/parser/slice.rs (outdated, resolved)

InsertCreativityHere mentioned this pull request Oct 18, 2022

Companion to LALRPOP PR icerpc/icerpc-csharp#1923

Merged

Remove useless borrow

a610c1a

Co-authored-by: Reece Humphreys <reecewh@icloud.com>

ReeceHumphreys reviewed Oct 18, 2022

tests/attribute_tests.rs (resolved)

InsertCreativityHere added 2 commits October 18, 2022 10:36

Removed unnecessary 'use' for 'Debug'.

915b5f5

Swapped words to reflect source order.

528bb08

ReeceHumphreys reviewed Oct 18, 2022

InsertCreativityHere and others added 2 commits October 18, 2022 10:54

Added TODO

b2101ac

Co-authored-by: Reece Humphreys <reecewh@icloud.com>

Added TODO

d78cc1e

Co-authored-by: Reece Humphreys <reecewh@icloud.com>

externl reviewed Oct 18, 2022

ReeceHumphreys reviewed Oct 18, 2022

src/parsers/slice/grammar.lalrpop (outdated, resolved)

src/parsers/slice/grammar.lalrpop (outdated, resolved)

Backticks instead of parentheses

84f1381

Co-authored-by: Joe George <joe@externl.com>

ReeceHumphreys and others added 6 commits October 21, 2022 11:19

Merge remote-tracking branch 'origin/main' into lalrpop-real

f69b0e9

Bogus inheritance is just a normal syntax error now.

c0a3a59

Removed support for explicit '+' integers.

487473d

Added issue tracking to a comment.

bfafdcf

Merge branch 'main' into lalrpop-real

e74b163

Unadded issue tracking to a comment...

11164ae

Fixed formatting of new errors.

94582f4

Apply rustfmt changes

60a5a71

InsertCreativityHere added 3 commits October 24, 2022 16:17

Revert "Apply rustfmt changes"

23dab6b

This reverts commit 60a5a71.

Type parameters must go after tag/stream keyword now.

66335ec

Made the preprocessor case-SENSITIVE.

2d22d4e

InsertCreativityHere merged commit 59b444f into icerpc:main Oct 25, 2022

InsertCreativityHere added a commit to icerpc/icerpc-csharp that referenced this pull request Oct 25, 2022

Companion to LALRPOP PR (#1923)

b85d5bb

This is a companion PR for icerpc/slicec#303

InsertCreativityHere mentioned this pull request Nov 14, 2022

Emit Error for Return Tuples with Less than 2 Elements #344

Merged
