Add smithy-syntax package and `smithy format` command #1830

mtdowling · 2023-06-16T20:59:58Z

This PR adds a new package called smithy-syntax that provides a token tree to represent a parsed model, and a formatter to format Smithy models. It's called a token tree instead of a parse tree because it bottoms-out at lexer tokens rather than more granular productions from the grammar for things like identifiers, quoted strings, etc.

A formatter is lumped in here to ensure that the toke tree works well and is usable for a real-world use case. smithy-syntax is currently marked as unstable, but we can make it stable after integrating it with the Smithy LSP.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

smithy-syntax is used to provide token trees (a very slightly higher level parse tree) of Smithy IDL models. This token tree is meant to be lossless, fault-tolerant, and easy to navigate. The use case for smithy-syntax is to make it easier to perform syntax analysis in tools like the LSP, IntelliJ, in-place model transforms (e.g., refactoring tools), and a formatter.

This commit adds a test for almost every implementation of the parse method for TreeType. OperationInput/Output/Errors, and number nodes aren't covered yet. These tests only include valid cases, and are not comprehensive yet. More cases are needed to check edge cases and different code paths for some TreeTypes, as well as error behavior. The test assertions check that the parser has produced the correct TreeTypes for the root node and its immediate children. Various minor bugs were fixed, and updates made while writing these tests: - CapturingTokenizer was updated to avoid a bug where the first character, $, was not included in the list of captured tokens. - An enum member with no ws around the '=' could have the value assingment skipped. - Members with a value assignment could consume whitespace that should be a part of the value assignment's BR. - ENUM_SHAPE_MEMBER was added. - Mixins added to operation shape. - Trait statements added to inline aggregate shape. - Some places changed an expect -> skip to the corresponding parse method. - TraitStructure and NodeValue were swapped in TRAIT_BODY_VALUE. - Removed extraneous BR from apply statement. - Block apply statement and singular apply statement had eachothers implementation.

Remove IdlParserTest

This change implements the updated grammar for traits and operations in #1800. WS and BR handling is now more explicit, including individual token trees for comments, spaces, and commas. NODE_OBJECT_KEY now contains an IDENTIFIER and QUOTED_TEXT tree rather than raw tokens. This change also cleans up token trees and adds some new features: * They implement FromSourceLocation * Removed unused methods.

All TreeTypeTests now expect the entire tree to be valid. This helps us find ambiguity and inconsistencies in the parser for cases when parsing a child tree is invalid only due to the context of the parent. Fixes minor bugs that caused trees to be invalid: - enumMember test didn't have a BR after VALUE_ASSIGNMENT Fills in missing tests for: - Operations - Block apply statements - Entity type name - Aggregate type name Other minor changes and improvements: - Adds util methods for parsing optional for-resource, mixins, and value assignment productions. - Adds util method for parsing any shape type and name. - Updates peekPastSpaces to not skip the current token, in case it is not SP. This simplifies parsing value assignments, and has more intuitive behavior. - Removes peekPastWs, which was only used for TRAIT_BODY_VALUE, inlining the implementation. peekPastWs skipped the current token like peekPastSpaces, which is necessary for TRAIT_BODY_VALUE. - Add ENTITY_TYPE_NAME production. - Updates name of some productions to be consistent with grammar. - Fixed a few places where TOKEN was parsed as IDENTIFIER.

This formatter uses Wadler's pretty printer algorithm to automatically format code based on line length (via prettier4j). Future commits should investigate shading the dependency on prettier4J.

Co-authored-by: Jordon Phillips <JordonPhillips@users.noreply.github.com>

This doesn't account for every possible bad doc comment (e.g., doc comments that come after traits). That can be a followup.

Node string values should parse as a SHAPE_ID not as an IDENTIFIER.

* Add parse failure tests Adds test cases that validate the parser fails when expected by asserting that any descendant of the root is an error tree. There is a test per-production, if that production has terminal tokens, or represents an alternation of productions that are distinguished by terminal tokens. For example, SHAPE_OR_APPLY_STATEMENT chooses between shape and apply statements based on whether the current lexemme is "apply". Only cases where the terminal tokens are invalid are tested because if a non-terminal (ie. another production) is invalid, the error tree is a child of the tree for that production. Since we also have tests that make sure the correct children are present for each production, we should have enough coverage to only test invalid non-terminals. For example, when parsing this shape statement: ``` structure Foo.Bar {} ``` we know the `Foo.Bar` is part of the IDENTIFIER production, we have tests that validate the the IDENTIFIER production is present, and we have tests that check for errors in IDENTIFIERs, so we don't need to specifically test this kind of error for shape statements. One caveat is that the existing tests I'm referring to only ensure all expected children are present for valid trees, so there is a logical inconsistency in the argument that these tests provide full coverage, but I can't think of a case that could cause a problem. Either way, these tests should provide _good enough_ coverage. Some bug fixes: - Fixed an issue where the error caused by not finding a "namespace" in NAMESPACE_STATEMENT was recovered by the parent tree. - Expect IDENTIFIER for "with" token in MIXINS production. * Add cases for productions with loops Adds more test cases for `*Statement` productions to ensure loops are broken out of properly when parsing fails.

Should not include the tree node if not present.

Minor changes to different docs and APIs for smithy-syntax

mtdowling and others added 18 commits June 16, 2023 15:59

Migrate to an enum based parser

cdcbab6

Update to latest grammar changes

80e209d

Use Stream for getting TokenTree tokens

6cda7f3

Remove IdlParserTest

Add a Smithy IDL model formatter

4455dbc

This formatter uses Wadler's pretty printer algorithm to automatically format code based on line length (via prettier4j). Future commits should investigate shading the dependency on prettier4J.

Adjust chain formatting

5d4e76a

Co-authored-by: Jordon Phillips <JordonPhillips@users.noreply.github.com>

Add format command to CLI

7880b90

Shade prettier4j in smithy-syntax

4bb7dcb

Ensure formatter tests are valid models

5bab0e7

Sort use statements automatically when formatting

188b965

Rewrite bad doc comments to normal comments

1ed9f7c

This doesn't account for every possible bad doc comment (e.g., doc comments that come after traits). That can be a followup.

Remove unused use statements and fix node parsing

ae8f688

Node string values should parse as a SHAPE_ID not as an IDENTIFIER.

Fix namespace statement parsing

f1ee1ac

Should not include the tree node if not present.

Prep API for release

a1dd2f5

Minor changes to different docs and APIs for smithy-syntax

mtdowling requested a review from a team as a code owner June 16, 2023 20:59

mtdowling changed the title ~~Add smithy-syntax package and smithy-format command~~ Add smithy-syntax package and smithy format command Jun 16, 2023

Attempt to fix windows issues

abe9e5c

JordonPhillips approved these changes Jun 19, 2023

View reviewed changes

milesziemer approved these changes Jun 19, 2023

View reviewed changes

mtdowling merged commit 6f6fe88 into main Jun 19, 2023

mtdowling deleted the token-tree branch September 11, 2023 18:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add smithy-syntax package and `smithy format` command #1830

Add smithy-syntax package and `smithy format` command #1830

mtdowling commented Jun 16, 2023

Add smithy-syntax package and smithy format command #1830

Add smithy-syntax package and smithy format command #1830

Conversation

mtdowling commented Jun 16, 2023

Add smithy-syntax package and `smithy format` command #1830

Add smithy-syntax package and `smithy format` command #1830