-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add smithy-syntax package and smithy format
command
#1830
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
smithy-syntax is used to provide token trees (a very slightly higher level parse tree) of Smithy IDL models. This token tree is meant to be lossless, fault-tolerant, and easy to navigate. The use case for smithy-syntax is to make it easier to perform syntax analysis in tools like the LSP, IntelliJ, in-place model transforms (e.g., refactoring tools), and a formatter.
This commit adds a test for almost every implementation of the parse method for TreeType. OperationInput/Output/Errors, and number nodes aren't covered yet. These tests only include valid cases, and are not comprehensive yet. More cases are needed to check edge cases and different code paths for some TreeTypes, as well as error behavior. The test assertions check that the parser has produced the correct TreeTypes for the root node and its immediate children. Various minor bugs were fixed, and updates made while writing these tests: - CapturingTokenizer was updated to avoid a bug where the first character, $, was not included in the list of captured tokens. - An enum member with no ws around the '=' could have the value assingment skipped. - Members with a value assignment could consume whitespace that should be a part of the value assignment's BR. - ENUM_SHAPE_MEMBER was added. - Mixins added to operation shape. - Trait statements added to inline aggregate shape. - Some places changed an expect -> skip to the corresponding parse method. - TraitStructure and NodeValue were swapped in TRAIT_BODY_VALUE. - Removed extraneous BR from apply statement. - Block apply statement and singular apply statement had eachothers implementation.
Remove IdlParserTest
This change implements the updated grammar for traits and operations in #1800. WS and BR handling is now more explicit, including individual token trees for comments, spaces, and commas. NODE_OBJECT_KEY now contains an IDENTIFIER and QUOTED_TEXT tree rather than raw tokens. This change also cleans up token trees and adds some new features: * They implement FromSourceLocation * Removed unused methods.
All TreeTypeTests now expect the entire tree to be valid. This helps us find ambiguity and inconsistencies in the parser for cases when parsing a child tree is invalid only due to the context of the parent. Fixes minor bugs that caused trees to be invalid: - enumMember test didn't have a BR after VALUE_ASSIGNMENT Fills in missing tests for: - Operations - Block apply statements - Entity type name - Aggregate type name Other minor changes and improvements: - Adds util methods for parsing optional for-resource, mixins, and value assignment productions. - Adds util method for parsing any shape type and name. - Updates peekPastSpaces to not skip the current token, in case it is not SP. This simplifies parsing value assignments, and has more intuitive behavior. - Removes peekPastWs, which was only used for TRAIT_BODY_VALUE, inlining the implementation. peekPastWs skipped the current token like peekPastSpaces, which is necessary for TRAIT_BODY_VALUE. - Add ENTITY_TYPE_NAME production. - Updates name of some productions to be consistent with grammar. - Fixed a few places where TOKEN was parsed as IDENTIFIER.
This formatter uses Wadler's pretty printer algorithm to automatically format code based on line length (via prettier4j). Future commits should investigate shading the dependency on prettier4J.
Co-authored-by: Jordon Phillips <JordonPhillips@users.noreply.github.com>
This doesn't account for every possible bad doc comment (e.g., doc comments that come after traits). That can be a followup.
Node string values should parse as a SHAPE_ID not as an IDENTIFIER.
* Add parse failure tests Adds test cases that validate the parser fails when expected by asserting that any descendant of the root is an error tree. There is a test per-production, if that production has terminal tokens, or represents an alternation of productions that are distinguished by terminal tokens. For example, SHAPE_OR_APPLY_STATEMENT chooses between shape and apply statements based on whether the current lexemme is "apply". Only cases where the terminal tokens are invalid are tested because if a non-terminal (ie. another production) is invalid, the error tree is a child of the tree for that production. Since we also have tests that make sure the correct children are present for each production, we should have enough coverage to only test invalid non-terminals. For example, when parsing this shape statement: ``` structure Foo.Bar {} ``` we know the `Foo.Bar` is part of the IDENTIFIER production, we have tests that validate the the IDENTIFIER production is present, and we have tests that check for errors in IDENTIFIERs, so we don't need to specifically test this kind of error for shape statements. One caveat is that the existing tests I'm referring to only ensure all expected children are present for valid trees, so there is a logical inconsistency in the argument that these tests provide full coverage, but I can't think of a case that could cause a problem. Either way, these tests should provide _good enough_ coverage. Some bug fixes: - Fixed an issue where the error caused by not finding a "namespace" in NAMESPACE_STATEMENT was recovered by the parent tree. - Expect IDENTIFIER for "with" token in MIXINS production. * Add cases for productions with loops Adds more test cases for `*Statement` productions to ensure loops are broken out of properly when parsing fails.
Should not include the tree node if not present.
Minor changes to different docs and APIs for smithy-syntax
mtdowling
changed the title
Add smithy-syntax package and
Add smithy-syntax package and Jun 16, 2023
smithy-format
commandsmithy format
command
JordonPhillips
approved these changes
Jun 19, 2023
milesziemer
approved these changes
Jun 19, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR adds a new package called smithy-syntax that provides a token tree to represent a parsed model, and a formatter to format Smithy models. It's called a token tree instead of a parse tree because it bottoms-out at lexer tokens rather than more granular productions from the grammar for things like identifiers, quoted strings, etc.
A formatter is lumped in here to ensure that the toke tree works well and is usable for a real-world use case. smithy-syntax is currently marked as unstable, but we can make it stable after integrating it with the Smithy LSP.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.