-
Notifications
You must be signed in to change notification settings - Fork 350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use node.loc.tokens to improve handling of parentheses. #537
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
benjamn
commented
Sep 10, 2018
newPath.firstInStatement() && | ||
!hasOpeningParen(oldPath)) { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very glad to see this ceremony moved entirely into FastPath#hasParens
.
Recast has suffered for a long time because it did not have reliable access to the lexical analysis of source tokens during reprinting. Most importantly, accurate token information could be used to detect whether a node was originally wrapped with parentheses, even if the parentheses are separated from the node by comments or other incidental non-whitespace text, such as trailing commas. Here are just some of the issues that have resulted from the lack of reliable token information: - #533 - #528 - #513 - #512 - #366 - #327 - #286 With this change, every node in the AST returned by recast.parse will now have a node.loc.tokens array representing the entire sequence of original source tokens, as well as node.loc.{start,end}.token indexes into this array of tokens, such that node.loc.tokens.slice( node.loc.start.token, node.loc.end.token ) returns a complete list of all source tokens contained by the node. Note that some nodes (such as comments) may contain no source tokens, in which case node.loc.start.token === node.loc.end.token, which will be the index of the first token *after* the position where the node appeared. Most parsers can expose token information for free / very cheaply, as a byproduct of the parsing process. In case a custom parser is provided that does not expose token information, we fall back to Esprima's tokenizer. While there is considerable variation between different parsers in terms of AST format, there is much less variation in tokenization, so the Esprima tokenizer should be adequate in most cases (even for JS dialects like TypeScript). If it is not adequate, the caller should simply ensure that the custom parser exposes an ast.tokens array containing token objects with token.loc.{start,end}.{line,column} information.
FastPath#hasParens is now implemented in terms of source tokens rather than source characters, so it should not be fooled as easily by intervening comments etc.
These options are documented in lib/options.js, and apply to the entire printing process, and don't change depending on how the various print functions are called.
https://ariya.io/2011/08/hall-of-api-shame-boolean-trap This will allow adding additional options without overcomplicating all the various places where print is called.
Normally the genericPrint function will attempt to wrap its result with parentheses if path.needsParens(), but that's not what we want if we're printing the path in order to patch it into a location that already has parentheses, so it's important to be able to disable this behavior in such special cases. Note that children printed recursively by the genericPrint[NoParens] functions will still be wrapped with parentheses if necessary, since this option applies to the printing of the root node only. Should help with #327.
The complicated logic that I implemented previously was an approximation of a more fundamental and ultimately much simpler decision problem: some nodes need parentheses only because they can't come first in the enclosing statement, due to parsing ambiguities (e.g. FunctionExpression and ObjectExpression nodes). However, these nodes do not need to be followed immediately by a closing parenthesis, because the presence of an opening parenthesis is enough to resolve the parsing ambiguity.
benjamn
force-pushed
the
use-tokens-to-improve-parens-handling
branch
from
September 11, 2018 01:04
0475616
to
c331354
Compare
eventualbuddha
added a commit
to codemod-js/codemod
that referenced
this pull request
Nov 19, 2018
This is now required for recast, as it'll othewise fall back on esprima which doesn't work for JSX. benjamn/recast#537
eventualbuddha
added a commit
to codemod-js/codemod
that referenced
this pull request
Nov 19, 2018
This is now required for recast, as it'll othewise fall back on esprima which doesn't work for JSX. benjamn/recast#537
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Recast has suffered for a long time because it did not have reliable access to the lexical analysis of source tokens during reprinting, and instead relied on scanning source characters, which is error-prone because identifying comments, string literals, regular expression literals, and other large tokens is difficult/impossible without doing a full tokenization and sometimes a small amount of initial parsing.
Most importantly, accurate token information can be used to detect whether a node was originally wrapped with parentheses, even if those parentheses were separated from the node by comments or other incidental non-whitespace text, such as trailing commas. Here are just some of the issues that have resulted from the lack of reliable token information:
With this change, every node in the AST returned by
recast.parse
will now have anode.loc.tokens
reference to shared array containing the entire sequence of original source tokens, as well asnode.loc.{start,end}.token
indexes into this array of tokens, such thatreturns a complete list of all source tokens contained by the node. Note that some nodes (such as comments) may contain no source tokens, in which case
node.loc.start.token === node.loc.end.token
, which will be the index of the first token after the position where the node appeared.Most parsers can expose token information for free / very cheaply, as a byproduct of the parsing process. In case a custom parser is provided that does not expose token information, we fall back to Esprima's tokenizer. While there is considerable variation between different parsers in terms of AST format, there is much less variation in tokenization, so the Esprima tokenizer should be adequate in most cases (even for JS dialects like TypeScript). If it is not adequate, the caller should simply ensure that the custom parser exposes an
ast.tokens
array containing token objects withtoken.loc.{start,end}.{line,column}
information.