Use node.loc.tokens to improve handling of parentheses. #537

Merged — 21 commits, Oct 13, 2018

Conversation

benjamn (Owner) commented Sep 10, 2018

Recast has long suffered from not having reliable access to the lexical analysis of source tokens during reprinting. Instead, it relied on scanning raw source characters, which is error-prone: identifying comments, string literals, regular expression literals, and other large tokens is difficult or impossible without a full tokenization and sometimes a small amount of initial parsing.

Most importantly, accurate token information can be used to detect whether a node was originally wrapped with parentheses, even if those parentheses were separated from the node by comments or other incidental non-whitespace text, such as trailing commas. Here are just some of the issues that have resulted from the lack of reliable token information:

- #533
- #528
- #513
- #512
- #366
- #327
- #286

With this change, every node in the AST returned by recast.parse will now have a node.loc.tokens reference to a shared array containing the entire sequence of original source tokens, as well as node.loc.{start,end}.token indexes into this array of tokens, such that

node.loc.tokens.slice(
  node.loc.start.token,
  node.loc.end.token
)

returns a complete list of all source tokens contained by the node. Note that some nodes (such as comments) may contain no source tokens, in which case node.loc.start.token === node.loc.end.token, which will be the index of the first token after the position where the node appeared.
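As a minimal sketch of the indexing scheme described above (using a hand-built token array rather than a real recast AST, so the shapes here are illustrative only):

```javascript
// Sketch of the node.loc.tokens indexing scheme: start.token points at
// a node's first token, end.token just past its last token, and all
// nodes share one tokens array.
const tokens = [
  { type: "Punctuator", value: "(" },
  { type: "Identifier", value: "foo" },
  { type: "Punctuator", value: ")" },
  { type: "Punctuator", value: ";" },
];

// A hypothetical node for the identifier `foo` inside `(foo);`.
const fooNode = {
  loc: {
    tokens,
    start: { token: 1 },
    end: { token: 2 },
  },
};

const contained = fooNode.loc.tokens.slice(
  fooNode.loc.start.token,
  fooNode.loc.end.token
);

console.log(contained.map(t => t.value)); // [ "foo" ]
```

A node that contains no tokens would simply have equal start.token and end.token indexes, so the same slice returns an empty array.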

Most parsers can expose token information for free / very cheaply, as a byproduct of the parsing process. In case a custom parser is provided that does not expose token information, we fall back to Esprima's tokenizer. While there is considerable variation between different parsers in terms of AST format, there is much less variation in tokenization, so the Esprima tokenizer should be adequate in most cases (even for JS dialects like TypeScript). If it is not adequate, the caller should simply ensure that the custom parser exposes an ast.tokens array containing token objects with token.loc.{start,end}.{line,column} information.

@benjamn benjamn self-assigned this Sep 10, 2018
      newPath.firstInStatement() &&
      !hasOpeningParen(oldPath)) {
    return false;
  }
benjamn (Owner, Author):
Very glad to see this ceremony moved entirely into FastPath#hasParens.

FastPath#hasParens is now implemented in terms of source tokens rather
than source characters, so it should not be fooled as easily by
intervening comments etc.
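A minimal sketch of what a token-based parenthesis check can look like (the real FastPath#hasParens logic is more involved; the helper and data below are hypothetical):

```javascript
// Hypothetical token-based parenthesis check: a node is wrapped in
// parens if the token just before its start index is "(" and the token
// at its end index is ")". Comments are not tokens, so an intervening
// comment cannot fool this check the way it could fool a character scan.
function hasParens(node) {
  const { tokens, start, end } = node.loc;
  const before = tokens[start.token - 1];
  const after = tokens[end.token];
  return Boolean(
    before && before.value === "(" &&
    after && after.value === ")"
  );
}

// Tokens for `(/* comment */ foo)` — the comment never appears in the
// token stream, so the "(" and ")" remain adjacent to foo's indexes.
const tokens = [
  { value: "(" },
  { value: "foo" },
  { value: ")" },
];

const fooNode = { loc: { tokens, start: { token: 1 }, end: { token: 2 } } };
console.log(hasParens(fooNode)); // true
```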
These options are documented in lib/options.js and apply to the entire printing process; they don't change depending on how the various print functions are called.
https://ariya.io/2011/08/hall-of-api-shame-boolean-trap

This will allow adding additional options without overcomplicating all the
various places where print is called.
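The boolean-trap concern linked above can be illustrated with a generic sketch (the function and option names here are hypothetical, not recast's actual API):

```javascript
// Boolean trap: at the call site it is unclear what `true, false` mean,
// and adding a third flag forces every caller to change.
function printTrap(node, includeComments, wrapParens) {
  return { includeComments, wrapParens };
}

// Options object: every flag is named at the call site, and new options
// can be added later without breaking existing callers.
function print(node, options = {}) {
  const { includeComments = true, wrapParens = true } = options;
  return { includeComments, wrapParens };
}

const node = {};
printTrap(node, true, false);       // hard to read at the call site
print(node, { wrapParens: false }); // self-documenting
```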
Normally the genericPrint function will attempt to wrap its result with
parentheses if path.needsParens(), but that's not what we want if we're
printing the path in order to patch it into a location that already has
parentheses, so it's important to be able to disable this behavior in such
special cases.

Note that children printed recursively by the genericPrint[NoParens]
functions will still be wrapped with parentheses if necessary, since this
option applies to the printing of the root node only.

Should help with #327.
The complicated logic that I implemented previously was an approximation
of a more fundamental and ultimately much simpler decision problem: some
nodes need parentheses only because they can't come first in the enclosing
statement, due to parsing ambiguities (e.g. FunctionExpression and
ObjectExpression nodes). However, these nodes do not need to be followed
immediately by a closing parenthesis, because the presence of an opening
parenthesis is enough to resolve the parsing ambiguity.
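The statement-start ambiguity described above can be demonstrated directly (this is standard JavaScript behavior, not recast-specific):

```javascript
// At statement start, `{` begins a block (and `function` a declaration),
// so the same characters parse differently with and without a leading
// parenthesis.
const asStatement = eval("{ foo: 1 }");    // block with labeled statement
const asExpression = eval("({ foo: 1 })"); // object literal

console.log(asStatement);  // 1 (completion value of the block)
console.log(asExpression); // { foo: 1 }

// Only the *opening* paren is needed to force expression context; once
// parsing has begun as an expression, the closing paren merely balances it.
```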
@benjamn benjamn force-pushed the use-tokens-to-improve-parens-handling branch from 0475616 to c331354 on September 11, 2018
@benjamn benjamn merged commit 8a5493b into master Oct 13, 2018
eventualbuddha added a commit to codemod-js/codemod that referenced this pull request Nov 19, 2018

This is now required for recast, as it'll otherwise fall back on esprima, which doesn't work for JSX.

benjamn/recast#537