Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Format binary expressions #4862

Merged
merged 2 commits into from
Jun 6, 2023
Merged

Format binary expressions #4862

merged 2 commits into from
Jun 6, 2023

Conversation

MichaReiser
Copy link
Member

@MichaReiser MichaReiser commented Jun 5, 2023

Summary

This PR implements basic formatting for binary expressions. It does not yet handle the special case where black breaks the right side first for call expressions, tuples, lists, ...

This PR also adds the infrastructure to preserve parentheses around expressions (depending on the context)

Test Plan

cargo test

@MichaReiser
Copy link
Member Author

MichaReiser commented Jun 5, 2023

)


# Black breaks the right side first for the following expressions:
Copy link
Member Author

@MichaReiser MichaReiser Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all of these are handled correctly yet because the format implementation of the right hand side is missing

write!(
f,
[
left.format(),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation heavily relies on recursion. We may want to explore to manually unroll left-balanced binary expressions because this could stack-overflow for very deep binary expressions (but so could any of our analyzer infrastructure).

}
}

fn is_expression_parenthesized(expr: &Expr, contents: &str) -> bool {
// Search backwards to avoid ambiguity with `(a, )` and because it's faster
matches!(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unfortunate. We now have to inspect the source code for every expression that we format but I haven't been able to come up with a better approach. Let's see what the benchmark says to this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the alternatives? In the initial prototype, I think I handled these similarly to comments -- so, for every parenthesis, I associated it with the child it contains.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered extracting the tokenized ranges from the token stream and either:

  • Store them in a vec and perform a binary search: I don't think this is faster because it trades the few character iterations with a binary search for every expression (except maybe if there are None).
  • Storing the parenthesized in a Trivia struct and look it up by node. Again, I don't think this would be faster because we trade the iterating over a few characters vs performing a hash map lookup (while fast, iterating over 1- characters is super fast too)

# Has many lines. Many, many lines.
# Many, many, many lines.
-"""Module docstring.
+(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to handle docstrings manually in the Suite formatting but that's not part of this PR

@MichaReiser MichaReiser force-pushed the format-binary-expression branch from f66044f to 8070eef Compare June 5, 2023 09:34
@MichaReiser MichaReiser marked this pull request as ready for review June 5, 2023 09:35
@MichaReiser MichaReiser added internal An internal refactor or improvement formatter Related to the formatter labels Jun 5, 2023
@MichaReiser MichaReiser force-pushed the format-binary-expression branch 2 times, most recently from 0b45066 to 364f656 Compare June 5, 2023 09:37
@github-actions
Copy link
Contributor

github-actions bot commented Jun 5, 2023

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00      5.6±0.01ms     7.3 MB/sec    1.00      5.6±0.03ms     7.3 MB/sec
formatter/numpy/ctypeslib.py               1.00   1153.7±1.66µs    14.4 MB/sec    1.00   1158.9±1.54µs    14.4 MB/sec
formatter/numpy/globals.py                 1.00    133.2±1.86µs    22.2 MB/sec    1.00    132.7±0.46µs    22.2 MB/sec
formatter/pydantic/types.py                1.00      2.5±0.01ms    10.2 MB/sec    1.01      2.5±0.01ms    10.2 MB/sec
linter/all-rules/large/dataset.py          1.04     14.3±0.04ms     2.8 MB/sec    1.00     13.8±0.05ms     3.0 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.02      3.4±0.01ms     4.9 MB/sec    1.00      3.4±0.01ms     5.0 MB/sec
linter/all-rules/numpy/globals.py          1.02    418.9±2.08µs     7.0 MB/sec    1.00    411.9±0.68µs     7.2 MB/sec
linter/all-rules/pydantic/types.py         1.03      6.0±0.01ms     4.3 MB/sec    1.00      5.8±0.02ms     4.4 MB/sec
linter/default-rules/large/dataset.py      1.08      7.3±0.01ms     5.6 MB/sec    1.00      6.8±0.01ms     6.0 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.06   1549.7±2.59µs    10.7 MB/sec    1.00   1466.7±1.80µs    11.4 MB/sec
linter/default-rules/numpy/globals.py      1.04    169.4±0.25µs    17.4 MB/sec    1.00    162.3±0.27µs    18.2 MB/sec
linter/default-rules/pydantic/types.py     1.07      3.2±0.00ms     7.9 MB/sec    1.00      3.0±0.00ms     8.4 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.07      6.4±0.02ms     6.3 MB/sec    1.00      6.0±0.06ms     6.8 MB/sec
formatter/numpy/ctypeslib.py               1.07  1242.4±62.38µs    13.4 MB/sec    1.00  1160.6±11.13µs    14.3 MB/sec
formatter/numpy/globals.py                 1.03    134.3±2.62µs    22.0 MB/sec    1.00    130.9±4.85µs    22.5 MB/sec
formatter/pydantic/types.py                1.06      2.7±0.02ms     9.4 MB/sec    1.00      2.6±0.02ms     9.9 MB/sec
linter/all-rules/large/dataset.py          1.00     13.2±0.06ms     3.1 MB/sec    1.00     13.2±0.14ms     3.1 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      3.3±0.01ms     5.1 MB/sec    1.00      3.3±0.01ms     5.1 MB/sec
linter/all-rules/numpy/globals.py          1.00    337.0±2.57µs     8.8 MB/sec    1.00    337.0±3.31µs     8.8 MB/sec
linter/all-rules/pydantic/types.py         1.01      5.6±0.01ms     4.6 MB/sec    1.00      5.5±0.02ms     4.6 MB/sec
linter/default-rules/large/dataset.py      1.00      6.8±0.02ms     6.0 MB/sec    1.01      6.9±0.07ms     5.9 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00   1376.9±7.79µs    12.1 MB/sec    1.00   1380.5±9.83µs    12.1 MB/sec
linter/default-rules/numpy/globals.py      1.00    147.5±0.77µs    20.0 MB/sec    1.01    148.6±1.48µs    19.9 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.0±0.01ms     8.5 MB/sec    1.01      3.0±0.01ms     8.4 MB/sec

@MichaReiser MichaReiser force-pushed the format-binary-expression branch from 364f656 to 5ee366e Compare June 5, 2023 15:55
use rustpython_parser::ast::ExprList;

#[derive(Default)]
pub struct FormatExprList;

impl FormatNodeRule<ExprList> for FormatExprList {
fn fmt_fields(&self, item: &ExprList, f: &mut PyFormatter) -> FormatResult<()> {
write!(f, [verbatim_text(item.range)])
let ExprList {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a full implementation, but it is good enough to test the right side breaking of binary expressions

aaaaaaaaaaaaaa
+ {a for x in bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb}
)
# Wraps it in parentheses if it needs to break both left and right
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new lines before the comment get removed. I'll look into this tomorrow as a separate PR

@MichaReiser MichaReiser force-pushed the format-binary-expression branch from 5ee366e to 5899fa9 Compare June 5, 2023 16:03
@MichaReiser MichaReiser force-pushed the format-binary-expression branch from 28014f5 to 5dde064 Compare June 6, 2023 06:08
@MichaReiser MichaReiser linked an issue Jun 6, 2023 that may be closed by this pull request
@MichaReiser MichaReiser force-pushed the format-binary-expression branch from 5dde064 to 5645338 Compare June 6, 2023 07:52
@MichaReiser MichaReiser enabled auto-merge (squash) June 6, 2023 07:52
@konstin konstin closed this Jun 6, 2023
auto-merge was automatically disabled June 6, 2023 08:27

Pull request was closed

@konstin konstin reopened this Jun 6, 2023
@konstin konstin enabled auto-merge (squash) June 6, 2023 08:27
@konstin konstin merged commit 3f032cf into main Jun 6, 2023
@konstin konstin deleted the format-binary-expression branch June 6, 2023 08:34
@konstin konstin mentioned this pull request Jun 12, 2023
72 tasks
konstin pushed a commit that referenced this pull request Jun 13, 2023
* Format Binary Expressions

* Extract NeedsParentheses trait
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
formatter Related to the formatter internal An internal refactor or improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Formatter: BinOp (hard)
3 participants