Improve multiline string handling #1879

Merged
merged 18 commits into from
Mar 7, 2023

Conversation

aneeshusa
Contributor

@aneeshusa aneeshusa commented Dec 22, 2020

Description

Improve formatting of multiline strings, especially in function calls,
by updating black to look at the context around multiline strings
to decide if they should be inlined or split to a separate line.
Currently behind the --preview flag.

Performance tested the new functionality in #1879 (comment).
Fixes #256.

Checklist - did you ...

  • Add an entry in CHANGES.md if necessary?
  • Add / update tests if necessary?
  • Add new / update outdated documentation?

@aneeshusa
Contributor Author

This is definitely not ready to be merged yet (see TODOs above), but I have a prototype that seems to work and wanted to share it so I could start getting feedback. The main things I want feedback on are (a) whether all the examples in the test suite mesh with Black's desired code style and (b) any high-level advice on how to better integrate the new code with the existing code.

@aneeshusa
Contributor Author

Also, let me know if there are any edge cases that are too unlikely to occur in the wild to be worth covering, e.g. the test case with a multiline string as a function's default value. I used sourcegraph.com/search to find uses of multiline strings in the wild and added a few test cases based on that.

Collaborator

@JelleZijlstra JelleZijlstra left a comment

Thanks for this! Here's some initial feedback:

  • I also looked at some of the black-primer output, and in most of it the new formatting looks better. You don't need to submit PRs to these projects; they're expected to update themselves as Black releases a new version. You can update the primer config to mark these projects as having expected formatting changes.
  • I don't think there are any edge cases that are too unlikely for Black to handle, since users run Black on probably pretty much any conceivable Python construct. It's OK to spend less effort on more obscure syntax as long as it doesn't cause crashes.

**kwargs,
):
pass
# output
Collaborator

I'd prefer more newlines around the # output comment so it's easier to find

Suggested change
# output
# output

)
call(
3,
textwrap.dedent("""cow
Collaborator

I feel like it'd be better to put the argument to dedent() on a separate line here, so it's easier to count the arguments to call(). But there's definitely some cases below where it makes more sense not to put a single multiline string argument on separate lines, so I'm open to persuasion otherwise.

Collaborator

@ichard26 ichard26 Dec 23, 2020

This change is probably to fix Black's handling of the following code:

textwrap.dedent("""\
    Hello, I am
    a multiline string used with
    a common idiom
""")

Right now Black transforms it to:

textwrap.dedent(
    """\
    Hello, I am
    a multiline string used with
    a common idiom
"""
)

And this PR causes Black to leave the code untouched.

I do agree, though, that it would look better to have the multiline string argument on a separate line when there's more than one argument in the call. Then again, I don't know if that's even possible with our current Visitor design.

edit: if the argument-count dependent formatting is dumb or impossible, consider me +0.5 for keeping PR behaviour.
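
The two layouts shown in this comment differ only in the whitespace between tokens, so Python sees the same program either way. A minimal sketch of checking that with the standard library `ast` module follows; the `same_ast` helper and the two sample sources are illustrative names made up for this example and are not part of Black.

```python
import ast

# Hugged layout: the multiline string starts on the same line as the call.
HUGGED = '''\
import textwrap

text = textwrap.dedent("""\\
    Hello, I am
    a multiline string used with
    a common idiom
""")
'''

# Split layout: the string is moved to its own line inside the parentheses.
SPLIT = '''\
import textwrap

text = textwrap.dedent(
    """\\
    Hello, I am
    a multiline string used with
    a common idiom
"""
)
'''


def same_ast(first: str, second: str) -> bool:
    """Return True if both sources parse to the same AST (positions ignored)."""
    return ast.dump(ast.parse(first)) == ast.dump(ast.parse(second))


print(same_ast(HUGGED, SPLIT))
```

Because `ast.dump` omits line and column attributes by default, the comparison ignores layout and only looks at the structure and the string contents.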

""" % (
_C.__init__.__code__.co_firstlineno + 1,
)
""" % (_C.__init__.__code__.co_firstlineno + 1,)
Collaborator

I like this change

Collaborator

@ichard26 ichard26 left a comment

Thanks for working on this! Overall I do like the changes this introduces. Hopefully it's not too hard to explain these changes in words that Black's users can understand.

Also, I would like to say sorry ahead of time for the terrible documentation workflow we have (it's mostly my fault). I really need to improve it and make it less painful to add/modify documentation but I'm slow and lazy so yeah.

I'm not qualified to review the actual formatting code, but I did notice a slight deficiency with your test code.

Comment on lines 362 to 369
@patch("black.dump_to_file", dump_to_stderr)
def test_multiline_strings(self) -> None:
    source, expected = read_data("multiline_strings")
    actual = fs(source)
    self.assertFormatEqual(expected, actual)
    black.assert_equivalent(source, actual)
    black.assert_stable(source, actual, DEFAULT_MODE)

Collaborator

Since PR #1785, writing simple tests like this one is much easier. Just create the test data file and add its normalized name (i.e. with the .py suffix stripped) to the SIMPLE cases list in tests/test_format.py.

I've done this for you in a PR against your branch: I can't suggest changes on lines or files you didn't modify, yet I need to suggest a single addition in such a file :/
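
The test data files discussed in this review appear to hold the input first and the expected formatting after an `# output` line, as the quoted snippets suggest. A rough sketch of splitting such a file, assuming that layout; `split_case` is a hypothetical helper written for illustration and is not Black's `read_data`.

```python
from typing import Tuple


def split_case(text: str, marker: str = "# output") -> Tuple[str, str]:
    """Split test data into (source, expected) at the first marker line.

    Assumption: if no marker line is present, the source is expected to be
    left unchanged by the formatter.
    """
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.strip() == marker:
            source = "\n".join(lines[:index]).strip("\n") + "\n"
            expected = "\n".join(lines[index + 1 :]).strip("\n") + "\n"
            return source, expected
    source = text.strip("\n") + "\n"
    return source, source


# A case whose expected output equals its input, i.e. the hugged
# multiline string should be left alone.
sample = 'x = call("""a\nb""")\n\n\n# output\n\n\nx = call("""a\nb""")\n'
source, expected = split_case(sample)
print(source == expected)
```

Stripping the blank lines around the marker is what makes extra newlines around `# output`, as requested earlier in the review, harmless to the comparison.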

@ichard26 ichard26 added the F: strings Related to our handling of strings label Jul 16, 2021
lieryan added a commit to python-rope/rope that referenced this pull request Sep 26, 2021
black currently has poor multi-line string treatment for dedent()-ed code.

I've run aneeshusa's black branch psf/black#1879
on ropetest instead, which leaves dedent()-ed lines alone; however
most people likely will be running mainline black which would have
mucked up this formatting, so we're adding an exclusion rule in
pyproject.toml to prevent people from auto-formatting ropetest.
lieryan pushed a commit to python-rope/rope that referenced this pull request Sep 26, 2021
Also add pyproject.toml to avoid re-running black on ropetest

black currently has poor multi-line string treatment for dedent()-ed code.

I've run aneeshusa's black branch psf/black#1879
on ropetest instead, which leaves dedent()-ed lines alone; however
most people likely will be running mainline black which would have
mucked up this formatting, so we're adding an exclusion rule in
pyproject.toml to prevent people from auto-formatting ropetest.
lieryan added a commit to python-rope/rope that referenced this pull request Sep 26, 2021
black currently has poor multi-line string treatment for dedent()-ed code.

I've run aneeshusa's 'black' branch psf/black#1879
on ropetest instead, which leaves dedent()-ed lines alone while doing
all its other cleanups.

However most people likely will be running mainline black which would
have mucked the formatting in these files, so I've also added an
exclusion rule in pyproject.toml to prevent people from accidentally
auto-formatting ropetest again.

Until aneeshusa's branch is merged into mainline black, or black has a
proper solution for dedent()-ed code, be careful of running black on
ropetest.
@JelleZijlstra
Collaborator

This has some conflicts now. With our new stability policy in place, this would be a good candidate to go into the "unstable" flag for a year to see it mature.

@ichard26 ichard26 self-assigned this Apr 9, 2022
@lieryan

lieryan commented Jul 18, 2022

Just to provide some feedback, I've been using this PR's branch for the past year, and it does very well. It is much better than default black's behavior when it comes to dedent(). I haven't encountered any issues.

@ichard26 ichard26 added S: up for grabs (PR only) Available for anyone to work on as the PR author is busy or unreachable. help wanted Extra attention is needed labels Aug 3, 2022
@olivia-hong
Contributor

Hello, finally reviving this PR!
I cleaned it up to work with the latest main branch, performance tested it, and moved the logic under the --preview flag.

Still working on adding docs/updating comments but wanted to get some review on the PR in the meantime.

I did some benchmarking to see if the current logic would be safe to merge as-is, and the data indicates that there's no impact on performance (times were roughly the same across all the repos I tested). Given that, I would appreciate any advice from maintainers on the existing approach.

Performance Results

I used `diff-shades` and `pre-commit`, and ran them on a variety of repos (large, small, many changes, little/no changes) as well as a bunch of Lyft-internal repos.

Some info:

  • Run 1 actually applies formatting changes from the multiline PR (if applicable)
  • If Run 1 is marked N/A, this means no files were changed when applying the multiline PR.

  • Run 2 is the “steady-state” since it occurs after formatting changes have already been made.
  • I executed Run 2 with and without the cache by manually deleting it using
    rm -rf ~/Library/Caches/black

Running black via pre-commit

Manually edited the `rev` in `.pre-commit-config.yaml` and ran `pre-commit run black -a -v`
| Repo | v22.10.0 | v22.10.0 w/ cache | Multiline (Run 1) | Multiline (Run 2) w/o cache | Multiline (Run 2) w/ cache | Files Changed | # of Changes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| django | 10.62 | 1.02 | 11.1 | 10.14 | 1 | 28 | 471 |
| pandas | 11.73 | 0.63 | 14.63 | 11.47 | 0.61 | 53 | 1923 |
| sqlalchemy | 8.78 | 0.35 | 9.98 | 8.9 | 0.33 | 14 | 234 |
| pyramid | 1.89 | 0.4 | N/A | 1.94 | 0.37 | 3 | 46 |
| pytest | 1.85 | 0.21 | 3.82 | 1.91 | 0.2 | 64 | 9505 |
| tox | 1.13 | 0.17 | 2.06 | 1.12 | 0.18 | 7 | 39 |
| typeshed | 6.59 | 1.19 | 6.58 | 6.36 | 1.19 | 2 | 24 |
| virtualenv | 0.8 | 0.19 | N/A | 0.85 | 0.21 | 0 | 0 |
| flake8-bugbear | 0.81 | 0.15 | N/A | 0.94 | 0.15 | 0 | 0 |
| opencv | 1.86 | 0.24 | 4.34 | 1.78 | 0.26 | 13 | 0 |
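
To make the "roughly the same" claim easier to eyeball, the steady-state columns of the pre-commit table above can be turned into relative differences. This is only a sketch: the numbers are copied from the v22.10.0 and Multiline (Run 2) w/o cache columns, the comment does not state their units, and `relative_change` is a helper name invented here.

```python
# (baseline v22.10.0, multiline branch Run 2), both without cache,
# copied from the pre-commit table. Units are not stated in the comment.
TIMINGS = {
    "django": (10.62, 10.14),
    "pandas": (11.73, 11.47),
    "sqlalchemy": (8.78, 8.90),
    "pyramid": (1.89, 1.94),
    "pytest": (1.85, 1.91),
    "tox": (1.13, 1.12),
    "typeshed": (6.59, 6.36),
    "virtualenv": (0.80, 0.85),
    "flake8-bugbear": (0.81, 0.94),
    "opencv": (1.86, 1.78),
}


def relative_change(baseline: float, candidate: float) -> float:
    """Fractional change of candidate relative to baseline (0.1 means +10%)."""
    return (candidate - baseline) / baseline


for repo, (baseline, candidate) in TIMINGS.items():
    print(f"{repo:<15} {relative_change(baseline, candidate):+.1%}")
```

Each cell is a single timing, so small differences on the short-running repos should be read with care.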

diff-shades

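| Repo | v22.1.0 | Multiline | Files Changed | # of Changes |
| --- | --- | --- | --- | --- |
| django | 7 | 8 | 28 | 471 |
| pandas | 11 | 13 | 53 | 1923 |
| sqlalchemy | 8 | 9 | 14 | 234 |
| pyramid | 1 | 1 | 3 | 46 |
| pytest | 1 | 3 | 64 | 9505 |
| tox | 0 | 1 | 7 | 39 |
| typeshed | 4 | 4 | 2 | 24 |
| virtualenv | 0 | 0 | 0 | 0 |
| flake8-bugbear | 0 | 0 | 0 | 0 |
| opencv | N/A | N/A | 13 | 0 |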

@olivia-hong
Contributor

Quick update: I added docs and updated the PR description, so this is fully ready for review.

Collaborator

@cooperlees cooperlees left a comment

Looks good to me and well tested + documented.

Can we maybe add an f""" multiline string just to ensure it works and there are no regressions there in future, please? I know I use them, and I'm sure many others do.

@olivia-hong
Contributor

Can we maybe add an f""" multiline string just to ensure it works and there are no regressions there in future, please? I know I use them, and I'm sure many others do.

Thank you for the review @cooperlees! I added multiline f-string test cases and fixed up the merge conflicts

Collaborator

@cooperlees cooperlees left a comment

Thanks! Functionality- and test-wise this all seems good to me. There is some deep code in lines.py that I don't get, so I would want one of the AST-smart maintainers to be happy with it before the final merge.

@github-actions

github-actions bot commented Dec 23, 2022

diff-shades results comparing this PR (b2f7637) to main (9c8464c). The full diff is available in the logs under the "Generate HTML diff report" step.

╭─────────────────────────── Summary ────────────────────────────╮
│ 15 projects & 243 files changed / 13 792 changes [+4410/-9382] │
│                                                                │
│ ... out of 2 400 287 lines, 11 495 files & 23 projects         │
╰────────────────────────────────────────────────────────────────╯

Differences found.

Collaborator

@felix-hilden felix-hilden left a comment

Hi! Thanks for the work and sorry for the late review on my part as well. A couple of small comments and a gentle wish below if you still have energy for this PR 😅

I'd adore it if you could explain the algorithm a bit perhaps in a comment, or even try to break it into more bite-sized pieces so that the big picture is clearer. Like finding the leaves that are inside the line (L783-L788). Took me a solid hour to start to understand, particularly the manipulation of max_level_to_update 😄 (although in fairness it's not the most comfortable part of the code base for me to begin with).

Anyways, thank you for taking this on 🙏

@olivia-hong
Contributor

@felix-hilden Thank you very much for the review! Sorry for the delay; I've added the test cases mentioned, dedented, and added more comments including a top-level comment for the algorithm.

Collaborator

@felix-hilden felix-hilden left a comment

Thank you very much for improving this still, LGTM!

@aneeshusa
Contributor Author

Thank you to everyone who has reviewed, we appreciate the feedback.
@JelleZijlstra, @cooperlees, @ichard26 with all comments addressed and 3 approvals, are we ready to merge the PR?

@JelleZijlstra JelleZijlstra merged commit 4a063a9 into psf:main Mar 7, 2023
@lieryan lieryan mentioned this pull request Mar 9, 2023
2 tasks
@danielruc91

I think this commit has not resolved the case where the first argument to a function is a multiline string and the function takes multiple arguments.

for example:

def _print(line, f):
    print(line, file=f)

_print(f"""
some
multiline string
""", some_file
)

is still formatted to:

def _print(line, f):
    print(line, file=f)


_print(
    f"""
some
multiline string
""",
    some_file,
)

@FichteFoll

Yes, multiple arguments are a different situation; this PR only discussed and solved the case of one argument (or similarly nested structures with only one child each). Two arguments being split over multiple lines is the intended behavior, as far as I'm concerned.
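
The distinction drawn here, one multiline-string argument versus several arguments, can be illustrated with a toy predicate built on the standard library `ast` module. This is a sketch for intuition only: `sole_multiline_string_arg` is a hypothetical name, and the check is an assumption about how to express the rule, not the logic Black uses in lines.py.

```python
import ast


def sole_multiline_string_arg(call_source: str) -> bool:
    """Return True if the call has exactly one argument and that argument
    is a string literal spanning more than one line."""
    node = ast.parse(call_source, mode="eval").body
    if not isinstance(node, ast.Call):
        return False
    arguments = list(node.args) + [keyword.value for keyword in node.keywords]
    if len(arguments) != 1:
        return False
    argument = arguments[0]
    # Plain strings parse to ast.Constant, f-strings to ast.JoinedStr.
    is_string = isinstance(argument, ast.JoinedStr) or (
        isinstance(argument, ast.Constant) and isinstance(argument.value, str)
    )
    return is_string and argument.end_lineno > argument.lineno


# One argument: the case this PR addresses.
ONE_ARGUMENT = '''textwrap.dedent("""\\
    Hello
""")'''

# Two arguments: the case reported above, which is still split.
TWO_ARGUMENTS = '''_print("""
some
multiline string
""", some_file)'''

print(sole_multiline_string_arg(ONE_ARGUMENT))
print(sole_multiline_string_arg(TWO_ARGUMENTS))
```

The sketch deliberately ignores the "nested structures with only one child each" part of the description, which would need a recursive walk.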

Labels
F: strings Related to our handling of strings help wanted Extra attention is needed S: up for grabs (PR only) Available for anyone to work on as the PR author is busy or unreachable.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unnecessary line breaks in method call on multiline string
9 participants