-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve multiline string handling #1879
Improve multiline string handling #1879
Conversation
This is definitely not ready to be merged yet (see TODOs above), but have a prototype that seems to work and wanted to share so I could start getting feedback. Main things I would want feedback on are a) if all the examples in the test suite mesh with the desired code styling for black and b) any high level advice on how to better integrate the new code with existing code. |
Also - let me know if there are any edge cases we want to say are too unlikely to occur in the wild that don't need to be covered, e.g. the multiline string as default value to function test case. I did use sourcegraph.com/search to find uses of multiline strings in the wild and added a few test cases based on that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! Here's some initial feedback:
- I also looked at some of the black-primer output and most of it looks like the new formatting is better. You don't need to submit PRs to these projects; they're expected to update themselves as Black releases a new version. You can update the primer config to mark these projects as having expected formatting changes.
- I don't think there are any edge cases that are too unlikely for Black to handle, since users run Black on probably pretty much any conceivable Python construct. It's OK to spend less effort on more obscure syntax as long as it doesn't cause crashes.
tests/data/multiline_strings.py
Outdated
**kwargs, | ||
): | ||
pass | ||
# output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer more newlines around the # output
comment so it's easier to find
# output | |
# output | |
tests/data/multiline_strings.py
Outdated
) | ||
call( | ||
3, | ||
textwrap.dedent("""cow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like it'd be better to put the argument to dedent()
on a separate line here, so it's easier to count the arguments to call()
. But there's definitely some cases below where it makes more sense not to put a single multiline string argument on separate lines, so I'm open to persuasion otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is probably to fix Black's handling of the following code:
textwrap.dedent("""\
Hello, I am
a multiline string used with
a common idiom
""")
Right now Black transforms it to:
textwrap.dedent(
"""\
Hello, I am
a multiline string used with
a common idiom
"""
)
And this PR causes Black leave the code untouched.
Although I do agree that it would look better to have the multiline string argument on a separate line when there's more than one argument in the call. But then again, I don't know if that's even possible with our current Visitor design.
edit: if the argument-count dependent formatting is dumb or impossible, consider me +0.5 for keeping PR behaviour.
""" % ( | ||
_C.__init__.__code__.co_firstlineno + 1, | ||
) | ||
""" % (_C.__init__.__code__.co_firstlineno + 1,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! Overall I do like the changes this introduces. Hopefully it's not too hard to explain these changes in word form that's understandable by Black's users.
Also, I would like to say sorry ahead of time for the terrible documentation workflow we have (it's mostly my fault). I really need to improve it and make it less painful to add/modify documentation but I'm slow and lazy so yeah.
I'm not qualified to review the actual formatting code, but I did notice a slight deficiency with your test code.
tests/test_black.py
Outdated
@patch("black.dump_to_file", dump_to_stderr) | ||
def test_multiline_strings(self) -> None: | ||
source, expected = read_data("multiline_strings") | ||
actual = fs(source) | ||
self.assertFormatEqual(expected, actual) | ||
black.assert_equivalent(source, actual) | ||
black.assert_stable(source, actual, DEFAULT_MODE) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since PR #1785, writing simple tests like this one is quite easier. Just make the test data and add its normalized name (i.e. strip the .py
suffix) in the SIMPLE cases list in tests/test_format.py
.
I've done this for you in a PR against your branch since I can't suggest changes on lines/files you didn't modify yet I need to suggest a single addition in such a file :/
black currently has poor multi-line string treatment for dedent()-ed code. I've ran aneeshusa's black branch psf/black#1879 on ropetest instead, which leaves dedent()-ed lines alone; however most people likely will be running mainline black which would have mucked these formatting , so we're adding an exclusion rule in pyproject.toml prevent people from auto-formatting ropetest.
Also add pyproject.toml to avoid re-running black on ropetest black currently has poor multi-line string treatment for dedent()-ed code. I've ran aneeshusa's black branch psf/black#1879 on ropetest instead, which leaves dedent()-ed lines alone; however most people likely will be running mainline black which would have mucked these formatting , so we're adding an exclusion rule in pyproject.toml prevent people from auto-formatting ropetest.
black currently has poor multi-line string treatment for dedent()-ed code. I've ran aneeshusa's 'black' branch psf/black#1879 on ropetest instead, which leaves dedent()-ed lines alone while doing all its other cleanups. However most people likely will be running mainline black which would have mucked the formatting in these files, so I've also added an exclusion rule in pyproject.toml to prevent people from accidentally auto-formatting ropetest again. Until aneeshusa's branch are merged into mainline black, or black has a proper solution for dedent()-ed code, be careful of running black on ropetest.
This has some conflicts now. With our new stability policy in place, this would be a good candidate to go into the "unstable" flag for a year to see it mature. |
Just to provide some feedback, I've been using this PR's branch for the past year, and it does very well. Much better than default black's behavior when it comes to |
Hello, finally reviving this PR! I did some benchmarking to see if the current logic would be safe to merge as-is, Performance ResultsI used `diff-shades` and `pre-commit`Ran on a variety of repos (large, small, many changes, little/no changes) as well as a bunch of Lyft-internal repos. Some info:
Running black via pre-commitManually edited the `rev` in `.pre-commit-config.yaml` and ran `pre-commit run black -a -v`
diff-shades
|
Quick update, I added docs and updated the PR description so this is fully ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me and well tested + documented.
Can we maybe add a f"""
multiline string just to ensure it works / no regressions there in future please. I know I use them and sure many others do.
Thank you for the review @cooperlees! I added multiline f-string test cases and fixed up the merge conflicts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Functionality and test wise this all seems good to me. There is some deep code in lines.py that I don't get so would want one of the AST smart maintainers to be happy with before final merge.
diff-shades results comparing this PR (b2f7637) to main (9c8464c). The full diff is available in the logs under the "Generate HTML diff report" step.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! Thanks for the work and sorry for the late review on my part as well. A couple of small comments and a gentle wish below if you still have energy for this PR 😅
I'd adore it if you could explain the algorithm a bit perhaps in a comment, or even try to break it into more bite-sized pieces so that the big picture is clearer. Like finding the leaves that are inside the line (L783-L788). Took me a solid hour to start to understand, particularly the manipulation of max_level_to_update
😄 (although in fairness it's not the most comfortable part of the code base for me to begin with).
Anyways, thank you for taking this on 🙏
@felix-hilden Thank you very much for the review! Sorry for the delay; I've added the test cases mentioned, dedented, and added more comments including a top-level comment for the algorithm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for improving this still, LGTM!
Thank you to everyone who has reviewed, we appreciate the feedback. |
I think this commit has not resolve the case that the first parameter to a function is a multiline string, and the function has multiple arguments. for example:
is still formatted to:
|
Yes, multiple arguments are a different situation where this PR only discussed and solved the case for one argument (or similarly nested structures with only one child each). Two arguments being split over multiple lines is the intended behavior, as far as I'm concerned. |
Description
Improve formatting of multiline strings, especially in function calls,
by updating black to look at the context around multiline strings
to decide if they should be inlined or split to a separate line.
Currently behind the --preview flag.
Performance tested the new functionality in #1879 (comment).
Fixes #256.
Checklist - did you ...
CHANGES.md
if necessary?