Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add range formatting support #92

Merged
merged 18 commits into from
Dec 16, 2024
Merged

Add range formatting support #92

merged 18 commits into from
Dec 16, 2024

Conversation

lionel-
Copy link
Collaborator

@lionel- lionel- commented Dec 11, 2024

Closes #63.

Range formatting is trickier than whole document formatting because we need to find the complete entity that makes the most sense to format around the provided range. To that end we're allowed to format a larger range than provided (maybe a smaller as well but we don't do that here and that would likely feel inconsistent).

Once we have found the piece of code to format, anything outside it is left alone. In order to match the surrounding indentation, Biome (in this cas biome_formatter::format_sub_tree()) does some work to figure out the initial indentation and the resulting output is printed with this indentation.

Biome provides format_range() but I haven't ended up using it because of the way it selects a safe node to format. There are two issues with it:

For comparison here is what Ruff does with an algorithm that works in the opposite direction, starting from the root node: https://github.com/astral-sh/ruff/blob/0e9427255fe3f9f42e1947a6f35af4483c101e95/crates/ruff_python_formatter/src/range.rs#L149. The idea is to find the deepest "logical line" starting from the root. I used this as an inspiration for the find_deepest_enclosing_logical_lines() routine implemented in this PR. The main difference is that it finds all consecutive logical lines within the range so that the user can select multiple lines of code to format.

Currently I define a logical line as any child of the top-level program or of an { expression. In the future, I think it'd be helpful to also treat function arguments and binary operands as logical lines. That would be helpful when format-on-paste is enabled and the user is pasting inside an expanded argument list or an expanded pipeline. Air will be able to only format what's been pasted. Currently we'll extend the range to format the whole thing.

There's a bit of trickery involving rewrapping the nodes in an expression list, itself wrapped in an artificial RRoot node. The first wrapping is necessary to handle multiple logical lines. The second is necessary to resolve an issue in Biome's comment mapper. If the node to format is not at least two levels deep, leading comments end up attached to nodes deeper in the tree (see https://github.com/biomejs/biome/blob/2a098f94aa58cedfd2691de4f3fd3ef65c05c5d6/crates/biome_formatter/src/comments/builder.rs#L104-L112). This introduces hard line breaks where they are not expected. One effect from wrapping nodes in this way is that a hard line is introduced at the end of the expression list. We just remove it at the end.

@lionel- lionel- requested a review from DavisVaughan December 11, 2024 14:28
@DavisVaughan
Copy link
Collaborator

That would be helpful when format-on-paste is enabled and the user is pasting inside an expanded argument list or an expanded pipeline. Air will be able to only format what's been pasted. Currently we'll extend the range to format the whole thing.

I don't mind this honestly because if an argument breaks, then that means the entire function call should break. I imagine you might could get yourself into weird scenarios where an argument is formatted correctly but the function call as a whole isn't

Copy link
Collaborator

@DavisVaughan DavisVaughan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think everything looks good except for the find_deepest_enclosing_logical_lines algorithm, see comments below

CHANGELOG.md Outdated Show resolved Hide resolved
crates/lsp/src/handlers_format.rs Show resolved Hide resolved
crates/lsp/src/handlers_format.rs Show resolved Hide resolved
crates/lsp/src/handlers_format.rs Show resolved Hide resolved
@lionel- lionel- force-pushed the feature/range-formatting branch from e619633 to c7741dc Compare December 12, 2024 13:32
@lionel-
Copy link
Collaborator Author

lionel- commented Dec 12, 2024

I don't mind this honestly because if an argument breaks, then that means the entire function call should break. I imagine you might could get yourself into weird scenarios where an argument is formatted correctly but the function call as a whole isn't

I was envisioning only expanded calls / pipelines would work this way. Flattened calls and pipelines would be formatted as a whole.

I think the idea for range formatting is not to leave the whole document in a consistently formatted state but just the minimal range that the user is currently working with. Changing code outside of this scope is likely to be distracting. Then the user can save or format the whole doc if needed.

That said I don't think this would be worth the extra complexity so I'm happy with what we have here.

Comment on lines 180 to 184
let logical_lines: Vec<RSyntaxNode> = iter
.map(|expr| expr.into_syntax())
.skip_while(|node| !node.text_range().contains(range.start()))
.take_while(|node| node.text_range().start() <= range.end())
.collect();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, were you expecting that it formats both 2 + 2 and THE ENTIRE for loop? i.e. note how 1 : 10 is formatted even though it is outside the selection.

It seems to be due to this bit here.

Screen.Recording.2024-12-12.at.5.16.07.PM.mov

Copy link
Collaborator

@DavisVaughan DavisVaughan Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in #92 (comment) I was expecting that this just formats 2 + 2 because that is the only node inside the common logical line "root" fully covered by the user's selection

It seems like the most conservative approach that avoids going outside of the user's selection?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be this?

Suggested change
let logical_lines: Vec<RSyntaxNode> = iter
.map(|expr| expr.into_syntax())
.skip_while(|node| !node.text_range().contains(range.start()))
.take_while(|node| node.text_range().start() <= range.end())
.collect();
let logical_lines: Vec<RSyntaxNode> = iter
.map(|expr| expr.into_syntax())
.filter(|node| range.contains_range(node.text_trimmed_range()))
.collect();

Careful to use text_trimmed_range() here, because we'd want this (where < and > represent the selection) to still format the 1+1 even though it doesn't span the full text_range() when you add in the comment

# hi there
<1+1>

This breaks one of your tests related to partial selection (reproduced in the image below) where I'd argue that only 1+1 should be reformatted

Screenshot 2024-12-12 at 5 35 02 PM

Copy link
Collaborator Author

@lionel- lionel- Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes this was on purpose. We always format the selected part of user code. Sometimes it means widening the range to the highest level of logical line selected by the user. In other words we never narrow the requested range, we only widen it.

Copy link
Collaborator

@DavisVaughan DavisVaughan Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what I don't love about this is that it means we can format lines arbitrarily far away from the user's actual selection

Like in this case I accidentally selected a little too far up and it means that the whole function is formatted, even if thats 20 lines further up

Screen.Recording.2024-12-13.at.10.02.53.AM.mov

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked and decided to keep the current behavior, but with a comment around these lines and the fact that they allow the current implementation to expand the user's range, and add in my suggested code above as a commented out alternative that is more conservative if we ever wanted to switch to that

@lionel- lionel- force-pushed the feature/range-formatting branch from f90e017 to 5e8dfd8 Compare December 16, 2024 10:58
@lionel-
Copy link
Collaborator Author

lionel- commented Dec 16, 2024

Here is an example of the behaviour difference I'm worried about between { lists and argument lists. In this screencast I'm pasting an expression in different parts of the file:

Screen.Recording.2024-12-16.at.12.10.44.mov

The formatting of other arguments in argument lists feel inconsistent with how we don't format expressions in braced lists. This is why I wondered if we should extend our notion of logical lines to include lines in expanded lists (and possibly lines in pipelines). But let's try out the current approach to start with.

@lionel- lionel- merged commit 357be89 into main Dec 16, 2024
4 checks passed
@lionel- lionel- deleted the feature/range-formatting branch December 16, 2024 11:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add FormatRange support
2 participants