Add methods for dealing with whitespaces #474

marcelocenerine · 2017-12-05T02:43:35Z

When working on a scalafix rule for scala/collection-strawman I implemented my own logic to get rid of whitespaces. @olafurpg mentioned in the review that it would be nice to have such functionality supported by Scalafix. Later I realized that there was even an issue filed by @gabro to look into that: #370.

I explored a few alternatives to introduce the functionality. There might be better ways to achieve this, so I'd appreciate your feedback and will be willing to accept suggestions.

The code changes in this PR implement the approach that I personally prefer. There are other 2 alternatives available on separate branches (those need some polishing).

1 - trimLeft / trimRight / trim (master...marcelocenerine:trim):
These methods behave similarly to Java's String.trim in the sense that they eliminate not only spaces but newline tokens as well. This may not be flexible enough for users who just want to get rid of spaces/tabs and preserve blank lines.
The naming is based on what I originally suggested in the comment above as well as by @ShaneDelmore here

2 - trimLeft / trimRight / trim with a includingNewline parameter (code in the PR):
This is similar to the alternative above, except that the trimming methods take an optional parameter (set to true by default) that can be used to indicate when newlines should be preserved. This is based on @gabro's suggestion on #370

3 - ctx.whitespaces.leftTo / rightTo / around (master...marcelocenerine:trim_whitespaces):
RuleCtx already has ~30 methods and adding 3 more would make it more bloated (#380). This approach basically introduces an utility class that groups functions to find/select whitespaces (similar to MatchingParens and AssociatedComments). The output of its methods can be passed to ctx.removeTokens to perform the actual removal. Example:

ctx.removeTokens(ctx.whitespaces.leftTo(tk))

MasseGuillaume · 2017-12-05T09:25:05Z

Hi Marcelo, this is a really useful contribution, thanks a lot!

The CI is red because of mima
https://travis-ci.org/scalacenter/scalafix/jobs/311647980#L4256-L4269

the trait PatchOps has 3 new methods (trimLeft, trimRight, trim) and this is a public api. This is not super useful because the only implementation is via RuleCtix/RuleCtxImpl. Therefore, you can exclude them in the Mima check:

ProblemFilters.exclude[ReversedMissingMethodProblem]("scalafix.patch.PatchOps.trim")
ProblemFilters.exclude[ReversedMissingMethodProblem]("scalafix.patch.PatchOps.trimRight")
ProblemFilters.exclude[ReversedMissingMethodProblem]("scalafix.patch.PatchOps.trimLeft")

in https://github.com/scalacenter/scalafix/blob/master/project/Mima.scala#L7

olafurpg

Thank you for this contribution, I am eager to use this myself 😄 left a few minor comments, otherwise implementation and tests look great 👍

olafurpg · 2017-12-05T10:25:59Z

scalafix-tests/unit/src/test/scala/scalafix/tests/core/PatchSuite.scala

+    "trimLeft removes spaces and newline before `val`",
+    original,
+    """// Foobar
+      |object a {val x = 2


What happens if there is a // comment after the open curly brace? We should not remove newlines trailing such comments.

We could test this by trimming left of "object"

In the example above, the result would be:

// Foobar object a {

I put some conditions to prevent this issue (comments in the format /* ... */ are OK). The same would occur when trimming on the right side of a comment though....

Comments are not the only case where trimming would cause an invalid output. Example:

object Foo { val a = 0 val b = 0 }

ctx.trimLeft(bValToken)

Output:

object Foo { val a = 0 val b = 0 }

I feel that the current behavior for newlines is not very intuitive and useful in practice. I can't think of a valid scenario where I would like to get rid of the left/right newline and have the remaining code on that particular line (or the line below) moved up (update: except when the entire line is intended to be removed or when an expression wraps across multiple lines). Is there any use case that comes to your mind?

From the example given above and the use case @gabro described in #370, maybe the right API would be sth like ctx.removeLine(token) (trim left/right spaces is still useful though). What do you think?

olafurpg · 2017-12-05T10:34:41Z

scalafix-core/shared/src/main/scala/scalafix/internal/rule/RuleCtxImpl.scala

+  private def canBeTrimmed(tk: Token, includeNewLine: Boolean): Boolean =
+    tk match {
+      case Space() | Tab() | FF() => true
+      case Newline() if includeNewLine => true


I'm wondering if we should limit this to a single newline. A typical use case if you want to remove a statement then you usually want to keep the leading / trailing blank lines. What do you think?

Just to understand, given this example:

val x = 42 "foo" val y = "bar"

would the current implementation be able to produce this diff?

val x = 42 - - "foo" val y = "bar"

so that the resulting code is

val x = 42 val y = "bar"

I think the implementation should produce this diff

val x = 42 - "foo" val y = "bar"

That would leave two blank lines, but it would also preserve a blank line in the following case instead of remove it

val x = 42 - "foo" val y = "bar"

Right. Probably the job of taking care of multiple blank lines belongs to a formatter

Yes, scalafmt would at least combine the two blank lines into a single blank line.

olafurpg · 2017-12-05T10:42:49Z

scalafix-core/shared/src/main/scala/scalafix/patch/PatchOps.scala

+    * Remove whitespaces on the left side of the token. Use `includeNewLine` to
+    * indicate when newline tokens should be preserved.
+    */
+  def trimLeft(token: Token, includeNewLine: Boolean = true): Patch


What do you think about having a separate method to trim only spaces? trimLeftSpaces, trimSpaces, trimRightSpaces (name up for debate). Tabs can be ignored, I'm not aware of any Scala project using tabs for indent.

I think it could result in a cleaner user API, boolean parameters are quite awkward to both write and read.

you are right, @olafurpg. Boolean parameters often indicate the need for two separate methods. My only concern is the explosion of methods (potentially 6 new ones) in the RuleCtx api (those would be inherited from PatchOps). If you think that's not a concern I will go for it.

I think the eventual home for these methods will be simple defs in object Patch. I personally think it's OK to have more methods if it buys us a cleaner api.

MasseGuillaume · 2017-12-07T13:32:11Z

@marcelocenerine we are thinking of cutting a release today. Do you think you can address the comments above? This is a really valuable contribution and we would be happy to add it to the upcoming version.

marcelocenerine · 2017-12-07T14:16:12Z

@MasseGuillaume, unfortunately I will only be able to address the comments in the evening. Please feel free to continue the work if you guys want it to make the release cut today. Otherwise I will update the PR this evening

MasseGuillaume · 2017-12-07T14:24:04Z

@marcelocenerine cool, I'm going to address the PR comments and include it in this release.

MasseGuillaume · 2017-12-07T15:02:55Z

@marcelocenerine don't know enough about token manipulation to address this PR for the release. I'm putting the ball back on your side for the next release. :)

marcelocenerine · 2017-12-07T15:08:26Z

@MasseGuillaume, ok. I will update the PR tonight :)

marcelocenerine · 2017-12-08T03:30:17Z

@olafurpg thanks for your suggestions!!! I updated the PR addressing your comments, but I think we should discuss more on the behavior for newlines. Please see my comment: #474 (comment)

marcelocenerine · 2017-12-17T19:21:43Z

scalafix-core/shared/src/main/scala/scalafix/internal/rule/RuleCtxImpl.scala

+  private def isSingleLineComment(tk: Token): Boolean =
+    tk.is[Comment] && tk.text.startsWith("//")
+  private def isWindowsNewline(tk1: Token, tk2: Token): Boolean =
+    tk1.is[CR] && tk2.is[LF]


I ran tests on Windows with a file containing CR+LF characters and it seems that scalameta ignores the CR ones, keeping only the LF tokens

We have some windows line break related issues that I have never properly looked into scalameta/scalameta#443

marcelocenerine · 2017-12-17T19:34:11Z

Hey @olafurpg, did you get a chance to take a look at my latest changes as well as my comment #474 (comment) ?

I created this branch master...marcelocenerine:trim_normalize with an alternative implementation (removeLine instead of trimming newlines) as well as methods to normalize spaces (names up for debate). Please see what you think ;)

olafurpg · 2017-12-18T16:41:52Z

Hey @marcelocenerine so sorry, I have maybe 2-3 different pending reviews for this PR that I never got around to submit. Every time I look at it I start writing tons of comments, then I change my mind and 20 minutes I decide to "look at it again later" 😅

I took a look at your trim_normalize branch and I'm not convinced about normalize* and removeLine. Remove line seems particularly dangerous, since it doesn't handle multiline expressions. Normalize seems quite specific so I'm not sure it needs to be in the core API, unless you have a concrete use-case?

I'm wondering if we can simplify things by moving these methods to TokenList

class TokenList {
  def leading
  def leadingSpaces
  def leadingLine // includes spaces + first newline
  def trailing
  def trailingSpaces
  def trailingLine

I'm personally most excited about leadingSpace, since that one is most annoying to implement and it's quite common for me to remove statements like in

scalafix/scalafix-core/shared/src/main/scala/scalafix/internal/rule/RemoveUnusedTerms.scala

Line 44 in 8bd026d

.fold(Patch.empty)(token => ctx.removeTokens(token))

Still, that means users will need to feed the output of those methods to removeTokens, would that defeat the purpose?

OK, I'm submitting this comment before I change my mind again 😂

marcelocenerine · 2017-12-18T19:21:01Z

Hey @olafurpg, thanks for your feedback. Your hesitation highlights the fact that this PR needs more work (especially when it comes to handling newlines) 😄

As a rule author I'd be more interested in the leadingSpaces/trailingSpaces for use cases where I know what tokens can precede/follow a given token that I want to remove/replace (silly example: input: 1 + 2 ; output: 1.+(2)). If I have that awareness and don't care about whitespaces in between, then the leading/trailing spaces methods would work just fine. For the example above, I wouldn't even care whether the whitespaces are spaces, newlines, tabs or a mix of everything.

I think that removing newlines is dangerous unless the rule is aware of what tokens exist in the prev/next line as well as in the current line. As you pointed out, removeLine is indeed dangerous if a rule doesn't have that awareness. I think the same would be true for leadingLine/trailingLine. Those operations would only be useful if rules knew that either the current line could be entirely removed or whatever tokens remained in the current line could be safely moved up/down (e.g.: multiline expressions).

I'd be more interested in methods to find the prev/next newline character instead and then write my own logic to slice(newline, token) and figure out what is in between or what precedes/follows the newline token so I could determine if they could be removed.

def nextNewline: Option[LF] // or Option[Token] or else...
def prevNewline: Option[LF]

or maybe a method that returned all leading/trailing tokens in the same line.

What do you think? Should I update the PR following your suggestion? Or do the above instead of leadingLine/trailingLine? Or just focus on spaces for now?

Still, that means users will need to feed the output of those methods to removeTokens, would that defeat the purpose?

To be honest, I'd prefer to have a rich API focused in finding/selecting tokens and a cleaner API with basic operations for adding/removing/replacing things (PatchOps).

Alternatively, given that additional whitespaces between tokens are irrelevant to determine the correctness of a program (just an aesthetic concern), should rules just ignore them and let a code formatting tool finish the job?

olafurpg · 2017-12-19T11:26:50Z

I'd be more interested in methods to find the prev/next newline character instead and then write my own logic to slice(newline, token) and figure out what is in between or what precedes/follows the newline token so I could determine if they could be removed.

Would prevNewline skip over non-whitespace tokens? Example, would it return None or Some here?

{
  val x = <<prevNewline>>

If it returns the newline after { (implemented as .find(_.is[Newline])) then it would be less useful IMO.

To be honest, I'd prefer to have a rich API focused in finding/selecting tokens and a cleaner API with basic operations for adding/removing/replacing things (PatchOps).

I'm glad we agree there!

Alternatively, given that additional whitespaces between tokens are irrelevant to determine the correctness of a program (just an aesthetic concern), should rules just ignore them and let a code formatting tool finish the job?

Trickier cases I think we should delegate to a formatter, but for some simple rewrites I think it would be nice to offer a best-effort by for example trimming enclosing spaces, for example in #513

OK, here is my concrete proposal:

TokenList.leadingSpaces/trailingSpaces
Make TokenList final, adding relevant filters in Mima.

It seems we are still unsure what to do about lines so we can give that a bit more time to think and leave it for another PR. I think leadingLine(Token/Tree) (without trailing option) would be a nice complement to ctx.removeTokens(leadingLine(tree) ++ tree.tokens).

marcelocenerine · 2017-12-19T13:01:00Z

Would prevNewline skip over non-whitespace tokens?

yeah, I was thinking something like that but, like you said, it's already easy to achieve that goal through the current api. Except if we wanted to save users from having to deal with the intricacies of Windows/Unix line breaks (as long as scalameta cared about that difference).

Your proposal sounds good to me. In my short experience with scalafix I have come across scenarios where I missed utilities to get rid of whitespaces. I wanted to cover the other use case in #370 (newlines) as well but I think that it needs some more thought so that whatever comes out of it is generic enough for people to use.

I will update the PR shortly 😅

marcelocenerine · 2017-12-19T15:09:07Z

scalafix-tests/unit/src/test/scala/scalafix/tests/core/util/TokenListTest.scala

+    val tokenList = TokenList(emptyFileTokens)
+    assert(tokenList.slice(emptyFileTokens.head, emptyFileTokens.head) == Seq())
+  }
+


it's kinda off topic but I think it's worth mentioning: when I was working on the previous implementation I found 2 things in the API that didn't behave as I expected (http://wiki.c2.com/?PrincipleOfLeastAstonishment):

prev/next returns the input token if it is the head/last of the tokenList. Would Option[Token] make more sense?

slice(from, to) doesn't behave as in Scala's std collection given that from is not inclusive. For instance, while removing tokens what could take a single step (ctx.removeTokens(tokenList.slice(tk, last)) actually requires 2 (ctx.removeToken(tk) + ctx.removeTokens(tokenList.slice(tk, last)) and is really easy to introduce bugs if users don't dive into the implementation of slice

These are great points, I agree both of these can be improved. Can you open tickets? I am open to fix slice immediately (we can even implement is using slice on views from the stdlib) while prev/next will have to wait until v0.6

Awesome! Here are the 2 issues I created: #516 & #517

marcelocenerine · 2017-12-19T15:10:34Z

The PR is updated now.

olafurpg

This looks great @marcelocenerine! Thanks a lot of your patience and iterating on the design. 👍

olafurpg · 2017-12-19T16:21:47Z

scalafix-tests/unit/src/test/scala/scalafix/tests/core/util/TokenListTest.scala

+    val tokenList = TokenList(emptyFileTokens)
+    assert(tokenList.slice(emptyFileTokens.head, emptyFileTokens.head) == Seq())
+  }
+


These are great points, I agree both of these can be improved. Can you open tickets? I am open to fix slice immediately (we can even implement is using slice on views from the stdlib) while prev/next will have to wait until v0.6

``` > scalafix --diff-base <tab> HEAD ag/gh-pages ag/issue-260 ag/issue-319 ag/master ... v0.5.5 v0.5.6 |00| 51abefa -- wip improve sbt completion (13 hours ago) |01| 3f63b5b -- Merge pull request scalacenter#474 from marcelocenerine/trim_includeNewLine (13 hours ago) |02| 47b94c0 -- Add leadingSpaces/trailingSpaces methods to TokenList (13 hours ago) |03| 8bd026d -- Merge pull request scalacenter#503 from MasseGuillaume/feature/368-escape-patches (13 hours ago) |04| 20e445f -- Escape hatch on Patch (fix scalacenter#368) (13 hours ago) |05| a2c5d70 -- Merge pull request scalacenter#500 from MasseGuillaume/feature/495-custom-id (13 hours ago) |06| 59efe7d -- Add CustomMessage to the public api (13 hours ago) |07| 9ae6071 -- Add id for CustomMessage (fix scalacenter#495) (13 hours ago) |08| e4a5c35 -- Merge pull request scalacenter#494 from MasseGuillaume/disable-regex (13 hours ago) |09| a422860 -- Merge pull request scalacenter#497 from olafurpg/disable-signatures (13 hours ago) |10| 7930947 -- DisableSyntax add regex (13 hours ago) |11| 5dbdd6b -- IntervalSet test for empty and add toString (13 hours ago) |12| b022fbd -- DisableSyntax don't repeat DisableSyntax.keyword in message (13 hours ago) |13| a992b02 -- Assert instead of scalafix:ok (13 hours ago) |14| 7896ccd -- Refactor Disable to use views. (13 hours ago) |15| 58acdbe -- Fix scalacenter#493, handle synthetics and symbol signatures in Disable. (13 hours ago) |16| b48d7f0 -- Merge pull request scalacenter#490 from olafurpg/unmanagedSources (13 hours ago) |17| e9b2b0a -- s/canFormat/canFix/ (13 hours ago) |18| 26be6fa -- Use unmanagedSources instead of unmanagedSourceDirectories. (13 hours ago) |19| 4d46001 -- Merge pull request scalacenter#488 from olafurpg/master (13 hours ago) ```

olafurpg suggested changes Dec 5, 2017

View reviewed changes

marcelocenerine force-pushed the trim_includeNewLine branch from f386574 to 41fe3a5 Compare December 8, 2017 03:25

marcelocenerine commented Dec 17, 2017

View reviewed changes

marcelocenerine force-pushed the trim_includeNewLine branch from 41fe3a5 to 2a2c7bf Compare December 19, 2017 14:57

marcelocenerine commented Dec 19, 2017

View reviewed changes

marcelocenerine force-pushed the trim_includeNewLine branch from 2a2c7bf to aae7ef1 Compare December 19, 2017 15:14

Add leadingSpaces/trailingSpaces methods to TokenList

47b94c0

marcelocenerine force-pushed the trim_includeNewLine branch from aae7ef1 to 47b94c0 Compare December 19, 2017 15:25

olafurpg approved these changes Dec 19, 2017

View reviewed changes

olafurpg merged commit 3f63b5b into scalacenter:master Dec 19, 2017

marcelocenerine deleted the trim_includeNewLine branch February 19, 2018 18:49

olafurpg added this to the v0.6.0-M2 milestone Apr 11, 2018

olafurpg mentioned this pull request Jun 15, 2018

Provide some API for dealing with leading whitespaces #370

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add methods for dealing with whitespaces #474

Add methods for dealing with whitespaces #474

marcelocenerine commented Dec 5, 2017 •

edited

Loading

MasseGuillaume commented Dec 5, 2017

olafurpg left a comment

olafurpg Dec 5, 2017

olafurpg Dec 5, 2017

marcelocenerine Dec 8, 2017 •

edited

Loading

olafurpg Dec 5, 2017

gabro Dec 6, 2017

olafurpg Dec 6, 2017

gabro Dec 6, 2017

olafurpg Dec 6, 2017

olafurpg Dec 5, 2017

marcelocenerine Dec 5, 2017

olafurpg Dec 5, 2017

MasseGuillaume commented Dec 7, 2017

marcelocenerine commented Dec 7, 2017

MasseGuillaume commented Dec 7, 2017

MasseGuillaume commented Dec 7, 2017

marcelocenerine commented Dec 7, 2017

marcelocenerine commented Dec 8, 2017

marcelocenerine Dec 17, 2017 •

edited

Loading

olafurpg Dec 18, 2017

marcelocenerine commented Dec 17, 2017 •

edited

Loading

olafurpg commented Dec 18, 2017

marcelocenerine commented Dec 18, 2017 •

edited

Loading

olafurpg commented Dec 19, 2017

marcelocenerine commented Dec 19, 2017 •

edited

Loading

marcelocenerine Dec 19, 2017

olafurpg Dec 19, 2017

marcelocenerine Dec 19, 2017 •

edited

Loading

marcelocenerine commented Dec 19, 2017

olafurpg left a comment

olafurpg Dec 19, 2017

Add methods for dealing with whitespaces #474

Add methods for dealing with whitespaces #474

Conversation

marcelocenerine commented Dec 5, 2017 • edited Loading

MasseGuillaume commented Dec 5, 2017

olafurpg left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

marcelocenerine Dec 8, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MasseGuillaume commented Dec 7, 2017

marcelocenerine commented Dec 7, 2017

MasseGuillaume commented Dec 7, 2017

MasseGuillaume commented Dec 7, 2017

marcelocenerine commented Dec 7, 2017

marcelocenerine commented Dec 8, 2017

marcelocenerine Dec 17, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

marcelocenerine commented Dec 17, 2017 • edited Loading

olafurpg commented Dec 18, 2017

marcelocenerine commented Dec 18, 2017 • edited Loading

olafurpg commented Dec 19, 2017

marcelocenerine commented Dec 19, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

marcelocenerine Dec 19, 2017 • edited Loading

Choose a reason for hiding this comment

marcelocenerine commented Dec 19, 2017

olafurpg left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

marcelocenerine commented Dec 5, 2017 •

edited

Loading

marcelocenerine Dec 8, 2017 •

edited

Loading

marcelocenerine Dec 17, 2017 •

edited

Loading

marcelocenerine commented Dec 17, 2017 •

edited

Loading

marcelocenerine commented Dec 18, 2017 •

edited

Loading

marcelocenerine commented Dec 19, 2017 •

edited

Loading

marcelocenerine Dec 19, 2017 •

edited

Loading