Added logic to properly treat underscores, dashes and spaces. Fixes #318 #341

nzubair · 2014-10-22T02:52:50Z

This PR addresses behavior observed in issue #318.

Primarily, it changes the handling of underscores and dashes that do not connect words, that is, those with a space before them, after them, or both, as was the case for this bug.

Additionally, the regex in FromPascalCase was not handling spaces in a string properly. Amended the regex to handle the spaces. It can likely be improved, given my lack of mastery of .NET regular expressions.

Finally, a few test cases were added to test different combinations of spaces, dashes and underscores.

mexx · 2014-10-22T05:02:20Z

src/Humanizer/StringHumanizeExtensions.cs

-                    word.ToCharArray().All(Char.IsUpper) && word.Length > 1
-                        ? word
-                        : word.ToLower())
+                    word.Trim().ToCharArray().All(Char.IsUpper) && word.Length > 1


For performance, can you call Trim() only once?

I know you didn't introduce it, but can you also remove ToCharArray()? It should work without it anyway.
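A minimal sketch of the single-Trim() variant suggested here. The TrimOnceSketch class and NormalizeWord helper are hypothetical names for illustration, not code from the PR; only the upper-case check and the ToLower() fallback mirror the diff above. Unlike the diff, the length check here also uses the trimmed value.

```csharp
using System;
using System.Linq;

public static class TrimOnceSketch
{
    // Hypothetical helper: trims once and reuses the trimmed value
    // for both the all-caps check and the returned word.
    public static string NormalizeWord(string word)
    {
        var trimmed = word.Trim();
        return trimmed.Length > 1 && trimmed.ToCharArray().All(Char.IsUpper)
            ? trimmed            // keep acronyms such as "HTML" as they are
            : trimmed.ToLower(); // everything else is lower-cased
    }

    public static void Main()
    {
        Console.WriteLine("[" + NormalizeWord(" HTML ") + "]"); // [HTML]
        Console.WriteLine("[" + NormalizeWord(" Word") + "]");  // [word]
    }
}
```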

mexx · 2014-10-22T05:11:56Z

Thanks for the PR. Please add it to the release notes as mentioned in CONTRIBUTING.md.

It would be cool if you would address the comments I've made.

nzubair · 2014-10-22T12:34:41Z

Thanks for the feedback Max. I will address your comments today.

nzubair · 2014-10-23T00:28:57Z

@mexx As requested, reduced the number of Trim()s, simplified the regex and updated the release_notes.md.

However, I was not able to remove ToCharArray(). MSBuild spat out the following error:

StringHumanizeExtensions.cs(35,21): error CS1061: 'string' does not contain a definition for 'All' and no extension method 'All' accepting a first argument of type 'string' could be found (are you missing a using directive or an assembly reference?) [D:\Work\Dev\Projects\nzh\Src\Humanizer\Humanizer.csproj]
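One plausible explanation, which is an assumption and not confirmed anywhere in this thread: in some portable library profiles, string does not implement IEnumerable<char>, so Enumerable.All cannot bind to a string directly. ToCharArray() returns a char[], which is always enumerable as characters, so the call compiles either way. A sketch, with UpperCheckSketch and IsAllUpper as hypothetical names:

```csharp
using System;
using System.Linq;

public static class UpperCheckSketch
{
    // Going through char[] means Enumerable.All binds regardless of
    // how the target profile declares System.String.
    public static bool IsAllUpper(string word)
    {
        return word.ToCharArray().All(Char.IsUpper);
    }

    public static void Main()
    {
        Console.WriteLine(IsAllUpper("HTML")); // True
        Console.WriteLine(IsAllUpper("Html")); // False
    }
}
```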

MehdiK · 2014-10-26T05:39:22Z

Thanks for the contribution @nzubair.

I think dashes shouldn't be removed as part of humanization. For example, "left-handed people".Humanize() should still return "left-handed people" (I understand there is no real rule around this and it could be quite subjective, but I think leaving it in makes more sense). Underscores are removed because an underscore normally doesn't make sense in a human-readable sentence, and removing it normally results in a more readable one. This behavior is highlighted by the name of the covering test (CanHumanizeStringWithUnderscores). You've added a few test cases to that test that check for dashes; they don't fit there, firstly because of the method name and, more importantly, because I don't think we should remove dashes.

As for #318, as I mentioned in the comments, "Humanize should leave the text alone (because as far as it can see it's already human-readable)" regardless of how many times it's called on it.

mexx · 2014-10-26T12:31:28Z

@MehdiK We already remove dashes, look at the implementation.
So maybe the problem is with that wrong handling?

nzubair · 2014-10-26T13:15:40Z

Hi @MehdiK, Thank you for taking time to review the PR and for providing feedback.

However, I'm a bit confused as what you are describing contradicts the behavior of Humanizer at present.

Dashes
In the current implementation, dashes and underscores are removed regardless of where they are and how they connect words. "left-handed people" will turn into "left handed people". For #318, "TEST 1 - THIS IS A TEST" turns into "TEST 1[space][space][space]THIS IS A TEST". This happens at L54 (check) and L14 (replace). It's a simple substitution without logic.
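The effect described above can be reproduced with a small sketch. ReplaceSeparators is a hypothetical stand-in for an unconditional substitution, not the actual Humanizer code; it only illustrates why a dash surrounded by spaces ends up as three consecutive spaces.

```csharp
using System;

public static class DashSketch
{
    // Hypothetical stand-in: every underscore and dash becomes a space,
    // with no regard for whether it connects two words.
    public static string ReplaceSeparators(string input)
    {
        return input.Replace('_', ' ').Replace('-', ' ');
    }

    public static void Main()
    {
        // The dash already has a space on each side, so three spaces remain.
        Console.WriteLine("[" + ReplaceSeparators("TEST 1 - THIS IS A TEST") + "]");
        // [TEST 1   THIS IS A TEST]

        Console.WriteLine(ReplaceSeparators("left-handed people"));
        // left handed people
    }
}
```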

Spaces (and impact on all uppercase strings)
Additionally, FromPascalCase does not handle spaces, so the handling of all-caps strings is incorrect. The check for all caps at L51 only works if the string is a single word; otherwise it returns false. Once that check fails, L57 calls FromPascalCase on the all-caps string. The regular expression does not deal with spaces, so .Split() produces a single result containing the entire string, which is then title cased and returned. This produces "Test 1 This is a test" from "TEST 1 THIS IS A TEST".
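Both failure modes can be sketched in isolation. The word-boundary pattern below is a simplified, hypothetical one chosen for illustration; it is not the regex used in Humanizer. It only shows that a whole-string upper-case check fails once spaces or digits are present, and that a split which never matches returns the input as a single element.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AllCapsSketch
{
    // Hypothetical boundary: a position between a lower-case and an upper-case letter.
    private static readonly Regex WordBoundary = new Regex(@"(?<=[a-z])(?=[A-Z])");

    public static bool IsAllUpper(string input)
    {
        // Spaces and digits are not upper-case letters, so any multi-word input fails.
        return input.ToCharArray().All(Char.IsUpper);
    }

    public static int CountParts(string input)
    {
        return WordBoundary.Split(input).Length;
    }

    public static void Main()
    {
        Console.WriteLine(IsAllUpper("TEST"));                  // True
        Console.WriteLine(IsAllUpper("TEST 1 THIS IS A TEST")); // False
        Console.WriteLine(CountParts("PascalCaseInput"));       // 3
        Console.WriteLine(CountParts("TEST 1 THIS IS A TEST")); // 1
    }
}
```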

Can you please let me know if my interpretation of the expected behavior is incorrect?

Thanks again for your time.

MehdiK · 2014-10-29T00:44:22Z

Thanks @mexx and @nzubair for the correction. You're right. That is the current behaviour! I will check this out again later today (and merge).

P.S. That method was the very first method implemented in the library and hasn't been touched (apart from a RegEx rewrite) since!

MehdiK · 2014-10-31T22:11:50Z

This is now released to NuGet as v1.30.0. Thanks for the great contribution.

nzubair added 4 commits October 21, 2014 22:08

added test to find dash/hyphen with preceding or following spaces or both

af8b040

updated regex to account for words/acronyms/numbers separated by spaces.

19d9a09

The regex can be refined to exclude leading and trailing spaces from the results of pascalCaseWordBoundary.Split(). For now, word.Trim() is added to discard the unneeded space.

added various test cases for space, dash and hyphen handling.

42d6ad7

Removed unnecessary .Trim() from word.Length line

6469e2d

mexx reviewed Oct 22, 2014

nzubair added 3 commits October 22, 2014 20:09

Removed multiple Trim()s within Select(), added a single Trim() in Aggregate()

e1f1d8a

Simplified regex by removing [\s]{0}. Updated comment to say 'underscore' instead of 'hyphen'

ca6e447

Updated release_notes.md with PR#341 information.

0e6edfe

MehdiK merged commit 0e6edfe into Humanizr:master Oct 29, 2014

MehdiK mentioned this pull request Oct 29, 2014

Have to Humanize a string twice #318

Closed

nzubair deleted the StringExtensions branch October 30, 2014 00:14
