-
Notifications
You must be signed in to change notification settings - Fork 629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(text): unicode support and word splitting according to case #5447
fix(text): unicode support and word splitting according to case #5447
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Just a few nits on the first pass. Thank you.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #5447 +/- ##
==========================================
- Coverage 96.30% 96.29% -0.01%
==========================================
Files 465 465
Lines 37705 37711 +6
Branches 5561 5560 -1
==========================================
+ Hits 36312 36315 +3
- Misses 1351 1354 +3
Partials 42 42 ☔ View full report in Codecov by Sentry. |
By this change, the result of |
Looking at a more semantic example like |
I agree with this. In that case, probably we should return |
…urn an iterator from splitToWords()
…s/deno_std into unicode-and-case-wordSplit
I made some changes to better handle Acronyms followed by a capitalized word, like in I think an added benefit of doing everything with one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…s/deno_std into unicode-and-case-wordSplit
This looks a great improvement to me. Nice! |
dfd42c7
to
90c7286
Compare
The current implementation of splitToWords() only supports Latin letters. This PR adds a unicode-compatible implementation + support for WordSpliting according to case.