Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(text/unstable): handle non-Latin-script text in slugify #5880

Closed
wants to merge 2 commits into from

Conversation

lionel-rowe
Copy link
Contributor

@lionel-rowe lionel-rowe commented Aug 31, 2024

Fixes #5830

Changed tests:

  • "déjà vu" is now déjà-vu, not deja-vu (and same for other diacritic tests). This was the most common handling of Latin-text diacritics in the sites I checked in text/slugify gives empty results for non-Latin alphabets #5830 (comment).
  • "$33,000" is now 33-000, not 33000. The Unicode category of , is "Punctuation (Other)", which is very broad and IMO should generally replaced with "-" rather than "". I don't think it's probably desirable to special-case individual characters beyond Unicode category as it just ends up with a load of hard-coded and largely arbitrary chars (with the exception of straight quote marks, which are sort-of in the "wrong" category as quote marks typically fall in "Punctuation (initial)" or "Punctuation (final)"). Also, while there isn't much to choose between 33-000 vs 33000 (33000 is more natural, but 33-000 more readable due to retaining the thousands-place delimitation) , is used as a decimal point in many languages, and la-valeur-de-pi-est-3-14 is clearly preferable to la-valeur-de-pi-est-314.

@lionel-rowe lionel-rowe requested a review from kt3k as a code owner August 31, 2024 05:07
@github-actions github-actions bot added the text label Aug 31, 2024
Copy link

codecov bot commented Aug 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.31%. Comparing base (8abff45) to head (2cb9e9b).
Report is 8 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5880   +/-   ##
=======================================
  Coverage   96.30%   96.31%           
=======================================
  Files         483      483           
  Lines       39059    39060    +1     
  Branches     5713     5713           
=======================================
+ Hits        37615    37619    +4     
+ Misses       1399     1396    -3     
  Partials       45       45           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@timreichen
Copy link
Contributor

timreichen commented Aug 31, 2024

See #5830 (comment)
I am against this change, because this use case is better handled with @std/text/to-kebab-case

import { toKebabCase } from "@std/text/to-kebab-case";
console.log(toKebabCase("déjà vu")); // "déjà-vu"
console.log(toKebabCase("$33,000")); // "33-000"

slugify() is supposed to match all characters to [a-zA-Z0-9-] and remove all others.

@lionel-rowe lionel-rowe marked this pull request as draft September 1, 2024 03:14
@iuioiua
Copy link
Contributor

iuioiua commented Sep 3, 2024

@lionel-rowe, I think we should first come to an agreement on the solution in #5830 before opening a PR. Are you alright to close this for now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

text/slugify gives empty results for non-Latin alphabets
3 participants