Number formatting for many languages and sorting (collation) with ICU #1632
Mentioned Roman (title case) but actually needed ROMAN (upper case)
Update deprecated calls. Remove display formats that were undocumented and mixed the concepts of numbering system and format style. This is possibly slightly breaking, but if it was supposed to be a real feature, a syntax extension to counters would be better. Note that the arabic-indic example was using the decimal format (١٬٩٨٤) and now uses the default number format (١٩٨٤). Such a large value is unlikely to be a common case for counters, and the new behavior is more consistent with, for example, what CSS "list-style: arabic-indic;" does, so it is likely better.
Cardinal vs. ordinal is also very applicable in Turkish, and to a lesser degree in English. It's probably a distinction we should make, supplying both rather than assuming one or the other based on the most common use case. Otherwise 💯 to the analysis here.
I'm looking into dropping the bits of our implementations that are now duplicated. Somewhat amusingly, our "string" (spellout) implementation in English handles bigger numbers than the ICU one, which starts failing at 32 bits. It leaves off at quadrillions, while ours goes on to handle quintillion, sextillion, septillion, and octillion! Similarly, our Turkish implementation is more complete. The English apostrophe thing is probably wrong according to most style guides, though.
Yep, or even better (with tunable settings, in the case of Esperanto). I'm OK with SILE having a mechanism for bypassing ICU when it can do better; I didn't intend for the ICU versions to necessarily replace ours. The main point is to address other languages generically with decent defaults.
* TR output should be identical for small numbers, and better (with thousands separators) for large numbers.
* EN output is always different but arguably better. We had invalid apostrophes after numbers (correct in ICU) and also didn't have thousands separators.
Rationale
First part (number formatting) = Closes #1630
Notes
tests/counters.sil is a "reference" of sorts, though fully undocumented. I would tend to think it is acceptable (for the main intended use of counters, i.e. sectioning and lists). It's unlikely to affect many users anyway, and the benefits are worth it: we get "default" (cardinal), "decimal" (with thousands separator, etc.), "ordinal" (our former "nth"), as well as "string" (spelt-out) support for most languages (i.e. as long as they are supported by ICU).
N.B. I haven't considered all format style options from ICU (e.g. "duration", "currency", etc.) because I felt they were unlikely to be needed. They could be added if someone really wants them, but I'm not convinced they would have a real use in SILE (and in my tests at least, the ICU library is somewhat inconsistent in how it honors these).
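To illustrate the difference between the "default", "decimal" and "ordinal" styles, here is a minimal Python sketch. It is not ICU and not SILE's API: `format_counter` and its `arabic_indic` flag are hypothetical names chosen for this example, the ordinal suffixes are English-only, and the "string" (spelt-out) style is left out. The separator U+066C is the Arabic thousands separator that appears in the arabic-indic example mentioned in the commit message.

```python
# Hypothetical illustration only: a hand-rolled stand-in for three of the
# format styles discussed above. Real output comes from ICU, per language.

# Map ASCII digits 0-9 to Arabic-Indic digits U+0660..U+0669.
ARABIC_INDIC_DIGITS = {ord(str(i)): chr(0x0660 + i) for i in range(10)}
ARABIC_THOUSANDS_SEPARATOR = "\u066c"


def format_counter(value, style="default", arabic_indic=False):
    """Format a non-negative integer in one of three simplified styles."""
    if style == "default":
        # Plain cardinal number, no grouping.
        text = str(value)
    elif style == "decimal":
        # Cardinal number with a thousands separator.
        text = f"{value:,}"
    elif style == "ordinal":
        # English-only suffix rules (assumption: other languages differ).
        if 10 <= value % 100 <= 20:
            suffix = "th"
        else:
            suffix = {1: "st", 2: "nd", 3: "rd"}.get(value % 10, "th")
        text = f"{value}{suffix}"
    else:
        raise ValueError(f"unsupported style: {style}")

    if arabic_indic:
        # Swap the separator first, then translate the digits.
        text = text.replace(",", ARABIC_THOUSANDS_SEPARATOR)
        text = text.translate(ARABIC_INDIC_DIGITS)
    return text


if __name__ == "__main__":
    print(format_counter(1984))             # default style
    print(format_counter(1984, "decimal"))  # grouped style
    print(format_counter(21, "ordinal"))    # English ordinal
```

The point of keeping the numbering system (`arabic_indic`) separate from the style argument is the same separation the commit message argues for: which digits are used and how the number is grouped are independent choices.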
Second part (string sorting) = Add language-dependent sorting with ICU collation options
As noted, an imperative for indexes (= relates to #1339), etc.
--> Alinoé, Alinéa, Jean100, Jean2 = bad indexing order
--> Alinéa, Alinoé, Jean2, Jean100 = good indexing order, yay!
Supports options passed to SU.collatedSort, or also defined in SU.collatedSort.xx for language-specific override of the default values if need be.
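The bad and good orderings shown above can be reproduced with a minimal Python sketch. It does not use ICU and is not how SU.collatedSort is implemented: it only approximates two collation behaviors (ignoring accents at the primary level, and comparing digit runs numerically) with the standard library. `collation_key` and `collated_sort` are hypothetical names for this example.

```python
import re
import unicodedata


def collation_key(text):
    """Build a sort key that ignores accents and case, and orders digit
    runs by numeric value. A rough approximation, not real ICU collation."""
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    base = base.casefold()

    # Split into digit and non-digit chunks; tag each chunk so that
    # integers are only ever compared with integers.
    primary = []
    for chunk in re.split(r"([0-9]+)", base):
        if not chunk:
            continue
        if chunk.isdigit():
            primary.append((0, int(chunk)))
        else:
            primary.append((1, chunk))

    # The original text breaks ties between otherwise equal keys.
    return (tuple(primary), text)


def collated_sort(items):
    """Return a new list sorted with the approximate collation key."""
    return sorted(items, key=collation_key)


if __name__ == "__main__":
    words = ["Jean100", "Alino\u00e9", "Jean2", "Alin\u00e9a"]
    print(sorted(words))         # code point order
    print(collated_sort(words))  # accent-insensitive, numeric-aware order
```

A plain code point sort places "o" before "é" and "1" before "2", which is what produces the bad order. Real ICU collation goes further than this sketch, with language-specific tailoring and several comparison levels.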