Currently, islowercase checks whether a character is in category Ll, Letter: Lowercase, and isuppercase checks for category Lu, Letter: Uppercase or Lt, Letter: Titlecase.
However, it was recently brought to my attention that Unicode defines official derived properties called Lowercase and Uppercase, which differ from these definitions in two ways:
Titlecase characters like Dž (U+01c5) are not considered uppercase. (Note that uppercase('Dž') yields a different character 'DŽ', so this makes a certain sense.)
Some characters outside the cased-letter categories are included: ª (Lo, Letter: Other) counts as Lowercase, and Ⓐ (So, Symbol: Other) counts as Uppercase.
The next version of utf8proc will provide islower and isupper functions compliant with these definitions (JuliaStrings/utf8proc#196), so we may want to switch to them.
(My guess is that it makes little difference in practice — I'm not clear how useful these functions are for general Unicode strings — but the standard here seems fairly sensible. Apparently this is what Python's isupper/islower functions do.)
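To make the difference concrete, here is a small Python sketch comparing the two definitions on the characters mentioned above. It assumes a recent CPython 3, whose `str.islower`/`str.isupper` appear to follow the derived properties as noted; `category_lower` and `category_upper` are hypothetical helpers that mimic the category-based checks described at the top, not Julia's actual implementation.

```python
import unicodedata


def category_lower(ch: str) -> bool:
    """Category-based check: only Ll counts as lowercase."""
    return unicodedata.category(ch) == "Ll"


def category_upper(ch: str) -> bool:
    """Category-based check: Lu or Lt counts as uppercase."""
    return unicodedata.category(ch) in ("Lu", "Lt")


# U+01C5 (titlecase digraph), U+00AA (feminine ordinal), U+24B6 (circled A)
samples = ["\u01c5", "\u00aa", "\u24b6"]

for ch in samples:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        "category-based lower/upper:", category_lower(ch), category_upper(ch),
        "str.islower/isupper:", ch.islower(), ch.isupper(),
    )
```

On these three characters the two approaches disagree: the category-based check treats U+01C5 as uppercase and ignores the other two, while the string methods do the opposite.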