CLDR-17382 languagematch Ukrainian should not fall back to Russian #3993
I didn't think we were changing the match to English; I think it should match what was done for Macedonian. I thought the goal was to remove the language match, not to explicitly cause it to match English.
Before making the change, I looked at some of the other language matches and previous changes, and it seemed that a change, not a deletion, was the right thing to do in this case.
common/supplemental/languageInfo.xml
Outdated
<languageMatch desired="uk" supported="ru" distance="20" oneway="true"/> <!-- Ukrainian ⇒ Russian -->
<languageMatch desired="uk" supported="en" distance="30" oneway="true"/> <!-- Ukrainian ⇒ English -->
<!-- CLDR-17382: languageMatch: Ukrainian should not fall back to Russian -->
In that case, can we have the comment say that Ukrainian should match English, since we are adding an explicit match?
I think it is ok to have the explicit fallback to en.
I disagree very much. That's not how this is designed, nor advertised.
I think we can wait until tomorrow to see what Markus says, rather than resolve the conversation now.
The information we got is "Ukrainian-language users don't want to be matched with Russian-language contents". The corresponding data change is to remove (comment out) the languageMatch for this pair.

This is language matcher data that feeds into an implementation (e.g., ICU LocaleMatcher), which a caller sets up with a list of supported languages plus an optional default language. When there is no match, the matcher returns the default language if one is set; otherwise it returns a "no match" result. The default language is chosen by the caller. It need not be English. And not setting one at all is a valid, important choice. Some callers have special strategies for what to do next.

When we overcorrect and force a "fallback" to English, we short-circuit the functioning of the algorithm and defeat the caller's intent. We should handle this like the other geopolitical cases in the past, like Macedonian.

I haven't reviewed other "fallbacks to English" in detail. I assumed that they were generally matches based on some information, such as populations being somewhat likely to understand English because it's one of the local government and entertainment languages, or because of remaining influence in colonies, etc. I would expect similar data for one-way matches to French (e.g., Breton --> French), Portuguese, Chinese, and Arabic. Some of these might not make sense. At a glance, I see a one-way match from Esperanto to English; that looks bogus. A review of existing data deserves a separate ticket.
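The matcher behavior described in this comment can be illustrated with a toy model. This is a minimal sketch and not ICU's LocaleMatcher: the `best_match` function, the `THRESHOLD` and `NO_RULE` values, and the rule dictionaries are all hypothetical and exist only for illustration. The only numbers taken from this PR are the `uk` to `ru` distance of 20 and the proposed `uk` to `en` distance of 30.

```python
# Hypothetical cutoff: a pair at or above this distance counts as "no match".
THRESHOLD = 50
# Hypothetical distance for two different languages with no languageMatch rule.
NO_RULE = 80


def best_match(desired, supported, rules, default=None):
    """Return the closest supported language, else the caller's default.

    rules maps (desired, supported) -> distance and models one-way
    languageMatch entries. The result is None when nothing is close enough
    and the caller did not set a default ("no match").
    """
    best, best_dist = None, THRESHOLD
    for want in desired:
        for have in supported:
            dist = 0 if want == have else rules.get((want, have), NO_RULE)
            if dist < best_dist:
                best, best_dist = have, dist
    return best if best is not None else default


supported = ["ru", "en", "de"]
with_ru = {("uk", "ru"): 20}  # the entry under discussion
removed = {}                  # the entry commented out
with_en = {("uk", "en"): 30}  # the explicit mapping first proposed

print(best_match(["uk"], supported, with_ru, default="de"))  # ru
print(best_match(["uk"], supported, removed, default="de"))  # de
print(best_match(["uk"], supported, removed))                # None
print(best_match(["uk"], supported, with_en, default="de"))  # en
```

In this toy model, commenting out the rule lets the caller's own default (or its own "no match" handling) take over, while the explicit English rule wins over a default that the caller deliberately chose.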
Upshot: I don't think it matters much whether we include an explicit fallback or not, and we don't for many languages. So to be conservative, we could omit the fallback mapping to English, then revisit this in the next cycle.

Background: I looked at this a bit. If someone uses the default settings with the proposed change, here is what happens.

If the user's desired languages are <Ukrainian, French> and the default language is set to German, then the priority order among the app's supported languages would be: <Ukrainian, French, English, German>. So on systems that allow for secondary desired languages, such as iOS, Android, and macOS, it is easy for users to get the desired result if their favored fallback language is French (or Russian) rather than German. (Of course, that doesn't necessarily mean that users take advantage of this ability.) The difference would be that English would come before what the system has set as a default.

Now, if the system doesn't set a specific default, or doesn't order the supported languages to put a reasonable default first, the results would be a bit random. On the other hand, that is the case for many of our current languages, since we don't always have a fallback for all major languages. I suspect that a very large number of users of LocaleMatcher will use a system default of English, although that might be different in Ukraine. I think that's why we made an effort to have fallback locales for most locales that are not "top tier" (i.e., those supported by most applications), so that the user would get some reasonable result on systems that don't set the default based on the likelihood that users in that country would understand the language.
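The priority order described in this comment can be sketched as follows. This is only an illustration of the ordering as stated here, not of how ICU computes it: the `priority_order` function and the `fallbacks` dictionary are hypothetical names, and the model assumes that desired languages come first, then any one-way fallbacks of those languages, then the caller's default.

```python
def priority_order(desired, fallbacks, default=None):
    """List languages in the order a match would be preferred.

    fallbacks maps a desired language to the languages it has a one-way
    languageMatch entry for (hypothetical structure for illustration).
    """
    order = list(desired)
    for lang in desired:
        for fallback in fallbacks.get(lang, []):
            if fallback not in order:
                order.append(fallback)
    if default is not None and default not in order:
        order.append(default)
    return order


# With the explicit Ukrainian -> English entry, English precedes the default.
print(priority_order(["uk", "fr"], {"uk": ["en"]}, default="de"))
# ['uk', 'fr', 'en', 'de']

# With no entry for Ukrainian, the caller's default follows directly.
print(priority_order(["uk", "fr"], {}, default="de"))
# ['uk', 'fr', 'de']
```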
@macchiati, I'm not sure if I can discern a tie-breaking call in your reply. Should we default to English or leave it commented out like Markus suggests?
Yes please let's just comment out the offensive mapping as requested in the ticket.
That last part is important. If it's reasonable to assume that people who understand language x might also understand English or French or... then we should have a medium-high-distance one-way match for that. If not, then we shouldn't have a languageMatch entry. I would argue that Esperanto-->English is the latter. (And I am not asking for changing that in this PR nor under this ticket!)
Let's comment it out for now, and revisit next cycle.
I don't see it that way. The goal for the fallbacks should be: based on the best information we have, in the absence of any other information, what are people most likely to understand if the language in question is not available. [Caveat geopolitical] So when people can't supply secondary languages, is it better to:
Changing approach to being open-ended instead of explicit to English. Co-authored-by: Markus Scherer <markus.icu@gmail.com>
LGTM
I'll go ahead and merge.
…nicode-org#3993) Co-authored-by: Markus Scherer <markus.icu@gmail.com>
…3993) Co-authored-by: Markus Scherer <markus.icu@gmail.com>
CLDR-17382
ALLOW_MANY_COMMITS=true