Intl.getParentLocales #87
Oh, and since Abstract Locale Operations are already in Stage 2, I expect to be able to present it, get it into Stage 2 at the next TC39 meeting, and ask for reviewers there.
The way we would like to tackle it is that the algorithm will produce a list of locales ordered from the most specific to the most generic fallback.
On top of that, the algorithm will scan through an implementation-specific exception list to produce a different fallback list for certain locales:
And then we'll use the resulting abstract operation for the public API and internally for all formatters. @rxaviers, will this work for you?
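Here is a minimal sketch of that operation. It assumes plain language[-script][-region] tags with no variants or extensions. `parentOf` and `fallbackChain` are hypothetical names and are not the proposed spec text. `PARENT_EXCEPTIONS` is a three-entry excerpt modeled on CLDR's parentLocales supplemental data, standing in for the implementation-specific exception list:

```javascript
// Hypothetical stand-in for the implementation-specific exception list,
// modeled on a few entries of CLDR's parentLocales data.
const PARENT_EXCEPTIONS = {
  "en-GB": "en-001", // British English falls back to "world English"
  "es-MX": "es-419", // Mexican Spanish falls back to Latin American Spanish
  "zh-Hant": "root", // Traditional Chinese must not inherit from "zh" (Simplified)
};

// Parent of a single tag: exception data wins, otherwise truncate the last subtag.
// Returns null when the top of the chain is reached.
function parentOf(tag) {
  if (Object.prototype.hasOwnProperty.call(PARENT_EXCEPTIONS, tag)) {
    const parent = PARENT_EXCEPTIONS[tag];
    return parent === "root" ? null : parent;
  }
  const cut = tag.lastIndexOf("-");
  return cut === -1 ? null : tag.slice(0, cut);
}

// List of locales from the most specific to the most generic fallback.
function fallbackChain(tag) {
  const chain = [];
  for (let current = tag; current !== null; current = parentOf(current)) {
    chain.push(current);
  }
  return chain;
}

console.log(fallbackChain("en-GB"));      // [ 'en-GB', 'en-001', 'en' ]
console.log(fallbackChain("zh-Hant-TW")); // [ 'zh-Hant-TW', 'zh-Hant' ]
```

The sketch consults the exception entries before truncating. That ordering is what keeps zh-Hant-TW from falling back to the Simplified zh bundle.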
Also, I assume that this is the exception list we'll want to use: I also see this: and I'm not sure what to do with it :)
Is the appspot data actually up to date? I would use the XML or JSON from the official CLDR release: http://cldr.unicode.org
Not sure if it's up to date; here are the XML bits:
Still not sure which one we should use in this API, and which one solves https://github.com/rxaviers/ecma402-fix-lookup-matcher
OK, so it seems that likelySubtags is closer to what we're looking for. It does give us the ability to take Now, the question is, what should we do from there? Should it be:
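For reference, you can exercise the likelySubtags data today through `Intl.Locale.prototype.maximize()` and `minimize()`. ECMA-402 added these after this discussion, and they apply CLDR's Add Likely Subtags and Remove Likely Subtags algorithms. The outputs shown assume the CLDR data bundled with a recent full-ICU Node.js:

```javascript
// Add Likely Subtags: fill in the most likely script and region.
function maximizeTag(tag) {
  return new Intl.Locale(tag).maximize().toString();
}

// Remove Likely Subtags: drop whatever maximize() would add back.
function minimizeTag(tag) {
  return new Intl.Locale(tag).minimize().toString();
}

for (const tag of ["zh", "zh-TW", "zh-HK", "en", "sr"]) {
  console.log(`${tag} -> ${maximizeTag(tag)}`);
}
// zh -> zh-Hans-CN
// zh-TW -> zh-Hant-TW
// zh-HK -> zh-Hant-HK
// en -> en-Latn-US
// sr -> sr-Cyrl-RS

console.log(minimizeTag("zh-Hant-TW")); // zh-TW
```

Once maximized, zh and zh-TW differ in script. That is exactly the information a naive truncation throws away.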
If not the latter, how do we recognize where to stop? One way would be to write an algorithm that cuts subtags until it matches one that is already in the exception list (in this case Would that work?
Hey @zbraniecki, sorry for the long delay in answering. The issues pointed out (and addressed) by the proposal https://github.com/rxaviers/ecma402-fix-lookup-matcher are a little trickier than the ones you exemplified above (and that are exemplified by Abstract Locale Operations - Nov 2015). Please correct me if I'm wrong, but I understood the decision was to expose I believe https://github.com/rxaviers/ecma402-fix-lookup-matcher is the part described as "the spec will take into consideration the proper algo step", specifically these implementation details.

Answering your last question, "how do we recognize where to stop?": it should stop at A very basic step of the algorithm for finding the parent locale is to truncate it, e.g., the parent locale of Another very basic step of the algorithm for finding the parent locale is to prefer the parent locale data over truncation, e.g., the parent locale of Code: https://github.com/rxaviers/cldrjs/blob/master/src/bundle/parent_lookup.js#L7-L26

The trickier part is that the above steps work (and only work) if you start at the right place (the right locale / the right bundle). For example, starting the parent locale chain of Another important point is that "where to start" depends on your available data. For example, as of today, CLDR has the following bundles ("where to start"s) for Chinese: CLDR/UTS#35 specifies that you should use Language Matching in order to find the "where to start"s: "The table Lookup Differences uses the naïve resource bundle lookup for illustration. More sophisticated systems will get far better results for resource bundle lookup if they use the algorithm described in Section 4.4 Language Matching. That algorithm takes into account both the user’s desired locale(s) and the application’s supported locales, in order to get the best match".
As you may have noticed, Language Matching is an algorithm that covers more than a simple Lookup Matcher: it is a Best Fit Matcher, which is more generic than a Lookup Matcher because you can assign weights to the desired languages and so on. My proposed algorithm (the one linked above) is derived from it, but specifically for Lookup Matching, so it's simpler and requires no extra data beyond likelySubtags. I suggest we use that. Just let me know if you have any additional questions or ideas.
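Here is one way to read the "where to start" step, as a sketch only:

1. Maximize the requested tag and each available bundle with likely subtags.
2. Pick the first bundle that agrees on language, script and region.
3. Otherwise, accept a bundle that agrees on language and script alone.

This illustrates those assumptions and is not a transcription of the ecma402-fix-lookup-matcher algorithm. `findStartLocale` is a hypothetical name, and the code uses `Intl.Locale`, which postdates this thread:

```javascript
// Hypothetical matcher: choose the available bundle to start the parent chain from.
function findStartLocale(requested, available) {
  const want = new Intl.Locale(requested).maximize();
  const candidates = available.map((tag) => ({
    tag,
    max: new Intl.Locale(tag).maximize(),
  }));

  // 1. Bundle whose maximized form agrees on language, script and region.
  const exact = candidates.find(
    ({ max }) =>
      max.language === want.language &&
      max.script === want.script &&
      max.region === want.region
  );
  if (exact) return exact.tag;

  // 2. Same language and script; never cross a script boundary.
  const sameScript = candidates.find(
    ({ max }) => max.language === want.language && max.script === want.script
  );
  return sameScript ? sameScript.tag : undefined;
}

console.log(findStartLocale("zh-TW", ["zh", "zh-Hant"])); // zh-Hant
console.log(findStartLocale("zh-CN", ["zh", "zh-Hant"])); // zh
console.log(findStartLocale("zh-HK", ["zh", "zh-Hant"])); // zh-Hant
```

Refusing to cross a script boundary is a design choice in this sketch. With only zh available, a zh-TW request returns undefined and the caller uses its default, rather than serving Simplified Chinese. Once a start bundle is found, its parent chain can be walked with the truncation and parent-data steps described above.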
@zbraniecki do you plan to formalize this any time soon? Do you need help?
If an application has only an "en-GB" locale (and no "en" or "en-US" locale), an "en-US" locale ought to be suitable for presentation to such users. It would therefore seem to me that, in considering (Pardon me if this is jumping the gun...)
@brettz9 what you're looking for is a language negotiation strategy. Such a strategy may take There are many ways to negotiate languages. One is described in RFC 4647 (https://www.ietf.org/rfc/rfc4647.txt), but it is far from being the only one. ECMA-402 uses a different one, and Fluent (a library I am working on), for example, uses yet another, described here: https://github.com/projectfluent/fluent.js/blob/master/fluent-langneg/src/matches.js#L6

The nature of such negotiation depends on your needs, and I doubt it can be unified. For example, Fluent always negotiates a list of requested locales against a list of available locales, and offers three different strategies for doing so:
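For comparison, the truncation-based scheme in RFC 4647 is its section 3.4 "Lookup". This minimal sketch assumes a case-insensitive match against a flat list of available tags and ignores the "*" wildcard. `rfc4647Lookup` is a hypothetical name:

```javascript
// RFC 4647 section 3.4 Lookup: for each range in priority order, try
// progressively shorter prefixes until one names an available tag.
function rfc4647Lookup(ranges, available, defaultTag) {
  const byLower = new Map(available.map((tag) => [tag.toLowerCase(), tag]));
  for (const range of ranges) {
    const subtags = range.toLowerCase().split("-");
    while (subtags.length > 0) {
      const hit = byLower.get(subtags.join("-"));
      if (hit !== undefined) return hit;
      subtags.pop();
      // Also drop a trailing singleton (e.g. "x") exposed by the truncation.
      if (subtags.length > 0 && subtags[subtags.length - 1].length === 1) {
        subtags.pop();
      }
    }
  }
  return defaultTag;
}

console.log(rfc4647Lookup(["zh-Hant-CN-x-private1-private2"], ["zh-Hant-CN", "zh"], "en")); // zh-Hant-CN
console.log(rfc4647Lookup(["zh-TW"], ["zh", "zh-Hant"], "en")); // zh
```

The second call shows the trap of plain truncation: it takes zh-TW straight to zh, even though zh-Hant is available.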
RFC 4647 and the HTTP Accept-Language header recommend a different strategy that involves calculating proximity and assigning weights. The shared bottom line is that there's a trap in each way of thinking about negotiating BCP 47 tags, and that's the naive approach of "just cut at each Unfortunately, that thinking is wrong for a number of locales. Examples such as The whole list of likely subtags is here: https://github.com/unicode-cldr/cldr-core/blob/master/supplemental/likelySubtags.json

My current thinking is that instead of this API we really want That also matches what ICU is exposing: http://www.icu-project.org/apiref/icu4c/uloc_8h.html#a0cb2dcd65f745e7a966a729395499770

I haven't had time to formalize it, but based on my work on
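To make the "just cut at each -" trap concrete, here is a hypothetical helper, `scriptChanges` (not an existing API). It walks the naive truncation chain and reports each step at which the likely script changes, as computed by `Intl.Locale.prototype.maximize()`. It assumes plain tags without extensions:

```javascript
// Report the truncation steps where the likely script silently changes.
function scriptChanges(tag) {
  const subtags = tag.split("-");
  const changes = [];
  let previous = new Intl.Locale(tag).maximize().script;
  // Try every shorter prefix, e.g. zh-Hant-TW -> zh-Hant -> zh.
  for (let n = subtags.length - 1; n >= 1; n--) {
    const prefix = subtags.slice(0, n).join("-");
    const script = new Intl.Locale(prefix).maximize().script;
    if (script !== previous) {
      changes.push(`${prefix}: ${previous} -> ${script}`);
    }
    previous = script;
  }
  return changes;
}

console.log(scriptChanges("zh-TW"));      // [ 'zh: Hant -> Hans' ]
console.log(scriptChanges("zh-Hant-TW")); // [ 'zh: Hant -> Hans' ]
console.log(scriptChanges("de-AT"));      // []
```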
While you make a good point, @zbraniecki, that there are a number of possible strategies that could be taken, the unsuitability of mandating a single high-level strategy does not remove the need for more comprehensive lower-level options. So yes, while I am interested in building a language negotiation strategy, I'm really looking for fundamentals that can help developers compose our own complete strategy without needing to drag a lot of supplementary data into our apps. I think we need more than
But these techniques don't cover the use case raised by @rxaviers at #87 (comment) of determining, for example, that "en-GB" is a closer match to "en-001" than to "en-US" or "en". (Technically, I think "world English" may in some cases be less helpful for "en-GB" readers than "en-US", as I would think many readers used to British English would prefer U.S. English over an overly simplified international English. But if the nebulous "world English" concept is instead taken to mean merely avoiding country-specific regionalisms, then it could indeed be useful to fall back to the more generic "en-001" for "en-GB" readers, as appears to be the intent of this hierarchy.)
Another consideration in all of this is that locale APIs need not require specifying the available locales ahead of time. An implementation could, for example, lazily check for the existence of an "en-001" file if "en-GB" was not found, and then for an "en" file if that was not found (though requiring specification of the available locales would admittedly probably be better in general). This lazy checking could still have some appeal for simple client-side apps where one doesn't wish to add a build step to specify the available locales (nor track them manually), but where one might be caching the result anyway. As far as my suggestion for So my personal preference would be to see
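The lazy checking described above can be sketched as a loop over a fallback chain with an app-supplied existence check. Everything here is hypothetical. `firstAvailable` is not a standard API, and `exists` stands in for whatever probe the app uses, such as fetching the resource file or looking it up in a cache:

```javascript
// Returns the first tag in `chain` for which the async `exists` check succeeds.
async function firstAvailable(chain, exists) {
  for (const tag of chain) {
    // Probe lazily: later entries are only checked if earlier ones are missing.
    if (await exists(tag)) {
      return tag;
    }
  }
  return undefined;
}

// Usage with a stand-in for the files an app actually ships:
const shipped = new Set(["en-001", "fr"]);
firstAvailable(["en-GB", "en-001", "en"], async (tag) => shipped.has(tag)).then(
  (tag) => console.log(tag) // en-001
);
```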
During the TC39 meeting yesterday we decided to separate Intl.getParentLocales out of #46. Basically, we already have Intl.getCanonicalLocales in the spec, and the next step to help with language negotiation is to expose Intl.getParentLocales.

This should also fix the concerns raised by @rxaviers in https://github.com/rxaviers/ecma402-fix-lookup-matcher, as we should fix the internal operation and expose it.
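For readers unfamiliar with it, `Intl.getCanonicalLocales` (already in ECMA-402) works as follows:

- It validates a list of BCP 47 tags.
- It normalizes their case and removes duplicates.
- It throws a RangeError for malformed tags.

For example:

```javascript
// Canonicalizes case and drops the duplicate "en-US".
const canonical = Intl.getCanonicalLocales(["EN-us", "zh-hant-tw", "en-US"]);
console.log(canonical); // [ 'en-US', 'zh-Hant-TW' ]

try {
  // Underscores are not valid BCP 47 separators.
  Intl.getCanonicalLocales("en_US");
} catch (err) {
  console.log(err instanceof RangeError); // true
}
```

It only canonicalizes and computes no fallbacks. Exposing those fallbacks is the next step the post describes.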
I'll prototype the polyfill and spec soon.