Add full support for traditional and simplified Chinese labels #1955

Open
peitili opened this issue Jun 14, 2021 · 2 comments
@peitili
Contributor

peitili commented Jun 14, 2021

Tilezen remaps localized names from OpenStreetMap, Natural Earth, and Who's On First into 2-character language codes; each source has its own way of representing name localizations.

In the case of Chinese (and possibly other languages), this "spoken" language has multiple "written" character sets (Traditional and Simplified) and is spoken and written in multiple countries, each using a different configuration.

But in Tilezen we only export a generic and ambiguous name:zh value. In UX design it is generally best practice to target each language as a combination of language + country code to allow for local colloquialisms. But for mapping, less is sometimes better, since we're mostly dealing in proper nouns, so an alternative is to use zh-hans (Chinese Simplified, irrespective of country) and zh-hant (Chinese Traditional, irrespective of country). Let's pick one and stick with it, and make it work with the point-of-view / worldview being introduced in v5.

For example:

| Locale | Description |
|--------|-------------|
| zh-CN | Chinese (Simplified, PRC) |
| zh-SG | Chinese (Simplified, Singapore) |
| zh-TW | Chinese (Traditional, Taiwan) |
| zh-HK | Chinese (Traditional, Hong Kong S.A.R.) |
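
To make the script-only option concrete, here is a minimal sketch of collapsing the country-specific locales in the table above into zh-Hans / zh-Hant. The mapping table and the function name `script_tag_for_locale` are hypothetical and not part of Tilezen; only the four locales listed above are covered, and anything else returns None.

```python
# Hypothetical mapping from the locales in the table above to script-only tags.
# Assumption: the script follows the table (CN/SG Simplified, TW/HK Traditional).
_ZH_LOCALE_TO_SCRIPT = {
    "zh-cn": "zh-Hans",
    "zh-sg": "zh-Hans",
    "zh-tw": "zh-Hant",
    "zh-hk": "zh-Hant",
}


def script_tag_for_locale(locale):
    """Return 'zh-Hans' or 'zh-Hant' for a known zh locale, else None."""
    # Accept both 'zh-TW' and 'zh_TW', in any letter case.
    key = locale.strip().replace("_", "-").lower()
    return _ZH_LOCALE_TO_SCRIPT.get(key)
```

With this approach a plain `zh` stays unresolved (None), so the caller would still have to decide what to do with the ambiguous generic value.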

def _convert_wof_l10n_name(x):
    lang_str_iso_639_3 = x[:3]
    if len(lang_str_iso_639_3) != 3:
        return None
    try:
        lang = pycountry.languages.get(alpha_3=lang_str_iso_639_3)
    except KeyError:
        return None
    return LangResult(code=_alpha_2_code_of(lang), priority=0)


def _convert_ne_l10n_name(x):
    if len(x) != 2:
        return None
    try:
        lang = pycountry.languages.get(alpha_2=x)
    except KeyError:
        return None
    return LangResult(code=_alpha_2_code_of(lang), priority=0)


def _normalize_osm_lang_code(x):
    # first try an alpha-2 code
    try:
        lang = pycountry.languages.get(alpha_2=x)
    except KeyError:
        # next, try an alpha-3 code
        try:
            lang = pycountry.languages.get(alpha_3=x)
        except KeyError:
            # finally, try a "bibliographic" code
            try:
                lang = pycountry.languages.get(bibliographic=x)
            except KeyError:
                return None
    return _alpha_2_code_of(lang)

This was referenced Jun 15, 2021
@nvkelso nvkelso changed the title Add Chinese language parser Add full support for traditional and simplified Chinese labels Jun 17, 2021
@nvkelso
Member

nvkelso commented Jul 20, 2021

I think we take country names from OSM, so not clear why these are in the import statement:
