Use ISO 639-3 (3-letter) language codes in the database #73

Open
MattBlissett opened this issue Oct 5, 2018 · 2 comments

Comments

@MattBlissett
Member

We have checklists containing names in less widely spoken languages, which have only ISO 639-3 (3-letter) language codes.

Our API exposes 3-letter codes, but languages are parsed to two-letter codes and stored in the database as two-letter codes.

We should change to use three-letter codes throughout (though still accepting 2-letter codes, of course).

This checklist has many vernacular names in less widely spoken languages that have only 3-letter codes: https://www.gbif.org/species/search?dataset_key=a0b06e2e-287a-4687-8a6c-2c0cfb31c16d&origin=SOURCE&issue=VERNACULAR_NAME_INVALID&advanced=1
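
To illustrate the "3-letter codes throughout, while still accepting 2-letter codes" idea, here is a minimal sketch in Java. `LanguageCodes` and `toThreeLetter` are hypothetical names for illustration only and are not part of gbif-api or its LanguageParser. The sketch relies on the JDK's `Locale.getISO3Language()` for the 2-letter to 3-letter mapping, and it assumes that any 3-letter input can be passed through without checking it against the ISO 639-3 registry.

```java
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.Optional;

// Hypothetical helper for illustration; not part of gbif-api or its LanguageParser.
final class LanguageCodes {
  private LanguageCodes() {}

  /** Returns a lower-case 3-letter code for a 2- or 3-letter input, or empty if none is found. */
  static Optional<String> toThreeLetter(String raw) {
    if (raw == null) {
      return Optional.empty();
    }
    String code = raw.trim().toLowerCase(Locale.ROOT);
    boolean asciiLetters = !code.isEmpty() && code.chars().allMatch(c -> c >= 'a' && c <= 'z');
    if (!asciiLetters) {
      return Optional.empty();
    }
    if (code.length() == 3) {
      // Already 3 letters: passed through unchanged, NOT validated against the ISO 639-3 registry.
      return Optional.of(code);
    }
    if (code.length() == 2) {
      try {
        // The JDK maps 2-letter ISO 639 codes to their 3-letter equivalents.
        String iso3 = Locale.forLanguageTag(code).getISO3Language();
        return iso3.isEmpty() ? Optional.empty() : Optional.of(iso3);
      } catch (MissingResourceException e) {
        // The JDK has no 3-letter equivalent for this 2-letter code.
        return Optional.empty();
      }
    }
    return Optional.empty();
  }
}
```

With this sketch, `LanguageCodes.toThreeLetter("en")` yields `eng`, while a code such as `nds` is returned as-is. A real implementation would presumably also validate 3-letter input against the registry.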

@mdoering
Member

mdoering commented Oct 8, 2018

The problem boils down to our Language enumeration, which only tracks 2-letter codes:
https://github.com/gbif/gbif-api/blob/master/src/main/java/org/gbif/api/vocabulary/Language.java#L36

The API does not use strings for languages; it uses this enumeration.

@mdoering
Member

mdoering commented Oct 8, 2018

Changing the db would not be a big thing, but extending the enum and the LanguageParser would take a bit more work.
Wikipedia claims there are currently 7776 3-letter codes. Do we still want to manage them in an enumeration?
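
One possible alternative to an enumeration with thousands of constants is a lookup table loaded from data. The following is only a sketch under stated assumptions: `LanguageRegistry`, its methods, and the three-column tab-separated row format (3-letter code, optional 2-letter code, name) are all hypothetical and do not reflect gbif-api or any official code table layout.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: languages kept as data rows rather than enum constants.
final class LanguageRegistry {
  // Maps every accepted code (2- or 3-letter) to the canonical 3-letter code.
  private final Map<String, String> canonicalByCode = new HashMap<>();
  // Maps the canonical 3-letter code to a display name.
  private final Map<String, String> nameByIso3 = new HashMap<>();

  private LanguageRegistry() {}

  /** Parses rows of the assumed form "iso3<TAB>iso2-or-empty<TAB>name", one per line. */
  static LanguageRegistry parse(String table) {
    LanguageRegistry registry = new LanguageRegistry();
    for (String line : table.split("\\R")) {
      if (line.trim().isEmpty()) {
        continue; // skip blank lines
      }
      String[] cols = line.split("\t", -1); // -1 keeps an empty 2-letter column
      if (cols.length != 3) {
        throw new IllegalArgumentException("Expected 3 columns: " + line);
      }
      String iso3 = cols[0].trim().toLowerCase(Locale.ROOT);
      String iso2 = cols[1].trim().toLowerCase(Locale.ROOT);
      registry.canonicalByCode.put(iso3, iso3);
      if (!iso2.isEmpty()) {
        registry.canonicalByCode.put(iso2, iso3);
      }
      registry.nameByIso3.put(iso3, cols[2].trim());
    }
    return registry;
  }

  /** Resolves a 2- or 3-letter code (any case) to the canonical 3-letter code. */
  Optional<String> resolve(String code) {
    if (code == null) {
      return Optional.empty();
    }
    return Optional.ofNullable(canonicalByCode.get(code.trim().toLowerCase(Locale.ROOT)));
  }

  /** Returns the display name for a 2- or 3-letter code, if the code is known. */
  Optional<String> name(String code) {
    return resolve(code).map(nameByIso3::get);
  }

  /** Number of languages loaded. */
  int size() {
    return nameByIso3.size();
  }
}
```

For example, after `LanguageRegistry.parse("eng\ten\tEnglish\nnds\t\tLow German\n")`, both `resolve("EN")` and `resolve("eng")` give `eng`, and `nds` resolves even though it has no 2-letter code. The trade-off is that callers lose compile-time constants, so whether this fits an API built around an enumeration remains an open question.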
