Javanese language code - jw #21

Open
jordimas opened this issue Sep 17, 2024 · 2 comments

Comments

@jordimas

Hello.

It may be worth considering adding "jw" as an alias for Javanese.

I found this in the OpenAI Whisper code:

https://github.com/openai/whisper/blob/main/whisper/tokenizer.py#L108

I would expect "jv".

I found this documented here: https://xml.coverpages.org/iso639a.html

Javanese is rendered as "jw" in table 1, while it is correctly
given as "jv" in the other tables.

It seems this may be an error that propagated. I have not done extensive research; I am just sharing what I found.

Thanks

@LBeaudoux
Owner

Hi Jordi,

Thanks for reporting this issue. According to the ISO 639-2/RA Change Notice, the 'jw' identifier was indeed published in error and then deprecated in August 2001.

iso639-lang already detects deprecated ISO 639-3 identifiers. After the next update it will also detect deprecated ISO 639-3 reference names. Following your report, I will try to make it detect deprecated values from ISO 639-1, ISO 639-2 and ISO 639-5 as well.
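
For illustration only, here is a minimal sketch of how a caller could map deprecated two-letter codes to their current equivalents. It does not use iso639-lang and is not its API: the `DEPRECATED_ISO639_1` table and the `normalize_language_code` helper are hypothetical names. Only the "jw" to "jv" entry comes from this thread; the other entries are codes withdrawn from ISO 639-1 in 1989, added as examples.

```python
from __future__ import annotations

# Hypothetical lookup table: deprecated ISO 639-1 code -> current code.
# Only "jw" -> "jv" is taken from this issue; the rest are illustrative.
DEPRECATED_ISO639_1 = {
    "jw": "jv",  # Javanese ("jw" was published in error)
    "iw": "he",  # Hebrew
    "in": "id",  # Indonesian
    "ji": "yi",  # Yiddish
}


def normalize_language_code(code: str) -> tuple[str, bool]:
    """Return (current_code, was_deprecated) for a two-letter code.

    This helper does not validate that the code exists in ISO 639-1;
    it only rewrites the deprecated values listed in the table above.
    """
    key = code.strip().lower()
    replacement = DEPRECATED_ISO639_1.get(key)
    if replacement is not None:
        return replacement, True
    return key, False


if __name__ == "__main__":
    print(normalize_language_code("jw"))  # ('jv', True)
    print(normalize_language_code("jv"))  # ('jv', False)
```

Returning a flag alongside the code lets the caller decide whether to warn about the deprecated value or accept it silently as an alias.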

@jordimas
Author

Thanks, great library BTW!
