slone-nlp/awesome-new-languages-in-machine-translation

Awesome New Languages in Machine Translation

This is a list of initiatives for adding new languages to open-source machine translation models (such as NLLB).

It also highlights some notable projects that improve translation quality for low-resource languages that are already supported.

The first part of the document lists individual languages in alphabetical order of their English names.

The second part of the document lists multilingual initiatives.

Any new additions are welcome (in the form of pull requests or issues)!

Single-language projects

Ainu

Amis

Aromanian

Awajun

Bambara

Buryat

Circassian (Kabardian)

Erzya

Additionally, see TartuNLP.

Fula

Hill Mari

See TartuNLP

Interslavic

Karakalpak

Komi

See TartuNLP

Livonian

See TartuNLP

Livvi Karelian

See TartuNLP

Mansi

Mari (Meadow)

See TartuNLP

Moksha

See TartuNLP

Ngambay

Qarachay Malqar

Tyvan

Udmurt

See TartuNLP

Zarma

Multilingual projects

Finno-Ugric languages (TartuNLP)

Multiple Finno-Ugric languages (including Komi, Udmurt, Hill and Meadow Mari, Erzya, Livonian, Mansi, Moksha and Livvi Karelian)

Indigenous languages of the Americas (AmericasNLP Shared Tasks)

Indigenous languages of the Americas, including Ashaninka, Aymara, Bribri, Chatino, Guarani, Hñähñu, Nahuatl, Quechua, Raramuri, Shipibo-Konibo, and Wixarika from the AmericasNLP MT shared task, plus Wayuunaiki, Arhuaco, Inga, and Nasa.

Hundreds of diverse languages (Apertium)

Apertium is a rule-based machine translation system.

Currently, it has linguistic tools (such as dictionaries and morphological parsers) for a very large number of languages, but only a few of them (51 language pairs) have been developed to a state considered stable enough for a public translation service.
