Data and code for grapheme-to-phoneme transducers in lots of languages

uiuc-sst/g2ps

LanguageNet Grapheme-to-Phoneme Transducers

Usage

  • Download and install Phonetisaurus

  • Get your own copy of the G2Ps: git clone https://github.com/uiuc-sst/g2ps

  • Test the installation: phonetisaurus-g2pfst --model=g2ps/models/akan.fst --word=ahyiakwa (you should see the answer "ahyiakwa 21.7336 a ç i a ɥ a˥").
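The test command above can also be driven from a script. The following is a minimal Python sketch, assuming that phonetisaurus-g2pfst is on your PATH and that each output line has the form shown in the expected answer (word, score, then the phones, separated by whitespace). The names `Pronunciation`, `parse_g2p_line`, and `run_g2p` are hypothetical helpers, not part of Phonetisaurus or this repository.

```python
import subprocess
from typing import List, NamedTuple


class Pronunciation(NamedTuple):
    """One hypothesis line from phonetisaurus-g2pfst (hypothetical container)."""
    word: str
    score: float
    phones: List[str]


def parse_g2p_line(line: str) -> Pronunciation:
    """Parse a line such as 'ahyiakwa 21.7336 a ç i a ɥ a˥'.

    Assumes whitespace-separated fields: word, score, then zero or more phones.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"unexpected G2P output line: {line!r}")
    return Pronunciation(word=fields[0], score=float(fields[1]), phones=fields[2:])


def run_g2p(model_path: str, word: str,
            binary: str = "phonetisaurus-g2pfst") -> List[Pronunciation]:
    """Run the G2P binary with the --model and --word flags from the README."""
    result = subprocess.run(
        [binary, f"--model={model_path}", f"--word={word}"],
        capture_output=True, text=True, check=True,
    )
    # Skip blank lines; every remaining line is treated as one hypothesis.
    return [parse_g2p_line(l) for l in result.stdout.splitlines() if l.strip()]
```

For example, `run_g2p("g2ps/models/akan.fst", "ahyiakwa")` should return a list whose first entry carries the phones from the expected answer, provided the installation works and the output format matches the assumption above.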

Description

  • The column "FSTs" links to a trained grapheme-to-phoneme transducer for use with Phonetisaurus. If the available lexicon was large enough to measure the phone error rate (PER), the PER is listed in parentheses. As of this writing, PERs range from 7% to 45%. Note: some of the trained models exceed GitHub's file size limit, so they are not available in the GitHub repository; instead, you can find them at http://speechtechnology.web.illinois.edu/data/g2ps/. Currently those are: american-english, arabic, dutch, french, german, portuguese, russian, spanish, turkish.

  • The column "Pronlexes" lists pronunciation lexicons distributed on this site; most are just short symbol tables, but a few are longer.

  • Other columns are just pointers to sources.
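To make the PER figures concrete, here is a minimal Python sketch of how a phone error rate is commonly computed. It assumes the standard definition (total Levenshtein edit distance between reference and predicted phone sequences, divided by the total number of reference phones); this page does not state the exact procedure used for the listed PERs, so treat it as an illustration only. The function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # deleting all i reference phones
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phone_error_rate(pairs: Iterable[Tuple[List[str], List[str]]]) -> float:
    """PER over (reference, hypothesis) pairs: total edits / total reference phones."""
    errors = 0
    total = 0
    for ref, hyp in pairs:
        errors += edit_distance(ref, hyp)
        total += len(ref)
    if total == 0:
        raise ValueError("no reference phones to score against")
    return errors / total
```

With a held-out portion of a pronunciation lexicon as the references and the transducer's top hypothesis for each word as the predictions, `phone_error_rate` returns a fraction; multiply by 100 to compare against the percentages quoted above.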

Acknowledgments

This project was funded from 2016 to 2019 as part of the LanguageNet. Phonetisaurus G2Ps were trained using the lexicons listed here and the lexicons in the LanguageNet. Some languages had other sources: Appen BABEL lexicons (amharic, assamese, bengali, cebuano, georgian, guarani, haitian, igbo, javanese, kurdish, lao, lithuanian, luo, mongolian, pushto, swahili, tagalog, tamil, tok-pisin, turkish, vietnamese, yue, zulu), CELEX (dutch, english, german), and CALLHOME (egyptian-arabic, mandarin, spanish).
