This repository provides covering grammars for English and Russian text normalization as documented in:
Gorman, K., and Sproat, R. 2016. Minimally supervised number normalization. Transactions of the Association for Computational Linguistics 4: 507-519.
Ng, A. H., Gorman, K., and Sproat, R. 2017. Minimally supervised written-to-spoken text normalization. In ASRU, pages 665-670.
If you use these grammars in a publication, we would appreciate it if you cited these works.
The grammars are written in Thrax and compile into OpenFst FAR (FstARchive) files. To compile, run make in the src/ directory.
See LICENSE.
This is not an official Google product.