GitHub - georgid/htkModelParser: Parses models created by the HTK Toolkit (http://htk.eng.cam.ac.uk/) as text files into Python class. It enables then various operations with the models like visualization and comparison.

georgid / htkModelParser Public

Notifications You must be signed in to change notification settings
Fork 1
Star 6

Parses models created by the HTK Toolkit (http://htk.eng.cam.ac.uk/) as text files into Python class. It enables then various operations with the models like visualization and comparison.

BSD-3-Clause license

6 stars 1 fork Branches Tags Activity

Star

Notifications

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
compare		compare
htkparser		htkparser
LICENSE		LICENSE
README		README
USAGE		USAGE
setup.py		setup.py

Repository files navigation

This is a Python script that enables visualization opeartions with the models.
Parses into a python class the HMM monophone models from htk.

Operations:
- comparison of two different models for the same phoneme. by calculating euclidean distance between means of model.
@ file compare.compareModels.py
- visualize features: plots feature vectors can plot two features in the same plot.
@ file compare.visualize.py

The script was developed originally by W.J. Maaskant
(w.j.maaskant(at)student.utwente.nl) for Fraunhofer IAIS, Germany to perform
a comparison of HTK, ISIP Prototype, Julius and HTK.

EDITED by Georgi Dzhambazov
georgi.dzhambazov@upf.edu

See the LICENSE file for the licensing terms.

Requirements
------------
The following Python modules are required:

- PLY
try: pip install ply

This script was developed and tested on openSUSE 10.2.

Capabilities and Limitations
----------------------------
This section contains information about the capabilities and limitations of the
converter. Some limitations are caused by the converter itself, some by
limitations of Sphinx 3.

- It is assumed that the HTK model file is correct. No checks are made on
whether all vectors have the same size, for example.

- The following macro types are supported: ~h ~o ~s ~t ~u ~v. The content of
~o is ignored.

- Feature vectors must have 39 dimensions.

- Only 1 feature stream is supported.

- All models must have the following structure, visualized by a transition
matrix:
| 1 2 3 4 5
-+----------
1| 0 1 0 0 0
2| 0 * * * *
3| 0 * * * *
4| 0 * * * *
5| 0 0 0 0 0

('*' stands for a probability >= 0.)

The number of states may be varied.

The first and last state of the models must be non-emitting.

- All states must have the same number of Gaussian mixtures.

- A monophone model must exist for the base phone of every triphone model.
Monophone models with no triphones with that monophone as base phone are
allowed.

- Biphone models are not supported.

- If more than 1 mixture is used, state definitions must include the
<NUMMIXES> tag. If only 1 mixture is used, state definitions MUST NOT
include the <NUMMIXES> tag.

- All vectors (mean, variance, transition probabilities) must be written as
a floating point value in this (regular expression) format:
-?\d.\d\d\d\d\d\de[+-]\d\d

- Models with base phone "sil" are assumed to be fillers.

Some limitations arise from limitations in Sphinx compared to HTK, and
(most) limitations arise from the fact that the script was created to
accomodate my models. However, it's not difficult to change the rules and
support more features of the HTK model file format.

What you have to do:

- Add or remove the filler attribute to models in the model definitions
file, where necessary.

About

Parses models created by the HTK Toolkit (http://htk.eng.cam.ac.uk/) as text files into Python class. It enables then various operations with the models like visualization and comparison.

Readme

BSD-3-Clause license

Activity

6 stars

3 watching

1 fork

Report repository