geomhmm: Geometric Hidden Markov Models
This is an implementation of a learner for generalized hidden Markov models (HMMs) in which the observed random variables are manifold-valued. We call such a model a geometric hidden Markov model.
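To make "manifold-valued" concrete, here is a minimal sketch of the hyperbolic distance on the Poincare disk, one of the observation spaces this repository supports. The function name `poincare_distance` is hypothetical and is not part of geomhmm; the formula is the standard distance for the unit disk with curvature -1.

```python
import math

def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit disk.

    Hypothetical helper for illustration only; not part of geomhmm.
    """
    norm_u_sq = u[0] ** 2 + u[1] ** 2
    norm_v_sq = v[0] ** 2 + v[1] ** 2
    if norm_u_sq >= 1.0 or norm_v_sq >= 1.0:
        raise ValueError("both points must lie strictly inside the unit disk")
    sq_diff = (u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2
    # d(u, v) = arcosh(1 + 2 |u - v|^2 / ((1 - |u|^2)(1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - norm_u_sq) * (1.0 - norm_v_sq)))

# Distances grow without bound as a point approaches the boundary of the disk.
d_near = poincare_distance((0.0, 0.0), (0.5, 0.0))
d_far = poincare_distance((0.0, 0.0), (0.99, 0.0))
```

Unlike the Euclidean distance, this distance depends on how close the points are to the boundary, which is why mixture estimation on such spaces needs geometry-aware tools.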
Testing the code in your local environment is as simple as:
- Clone this repository.
- Navigate to the corresponding directory in the terminal.
- Run
pip install -r requirements.txt
to install all dependencies.[1]
- Run
python func_tests.py
to do a smoke test of the code.
You may want to do the above in a virtual environment (see the Python documentation on virtual environments if you are not familiar with them). We developed the code with Python 3.8.12, so we recommend running it with a similar Python version.
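If you are unsure which interpreter is active, a quick check along the following lines may help. This is only a sketch: the helper name `is_similar_python` is hypothetical, and reading "similar" as "same major.minor version" is our assumption, not a requirement stated by the project.

```python
import sys

def is_similar_python(version_info=sys.version_info, major=3, minor=8):
    """Return True if the interpreter matches the given major.minor version.

    Hypothetical helper; treating "similar" as "same major.minor" is an assumption.
    """
    return (version_info[0], version_info[1]) == (major, minor)

if not is_similar_python():
    print("Note: the code was developed with Python 3.8.12; "
          "this interpreter is %d.%d." % (sys.version_info[0], sys.version_info[1]))
```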
Here is an example of how to use the code:
from geomhmm import PoincareDiskGaussianHMM
from exp import gen_chain_Tupker2021
_, y, _ = gen_chain_Tupker2021() # Generate an example Poincare-disk-valued HMM
m = PoincareDiskGaussianHMM(S=3, max_lag=3, num_samples_K=500) # Initialize the learner
m.partial_fit(y) # Learning step
print(m.B_params, m.pi_inf_hat, m.P_hat) # Print the current estimates
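To help interpret the printed estimates, the sketch below shows how a stationary distribution relates to a transition matrix. It assumes that `P_hat` estimates a row-stochastic transition matrix (entry `P[i][j]` is the probability of moving from state `i` to state `j`) and that `pi_inf_hat` estimates its stationary distribution; the repository's actual conventions may differ. The function `stationary_distribution` and the matrix `P` are hypothetical and not part of geomhmm.

```python
def stationary_distribution(P, num_iters=1000):
    """Approximate pi satisfying pi = pi P by power iteration.

    Hypothetical helper; assumes P is row-stochastic and the chain is ergodic.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(num_iters):
        # One step of the chain: new_pi[j] = sum_i pi[i] * P[i][j]
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Illustrative two-state chain (made-up numbers, not output of the learner).
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)
```

Under these assumptions, a learned `P_hat` and `pi_inf_hat` should approximately satisfy the same fixed-point relation, which gives a quick sanity check on the estimates.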
- The file geomhmm.py is the meat of the code and contains the implementation of the learner itself. Currently we have EuclideanGaussianHMM, PoincareDiskGaussianHMM, and SPDGaussianHMM, which are learners for HMMs with observed values in Euclidean space, the Poincare disk, and the space of SPD matrices, respectively. The mixture estimation uses the approach outlined in Zanini et al., 2017, and the transition matrix is estimated with the method-of-moments algorithm that we adopted from Mattila et al., 2020.
  - In addition, the file extensions.py contains variants of the learner, such as SPD_EM_GaussianHMM, which uses the expectation-maximization (EM) algorithm for mixture estimation in place of the Zanini et al., 2017 approach.[2] To run most of the learners in this file (including SPD_EM_GaussianHMM), one needs a valid Matlab license in order to use the Matlab engine for Python (see the Matlab Engine API for Python documentation for more details).
  - More documentation on how to initialize and use these learners is to come.
- The files randSPDGauss.py / randPoincGauss.py contain the code to sample from a Gaussian distribution on the manifold of SPD matrices / the Poincare disk.
- The file func_test.py contains a suite of examples used to test the implementation.
- The file exp.py runs experiments that replicate the set-ups used by Salem et al., 2021 and by Tupker et al., 2021.
  - For example, you can run
python exp.py --mode Salem2021 --oname output --opath ./out --seed=202 --givenTrue False
to replicate the set-up used by Salem et al., 2021 (the givenTrue flag controls whether we want the learner to learn the emission probabilities and the stationary distribution as well; False means we do want to learn those variables).
  - The directory exp_config_templates contains example config files used to run some of the experiments, such as hyperparameter tuning (hyp_tuning). More documentation about how to run each experiment is to come.
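As background for the moment-based transition estimate mentioned above, the sketch below illustrates the basic identity behind such estimators: the probability of seeing state `i` followed by state `j` equals `pi[i] * P[i][j]`, so normalizing the rows of the lag-1 pair frequencies recovers `P`. This is a simplified stand-in that assumes the state labels are observed, which is not the case in an HMM; it is not the algorithm of Mattila et al., 2020 nor the repository's implementation. All names here are hypothetical.

```python
import random

def simulate_chain(P, length, seed=0):
    """Simulate state labels of a Markov chain with row-stochastic matrix P."""
    rng = random.Random(seed)
    state = 0
    states = [state]
    for _ in range(length - 1):
        state = rng.choices(range(len(P)), weights=P[state])[0]
        states.append(state)
    return states

def transition_from_pairs(states, num_states):
    """Estimate P by row-normalizing counts of consecutive (lag-1) state pairs."""
    counts = [[0] * num_states for _ in range(num_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    P_hat = []
    for row in counts:
        total = sum(row)
        # Rows for states that were never visited are left as zeros.
        P_hat.append([c / total if total else 0.0 for c in row])
    return P_hat

# Illustrative two-state chain (made-up numbers).
P_true = [[0.9, 0.1],
          [0.5, 0.5]]
states = simulate_chain(P_true, 20000)
P_hat = transition_from_pairs(states, 2)
```

In the hidden-state setting, the pair counts are not available directly and have to be replaced by moments estimated from the observations, which is where the referenced method comes in.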
The algorithm was developed by Berlin Chen, Dr. Cyrus Mostajeran, and Dr. Salem Said. Development of the implementation is still ongoing. Any feedback is appreciated.
Footnotes
1. The file requirements-dev.txt contains packages that are not strictly needed for the learners, but are useful if one were to further develop or experiment with our implementation. To install these packages, simply run pip install -r requirements-dev.txt.
2. The Matlab code for EM estimation and Riemannian gradient descent is currently proprietary and not publicly available. We are working to release a publicly available version.