HIPPI stands for Highly Accurate Protein Family Classification with Ensembles of HMMs, and is a method for the following problem:
Protein family identification:
- Input: A query sequence q and a set of alignments and trees on a protein family F
- Output: A score of q against all families F
HIPPI is a modification of UPP for scoring protein sequences against a protein family database. HIPPI operates in two steps. In the first step, HIPPI builds an ensemble of HMMs on an input protein family. In the second step, HIPPI scores the query sequences against every HMM in the ensemble for that protein family. By pipelining these steps for all protein families, one can obtain the score of the query sequences against all families. HIPPI is in active development, and scripts will soon be made available to pipeline these steps into a single command.
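Until such a script is available, the per-family loop can be sketched in Python. This is a minimal illustration, not the official pipeline: the function name `build_family_commands` and the layout of the `families` dictionary are hypothetical, and the flags are copied from the general command given in the usage section of this README. The sketch only builds one command per family and leaves the actual execution to the caller.

```python
def build_family_commands(families, query_path):
    """Build one run_ensembles.py command per protein family.

    families: dict mapping a family name to a tuple
              (alignment_path, tree_path, decomp_size).
              This layout is an assumption made for this sketch.
    query_path: path to the file of query sequences.
    Returns a list of (family_name, argv) pairs, sorted by family name.
    """
    commands = []
    for name in sorted(families):
        alignment, tree, decomp_size = families[name]
        argv = [
            "run_ensembles.py",
            "-a", alignment,         # input alignment for this family
            "-t", tree,              # input tree for this family
            "-f", query_path,        # query sequences to score
            "-A", str(decomp_size),  # decomposition size
            "-m", "amino",           # protein sequences
            "-D", "0.60",            # value used in the general command
        ]
        commands.append((name, argv))
    return commands
```

Each `argv` list can be passed to `subprocess.run`. Since every run produces its own scores file, it may be safest to launch each family from a separate working directory, although where HIPPI writes its output is not confirmed here.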
Developers: Nam Nguyen, Michael Nute, Siavash Mirarab, and Tandy Warnow.
###Data:
Nguyen, Nam-phuong (2016): HIPPI Dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6795126_V1
###Publication:
Nam Nguyen, Michael Nute, Siavash Mirarab, and Tandy Warnow. HIPPI: Highly Accurate Protein Family Classification with Ensembles of profile Hidden Markov Models. BMC Genomics 17, 765 (2016). https://doi.org/10.1186/s12864-016-3097-0
- HIPPI bundles the following program into its distribution:
- hmmer: http://hmmer.janelia.org/
- HIPPI uses the Dendropy package.
- HIPPI uses some code from SATe.
This section details the steps for installing and running HIPPI. We have run HIPPI on Linux and Mac. If you experience difficulty installing or running the software, please contact Nam Nguyen or Siavash Mirarab.
Before installing the software you need to make sure the following programs are installed on your machine.
- Python: Version > 2.7.
- SEPP: Version > 3.0.
HIPPI is a part of the SEPP distribution package. By installing SEPP, HIPPI is automatically installed (see the [SEPP readme](https://github.com/smirarab/sepp/blob/master/README.SEPP.md)).
- HIPPI requires SEPP to be installed. If HIPPI is not running, first check to see if SEPP was installed correctly.
To run HIPPI, invoke the run_ensembles.py script from the bin sub-directory of the location in which you installed the Python packages. To see options for running the script, use the command:
python <bin>/run_ensembles.py -h
The general command for running HIPPI is:
run_ensembles.py -a input_alignment -t input_tree -f input_query_sequences -A decomp_size -m amino -D 0.60
where decomp_size is 10% of the original input alignment. This will run HIPPI(10%,40%) as described in the main paper.
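A value for decomp_size can be computed with a few lines of Python. The sketch below rests on two assumptions that this README does not state: that "10% of the original input alignment" means 10% of the number of sequences in the alignment, and that the alignment is stored in FASTA format. The function names are hypothetical, and rounding down with a minimum of 1 is a choice made for this example.

```python
def count_fasta_sequences(lines):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def decomp_size_for(num_sequences, percent=10):
    """Return percent% of num_sequences, rounded down, but never below 1.

    Integer arithmetic is used to avoid floating-point rounding surprises.
    """
    return max(1, (num_sequences * percent) // 100)
```

To use it, open the alignment file, pass the file handle to `count_fasta_sequences`, and pass the resulting count to `decomp_size_for`. The returned integer is the value to supply after `-A`.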
The main output of HIPPI is output_scores.csv. This file lists the score of each query sequence against every HMM in the ensemble.
We are currently building a pipeline script to streamline this process.
HIPPI is under active research development at UIUC by the Warnow Lab (and especially with her former PhD students Siavash Mirarab and Nam Nguyen). Please report any errors or requests to Michael Nute (mike.nute@gmail.com), Siavash Mirarab (smirarab@gmail.com), or Nam Nguyen (nnguyen@boundlessbio.com).