RepeatHMM: estimation of repeat counts on microsatellites from long-read sequencing data

RepeatHMM is a novel computational tool for detecting microsatellites of any type (including the trinucleotide repeats involved in trinucleotide repeat disorders, TRDs) in long reads from a subject of interest. Evaluations on both simulated and real data show that it can accurately estimate expansion counts. It is user-friendly and easy to install and use.

Features

Accurate and efficient estimation of repeat counts from long-read sequencing data
Analysis of all types of simple repeats
Predefined models are included for more than 10 well-known trinucleotide repeats: AFF2, AR, ATN1, ATXN1, ATXN2, ATXN3, ATXN7, ATXN8OS, CACNA1A, DMPK, FMR1, FXN, HTT, PPP2R2B, TBP
Easy to install and use

Methodology of RepeatHMM

RepeatHMM takes a set of reads as input, uses a split-and-align strategy to improve alignments, performs error correction, and then infers repeat counts with a hidden Markov model (HMM) combined with a peak-calling algorithm based on a Gaussian mixture model. Users can specify the error parameters of their sequencing experiment, from which RepeatHMM automatically derives the transition and emission matrices of the HMM; this allows the analysis of both PacBio and Oxford Nanopore data.
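The HMM step described above can be illustrated with a small Python sketch. This is a simplified cyclic profile HMM written for illustration, not RepeatHMM's actual model. Several details are hypothetical: the state layout (one match and one insert state per motif position, with a deletion modelled as a jump that skips one position), the function names `build_cyclic_hmm` and `estimate_repeat_count`, the default error rates, and the assumption that the repeat tract starts at the first base of the motif. The sketch shows how insertion, deletion and substitution rates can be turned into transition and emission probabilities, and how a Viterbi path through the model yields a repeat count.

```python
import math

BASES = "ACGT"
NEG_INF = float("-inf")


def _log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def build_cyclic_hmm(motif, p_ins, p_del, p_sub):
    """Build log-space transition/emission tables for a cyclic repeat HMM.

    Hypothetical layout: for each motif position i there is a match state
    ("M", i) and an insert state ("I", i). A single deleted base is modelled
    as a jump M_i -> M_{i+2}. Each transition also records how many motif
    positions it advances, so the decoded path can be converted to a count.
    """
    k = len(motif)
    if k < 2:
        raise ValueError("this sketch assumes a motif of length >= 2")
    p_match = 1.0 - p_ins - p_del
    trans = {}  # (src, dst) -> (log prob, motif positions advanced)
    emit = {}   # state -> {base: log prob}
    for i in range(k):
        m, ins = ("M", i), ("I", i)
        trans[(m, ("M", (i + 1) % k))] = (_log(p_match), 1)
        trans[(m, ("M", (i + 2) % k))] = (_log(p_del), 2)  # skip one position
        trans[(m, ins)] = (_log(p_ins), 0)
        trans[(ins, ins)] = (_log(p_ins), 0)                # insertion run
        trans[(ins, ("M", (i + 1) % k))] = (_log(1.0 - p_ins), 1)
        # Match states emit the motif base, or any other base on substitution.
        emit[m] = {b: _log(1.0 - p_sub if b == motif[i] else p_sub / 3) for b in BASES}
        # Insert states emit all four bases uniformly.
        emit[ins] = {b: _log(0.25) for b in BASES}
    return trans, emit


def estimate_repeat_count(read, motif, p_ins=0.05, p_del=0.05, p_sub=0.02):
    """Viterbi-decode `read` (A/C/G/T only) and return a repeat-unit count."""
    if not read:
        return 0
    trans, emit = build_cyclic_hmm(motif, p_ins, p_del, p_sub)
    # Assumption: the repeat tract begins at motif position 0.
    start = ("M", 0)
    # best[state] = (log prob of best path ending here, positions advanced)
    best = {start: (emit[start][read[0]], 1)}
    for base in read[1:]:
        nxt = {}
        for (src, dst), (lp, adv) in trans.items():
            if src not in best:
                continue
            score = best[src][0] + lp + emit[dst][base]
            if dst not in nxt or score > nxt[dst][0]:
                nxt[dst] = (score, best[src][1] + adv)
        best = nxt
    # Pick the highest-scoring end state and convert positions to units.
    _, advanced = max(best.values())
    return round(advanced / len(motif))
```

For example, `estimate_repeat_count("CAG" * 10 + "T" + "CAG" * 10, "CAG")` treats the stray `T` as an insertion and reports 20 units. Carrying the "positions advanced" counter along with each Viterbi score avoids a separate traceback, because the counter is additive along the path.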

RepeatHMM was evaluated on both random simulations and PCR-based simulations of long reads containing CAG repeats, as well as on real datasets of ATXN3 for SCA3 and ATXN10 for SCA10. The results demonstrated that our tool was able to accurately estimate expansion counts from long reads.
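The random-simulation setting can be illustrated with a small read generator. This is a minimal sketch under assumed conditions, not the simulator used in the RepeatHMM evaluation. The function name `simulate_read`, the random flanking sequences of `flank_len` bases, and the model of independent per-base insertion, deletion and substitution events are all hypothetical.

```python
import random


def simulate_read(n_repeats, motif="CAG", flank_len=50,
                  p_ins=0.0, p_del=0.0, p_sub=0.0, seed=None):
    """Return (template, read) for a repeat tract with random flanks.

    Hypothetical error model: each template base is independently deleted
    with probability p_del or substituted with probability p_sub, and a
    random base is inserted after each emitted base with probability p_ins.
    """
    rng = random.Random(seed)
    left = "".join(rng.choice("ACGT") for _ in range(flank_len))
    right = "".join(rng.choice("ACGT") for _ in range(flank_len))
    template = left + motif * n_repeats + right

    read = []
    for base in template:
        r = rng.random()
        if r < p_del:
            continue  # base lost: deletion error
        if r < p_del + p_sub:
            # substitution error: replace with one of the other three bases
            base = rng.choice([b for b in "ACGT" if b != base])
        read.append(base)
        if rng.random() < p_ins:
            read.append(rng.choice("ACGT"))  # insertion error
    return template, "".join(read)
```

For example, `simulate_read(200, p_ins=0.05, p_del=0.05, p_sub=0.02, seed=1)` yields a noisy read from an expanded 200-unit CAG allele. Passing a fixed `seed` makes the simulated reads reproducible.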

Inputs of RepeatHMM

RepeatHMM takes the long reads of a subject as input. Alternatively, once the reads have been aligned to a reference genome, it can take a BAM file as input to examine the more than 10 predefined trinucleotide repeats or a gene specified by the user.
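When working from aligned reads, a natural first step is to keep only the reads that fully span the repeat region, since a read that ends inside the tract cannot show its full length. The sketch below is illustrative and is not RepeatHMM's code. `AlignedRead`, `spanning_reads`, `min_flank` and the coordinates in the example are all hypothetical. With a real BAM file, comparable fields are exposed by pysam's `AlignedSegment` as `reference_name`, `reference_start` and `reference_end`.

```python
from dataclasses import dataclass


@dataclass
class AlignedRead:
    """Hypothetical minimal stand-in for one aligned record of a BAM file."""
    name: str
    chrom: str
    ref_start: int  # 0-based, inclusive
    ref_end: int    # 0-based, exclusive


def spanning_reads(reads, chrom, start, end, min_flank=0):
    """Keep reads whose alignment covers [start, end) plus min_flank on each side."""
    return [
        r for r in reads
        if r.chrom == chrom
        and r.ref_start <= start - min_flank
        and r.ref_end >= end + min_flank
    ]


# Hypothetical repeat region and reads, for illustration only.
reads = [
    AlignedRead("r1", "chr4", 900, 2100),   # spans the region
    AlignedRead("r2", "chr4", 1500, 3000),  # starts inside the region
    AlignedRead("r3", "chr5", 900, 2100),   # different chromosome
]
print([r.name for r in spanning_reads(reads, "chr4", 1000, 2000)])
```

Requiring some flanking sequence (`min_flank > 0`) is a design choice that leaves room to anchor both ends of the repeat tract unambiguously.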

Usage

Please refer to Usage for how to use RepeatHMM.

Revision History

For release history, please visit here. For details, please go here.

Contact

If you have any questions, issues, or bug reports, please post them on GitHub (liuqianhn/RepeatHMM), where they can also help other users.

Reference

Please cite the publication below if you use our tool:

Qian Liu, Peng Zhang, Depeng Wang, Weihong Gu and Kai Wang. Interrogating the "unsequenceable" genomic trinucleotide repeat disorders by long-read sequencing. Genome Med. 9(1):65, 2017. doi: 10.1186/s13073-017-0456-7.
