Releases: tfwillems/HipSTR
HipSTR v0.7
This release provides a few enhancements to the HipSTR algorithm:
- Added --lib-field option and removed --lib-from-samp
- Automatically trim Illumina adapters (TruSeq and Nextera) from input alignments
- Fixed a bug in which AB and FS were automatically set to 0 for homozygous genotypes even if the haplotypes were heterozygous
- Added functionality to output additional haplotype information about STR calls. In addition to the STR sequence, HipSTR can now output the genotypes of the flanking sequences
- Updated the usage instructions in the README
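The automatic adapter trimming can be pictured with a small sketch. The adapter prefixes below are the ones general-purpose trimmers commonly use (AGATCGGAAGAGC for TruSeq, CTGTCTCTTATACACATCT for Nextera). The exact-match rule, the hypothetical `trim_adapter` function and its `min_overlap` parameter are assumptions for illustration, not HipSTR's implementation.

```python
# Adapter prefixes commonly used by trimming tools (assumption: HipSTR's
# internal list and matching rules may differ).
TRUSEQ = "AGATCGGAAGAGC"
NEXTERA = "CTGTCTCTTATACACATCT"


def trim_adapter(read: str, adapters=(TRUSEQ, NEXTERA), min_overlap: int = 5) -> str:
    """Return `read` truncated at the earliest exact adapter occurrence.

    Hypothetical helper. A full adapter anywhere in the read, or a partial
    adapter prefix of at least `min_overlap` bases running off the 3' end,
    marks the cut point.
    """
    cut = len(read)
    for adapter in adapters:
        # Case 1: the full adapter occurs inside the read
        pos = read.find(adapter)
        if pos != -1:
            cut = min(cut, pos)
            continue
        # Case 2: the read ends partway into the adapter, so a read suffix
        # equals an adapter prefix; try the longest overlap first
        max_k = min(len(adapter) - 1, len(read))
        for k in range(max_k, min_overlap - 1, -1):
            if read.endswith(adapter[:k]):
                cut = min(cut, len(read) - k)
                break
    return read[:cut]
```

A short `min_overlap` catches more partial adapters but occasionally trims genuine sequence that happens to match an adapter prefix; production trimmers usually also tolerate sequencing mismatches, which this exact-match sketch does not.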
HipSTR v0.6.2
This release introduces a handful of minor code updates:
- The internal htslib library has been upgraded from v1.5 to v1.8 to improve reliability
- The maximum allowed stutter artifact size was doubled. This results in improved genotyping accuracy with only a slight increase in run time
- A handful of minor changes to the README to clarify usage and suggested filtering options
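What capping the stutter artifact size means can be illustrated with a toy stutter model. The sketch assumes a geometric formulation (expansion probability `up`, contraction probability `down`, artifact size in repeat units decaying with parameter `rho`) and a truncate-and-renormalize treatment of the cap `max_units`; these names and choices are hypothetical and are not taken from HipSTR's source.

```python
def stutter_probs(up: float, down: float, rho: float, max_units: int) -> dict[int, float]:
    """Probability that a read reports an allele shifted by s repeat units.

    Hypothetical model: no stutter with probability 1 - up - down; otherwise
    an expansion (probability `up`) or contraction (probability `down`) whose
    size is geometric with parameter `rho`, truncated at `max_units` and
    renormalized so each direction still sums to its total probability.
    """
    sizes = range(1, max_units + 1)
    geo = [rho * (1 - rho) ** (s - 1) for s in sizes]  # untruncated geometric weights
    norm = sum(geo)                                     # mass kept after truncation
    probs = {0: 1.0 - up - down}
    for s, g in zip(sizes, geo):
        probs[s] = up * g / norm      # expansion by s units
        probs[-s] = down * g / norm   # contraction by s units
    return probs
```

Under this toy model, raising `max_units` lets reads with larger stutter shifts be explained as artifacts rather than forcing them onto a candidate allele, while adding more shifts to evaluate per read; that trade-off is one plausible reading of the accuracy gain and slight runtime cost noted above.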
HipSTR v0.6.1
This release patches a simple bug introduced in v0.6, in which an incorrect absolute value function was used.
HipSTR v0.6
This version of HipSTR contains a handful of minor bug fixes as well as some substantial speed improvements in terms of CRAM IO. Here's a summary of the changes:
- Modified filter_vcf.py so that it doesn't remove alleles if genotype likelihoods are present. Previously, this script would've generated invalid genotype likelihoods if any alleles were removed
- Massively sped up CRAM IO. Analyzing CRAM files should now take 5-30x less IO time due to these changes. This greatly improves overall HipSTR performance for CRAMs, where this was previously the bottleneck. See issue #24
- Fixed a bug in which the stutter model failed to converge, even though its parameters did not change over many iterations. See issue #46
- Modified the genotyping process so that it no longer skips a locus if a very short STR allele is observed. See issue #45
- Added descriptions of some of the default filters applied by HipSTR to the README
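Removing alleles invalidates genotype likelihoods because a diploid GL field holds one value per unordered genotype, n(n+1)/2 values for n alleles, in the VCF-specified order where genotype j/k (j ≤ k) sits at index k(k+1)/2 + j. The sketch below shows the re-indexing that subsetting would require; the function names are hypothetical and this is not the code in filter_vcf.py.

```python
from itertools import combinations_with_replacement


def gl_index(j: int, k: int) -> int:
    """VCF position of diploid genotype j/k in a GL/PL vector: k*(k+1)/2 + j."""
    if j > k:
        j, k = k, j
    return k * (k + 1) // 2 + j


def n_genotypes(n_alleles: int) -> int:
    """Number of unordered diploid genotypes, i.e. GL entries per sample."""
    return n_alleles * (n_alleles + 1) // 2


def subset_gls(gls: list[float], keep: list[int]) -> list[float]:
    """Rebuild a diploid GL vector after keeping only the alleles in `keep`.

    `keep` lists original allele indices (0 = REF) in their new order. Every
    retained genotype moves to a new index, so simply deleting entries from
    the old vector would misassign likelihoods.
    """
    n_new = len(keep)
    out = [0.0] * n_genotypes(n_new)
    for new_j, new_k in combinations_with_replacement(range(n_new), 2):
        out[gl_index(new_j, new_k)] = gls[gl_index(keep[new_j], keep[new_k])]
    return out
```

For example, with three alleles the order is 0/0, 0/1, 1/1, 0/2, 1/2, 2/2; keeping alleles 0 and 2 must map old entries 0, 3 and 5 onto new positions 0, 1 and 2.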
HipSTR v0.5
The latest version of HipSTR contains a handful of minor bug fixes but mainly focuses on adding functionality that makes the tool more robust and easier to run. Here's a summary of the changes in the latest release:
- Added an FS FORMAT field to the VCF. This field can be used to detect genotyping errors when strand bias is unusually high
- Added --quiet and --silent command line options that can be used to control the level of detail in the log
- Rigorously check that all input files (STR region BED, BAM/CRAMs and SNP VCFs) are consistent in terms of contig names
- Resolved a handful of Python 3 incompatibilities in the VizAln and VizAlnPDF scripts, which now work with both Python 2 and Python 3
- All contigs in the input FASTA file are now written to the VCF header. This prevents downstream errors when validating HipSTR's VCFs using Picard/GATK
- Added command line options --max-hap-flanks and --min-flank-freq to control the candidate haplotype sequences that flank the STR that are considered during genotyping
- Automatically filter low-frequency flanking haplotype sequences (frequency < 1%) to improve runtime and reduce memory usage
- Added a --output-filters command line option that adds a FILTER FORMAT field to the VCF. For samples with missing genotypes, it describes why the sample was skipped, while for other samples it merely contains PASS
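One way to quantify the strand bias behind the FS field is Fisher's exact test on a 2x2 table of read counts (rows are the two alleles, columns are forward and reverse strands). The sketch assumes, as the field's name suggests, that FS is a log10 p-value of this kind; that is an assumption here, so check the FS description in the header of the VCF HipSTR writes. The function names are hypothetical.

```python
import math
import sys


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = math.comb(n, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = table_prob(a)
    p_value = 0.0
    # Sum every table with the same margins that is no more probable than the
    # observed one; the small relative tolerance absorbs floating-point ties
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_prob(x)
        if p <= p_obs * (1 + 1e-7):
            p_value += p
    return min(1.0, p_value)


def strand_bias_log10(fwd_a1: int, rev_a1: int, fwd_a2: int, rev_a2: int) -> float:
    """log10 Fisher p-value for strand bias between a heterozygote's two alleles.

    0 means no evidence of bias; more negative values mean stronger bias.
    """
    p = fisher_exact_two_sided(fwd_a1, rev_a1, fwd_a2, rev_a2)
    # Clamp so log10 stays finite if probabilities underflow at extreme depth
    return math.log10(max(p, sys.float_info.min))
```

If one allele is supported only by forward-strand reads and the other only by reverse-strand reads, the value becomes strongly negative, which is the kind of pattern that suggests a genotyping error rather than a real heterozygote.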
HipSTR v0.4
A lot has changed since the last official release of HipSTR (v0.2). We've extensively improved HipSTR's genotyping accuracy, simplified the tool's usage and added new features. Here's a short synopsis of some of what's changed:
- We removed the PhasedBEAGLE component of the tool, as it's no longer relevant
- We removed all dependencies on bamtools and vcflib to simplify compilation
- HipSTR now uses htslib and wrapper files to read BAM and VCF files
- HipSTR now supports alignments in CRAM format in addition to BAM format
- The --use-all-reads option is deprecated, as HipSTR now always uses all reads to boost accuracy
- A new FORMAT field AB is output to the VCF. This field can be used to filter genotypes with highly biased read counts that are likely genotyping errors
- The --min-mapq, --len-genotyper, --hide-allreads, --hide-mallreads, --output-pallreads and --no-pool-seqs options are no longer available
- de Bruijn graphs are used to assemble sequences flanking the STR region. These sequences are incorporated into the genotyping process, resulting in improved genotyping accuracy
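The idea behind the AB field can be sketched as a binomial test: for a true heterozygote, reads should split roughly 50/50 between the two alleles, so a very unbalanced split yields a small p-value. This assumes AB is a log10 p-value of this kind (check the AB description in the VCF header); the function name is hypothetical.

```python
import math


def allele_bias_log10(reads_a1: int, reads_a2: int) -> float:
    """log10 two-sided binomial p-value that a heterozygote's reads split 50/50.

    0 means no evidence of bias; more negative values mean stronger bias.
    """
    n = reads_a1 + reads_a2
    if n == 0:
        return 0.0  # no reads, so no evidence of bias
    k = min(reads_a1, reads_a2)
    # Exact integer count of outcomes at least as unbalanced on the minor side
    tail_count = sum(math.comb(n, i) for i in range(k + 1))
    # log10(2 * tail_count / 2**n): doubling gives the two-sided p-value by
    # symmetry; working in log space avoids float underflow at high depth
    log10_p = math.log10(tail_count) + (1 - n) * math.log10(2)
    return min(0.0, log10_p)  # a p-value cannot exceed 1
```

A genotype could then be flagged when the value falls below a chosen cutoff (for example -2, an illustrative threshold rather than a HipSTR default).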
HipSTR v0.2
Fixed a wide range of minor bugs and substantially improved genotyping accuracy
HipSTR v0.1
I'm very excited to announce the first official release of HipSTR, a haplotype-based caller for short tandem repeats. Over the past year, I've worked extensively on this tool and feel that it's finally ready for widespread use. HipSTR is a substantial improvement over existing STR genotypers because it explicitly learns a stutter model for each STR locus and uses a customized hidden Markov model to align reads while accounting for these artifacts. The result is a tool with unprecedented speed and accuracy for genotyping STRs. As this is the first release, we expect a handful of bugs and would appreciate it if you could report them as issues in the GitHub repo.
Thanks for using HipSTR!!
P.S. GitHub currently has a bug in which submodule code is not included in releases. As a result, the compressed packages below won't compile correctly. Please either use the precompiled binaries or follow the install instructions on the HipSTR main page.
Thomas