
pb-StarPhase v0.13.0

@holtjma holtjma released this 30 Jul 19:24
· 18 commits to main since this release
267a140

v0.13.0

Changes

  • The algorithm for HLA-A and HLA-B has been modified to use a consensus-based approach to solve the alleles, a simpler version of the algorithm for CYP2D6.
    • CLI options related to consensus generation now control both HLA and CYP2D6 calling. These have been moved into a separate category on the CLI labeled "Consensus (HLA and CYP2D6)".
    • In internal tests, these changes slightly improved the accuracy of 4th-field entries in the HLA calls (2nd- and 3rd-field accuracy was unaffected). The approach also significantly reduced compute requirements, averaging ~10% of the CPU time required by v0.12.0.
    • With this change, the --threads option does not provide any benefit to the current algorithms. It has been deprecated, but may be added again if future optimizations allow it.
    • The --max-error-rate default has been adjusted for comparison to just the reference allele for each HLA gene, with a new default of 0.07 (previously 0.05).
    • The previous HLA option --min-allele-fraction has been removed. The consensus option --min-consensus-fraction is used instead.
  • Added a new option, --output-debug, that creates a debug folder with multiple additional files. These are primarily for debugging the results from HLA and CYP2D6 calling, but may be useful for researchers. The contents of this folder are subject to change as the underlying methods develop. The initial files include:
    • consensus_{GENE}.fa - Contains the full consensus sequences generated for a given {GENE}. Currently, this is only for HLA genes and CYP2D6.
    • cyp2d6_consensus.bam - Contains mapped substrings from the reads that were used to generate CYP2D6 consensus sequences. The phase set tag (PS) indicates which consensus each sequence was a part of. This is useful for visualizing how consensus generation ran and whether there are potential errors.
    • cyp2d6_link_graph.svg - A graphical representation of the connections present between CYP2D6 consensus segments.
    • hla_debug.json - Contains the summary mapping information of each database entry to the generated HLA consensus sequences.
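
As a quick way to inspect the consensus_{GENE}.fa files described above, the following is a minimal Python sketch that reads a FASTA file and reports the length of each consensus sequence. It assumes only that the file is standard FASTA. The function names (read_fasta, sequence_lengths) are hypothetical helpers for illustration and are not part of pb-StarPhase; the record naming inside the file is not documented in these notes, so the sketch simply takes the first token of each header line.

```python
from pathlib import Path
from typing import Dict, List, Union


def read_fasta(path: Union[str, Path]) -> Dict[str, str]:
    """Parse a FASTA file into {record name: sequence}.

    Assumption: plain-text FASTA. If two records share a name,
    the later one replaces the earlier one.
    """
    records: Dict[str, List[str]] = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Use the first whitespace-delimited token as the record name.
                tokens = line[1:].split()
                name = tokens[0] if tokens else ""
                records[name] = []
            elif name is not None:
                # Sequence lines may be wrapped; collect and join them later.
                records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def sequence_lengths(path: Union[str, Path]) -> Dict[str, int]:
    """Return {record name: sequence length} for every record in the file."""
    return {name: len(seq) for name, seq in read_fasta(path).items()}
```

Calling sequence_lengths on a consensus file from the debug folder (the exact path depends on the value passed to --output-debug) gives a quick check on how many consensus sequences were generated for a gene and whether their lengths look plausible.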

Fixed

  • Fixed an issue with the build command where CPIC genes with no known chromosome would cause an error and exit. These entries are now ignored with a warning.
  • Fixed an off-by-one error in the HLA gene region start coordinates. This has been corrected in the latest database release: data/v0.13.0/pbstarphase_20240730.json.gz
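
To confirm which database release is in use, the file named above can be opened directly, since it is gzip-compressed JSON. The Python sketch below loads it and lists the top-level entries with their types. The internal schema is not described in these notes, so the sketch makes no assumptions about key names; load_database and top_level_summary are hypothetical helper names used only for illustration.

```python
import gzip
import json
from typing import Any, Dict


def load_database(path: str) -> Any:
    """Load a gzip-compressed JSON file and return the parsed object."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def top_level_summary(db: Any) -> Dict[str, str]:
    """Map each top-level key to the type name of its value.

    Assumption: the root is usually a JSON object. Any other root
    is reported under the placeholder key "<root>".
    """
    if not isinstance(db, dict):
        return {"<root>": type(db).__name__}
    return {key: type(value).__name__ for key, value in db.items()}
```

For example, top_level_summary(load_database("data/v0.13.0/pbstarphase_20240730.json.gz")) prints nothing by itself but returns a dictionary that can be inspected to see what the release contains before comparing HLA region coordinates against an older database.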