
Commit

Merge pull request #11 from PacificBiosciences/targeted_d6
doc sync
holtjma authored May 1, 2024
2 parents 2afd4b7 + ef9b74a commit 7ab0195
Showing 3 changed files with 22 additions and 3 deletions.
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,15 @@
# v0.10.0
## Changes
- Added support for calling _CYP2D6_ from targeted sequencing data
  - In general, accuracy for targeted datasets is lower than for WGS. This is largely due to capture difficulties that decrease coverage of hybrid or duplicated alleles.
  - We recommend using two additional parameters when using targeted sequencing data: `--infer-connections --normalize-d6-only`
- Added two new CLI options to support targeted sequencing datasets:
  - `--infer-connections` - If set, pb-StarPhase will infer allele connections that are not observed in the dataset but are common in the population. For example, *4 and *68 are commonly found together, as are *10 and *36. This option is recommended when reads are too short to directly span from one allele to the next.
  - `--normalize-d6-only` - If set, pb-StarPhase will only normalize the copy numbers using the _CYP2D6_ alleles (i.e., excluding any discovered _CYP2D7_ alleles). This option is recommended when coverage of the _CYP2D7_ alleles is inconsistent relative to the _CYP2D6_ alleles.

## Fixed
- Fixed a reporting issue in the PharmCAT TSV where brackets were missing from combination alleles

# v0.9.1
## Changes
- The CLI settings log output has been updated for easier human readability
2 changes: 1 addition & 1 deletion README.md
@@ -10,7 +10,7 @@ The pb-StarPhase tool will diplotype pharmacogenomic (PGx) genes from [PacBio](h
Key features include:

* Ability to create a database from latest CPIC and IMGTHLA information
* Ability to diplotype most genes from CPIC as well as _HLA-A_ and _HLA-B_
* Ability to diplotype most genes from CPIC as well as _HLA-A_, _HLA-B_, and _CYP2D6_
* Works on PacBio datasets from targeted and whole genome sequencing

Authors: [Matt Holt](https://github.com/holtjma), [John Harting](https://github.com/jrharting), [Zev Kronenberg](https://github.com/zeeev)
11 changes: 9 additions & 2 deletions docs/user_guide.md
@@ -69,8 +69,8 @@ pbstarphase diplotype \

# Common use cases
## HLA and _CYP2D6_ diplotyping
With v0.8.0, pb-StarPhase supports diplotyping of _HLA-A_ and _HLA-B_ from an aligned BAM file.
With v0.9.0, pb-StarPhase supports diplotyping of _CYP2D6_ from an aligned BAM file for whole genome datasets.
With v0.10.0, pb-StarPhase supports diplotyping of _HLA-A_, _HLA-B_, and _CYP2D6_ from an aligned BAM file.
If using targeted sequencing datasets, see [our recommended parameters](#can-i-diplotype-using-targeted-sequencing-data).
To enable HLA and _CYP2D6_ diplotyping, simply provide the BAM file(s) in addition to the normal parameters.
HLA and _CYP2D6_ diplotyping are both more computationally expensive than diplotyping the CPIC genes.
If run-time is an issue, we recommend using the `--threads` option to provide additional cores to StarPhase, which will improve the run-time of the HLA diplotyping components.
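
As a rough illustration of the paragraph above, the following dry-run sketch adds a BAM file and extra threads to a `diplotype` call. Only `--threads` is named in this section; the `--database`, `--reference`, `--vcf`, `--bam`, and `--output-calls` options and all file names are assumptions made for illustration, so check `pbstarphase diplotype --help` for the exact options in your version.

```bash
#!/bin/sh
# Hypothetical placeholder inputs; substitute your own files.
DATABASE="database.json"
REFERENCE="reference.fasta"
VCF="sample.vcf.gz"
BAM="sample.bam"
OUTPUT="sample.diplotypes.json"
THREADS=8   # additional cores for the HLA / CYP2D6 steps

# Assemble the command as a string so it can be inspected first.
# Option names other than --threads are assumed, not taken from this section.
CMD="pbstarphase diplotype --database $DATABASE --reference $REFERENCE --vcf $VCF --bam $BAM --output-calls $OUTPUT --threads $THREADS"

# Dry run: print the command rather than executing it.
echo "$CMD"
```

Printing the command first makes it easy to confirm that the BAM file and thread count are included before committing to a long run.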
@@ -286,3 +286,10 @@ For example, both "CYP2D6::CYP2D7::intron1" and "CYP2D6::CYP2D7::exon2" are re-m
While all "CYP2D7::CYP2D6" alleles are currently mapped to *13, most "CYP2D6::CYP2D7" alleles do _not_ have a known re-mapping.
Those without a known re-mapping are left in the pb-StarPhase internal format.
If you encounter an allele that you think should be re-mapped, please open an issue on our GitHub.

## Can I diplotype using targeted sequencing data?
In general, yes: in our internal tests the CPIC and HLA genes behave similarly to their WGS counterparts.
However, _CYP2D6_ tends to be more difficult to accurately call with targeted sequencing.
This is typically due to shorter read lengths, increased coverage variation across alleles, and full-allele dropout due to the capture.
For _CYP2D6_, this is particularly problematic due to the presence of deletion, duplication, and hybrid alleles that may influence the final diplotype.
For targeted sequencing, we recommend using the following _CYP2D6_-specific additional parameters, which attempt to account for these complicating factors: `--infer-connections --normalize-d6-only`.
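
A minimal dry-run sketch of a targeted-sequencing invocation with the two recommended flags appended is shown below. The flags `--infer-connections` and `--normalize-d6-only` come from the text above; the remaining option names (`--database`, `--reference`, `--vcf`, `--bam`, `--output-calls`) and every file name are assumptions for illustration and should be verified against `pbstarphase diplotype --help`.

```bash
#!/bin/sh
# Hypothetical placeholder inputs; substitute your own files.
DATABASE="database.json"
REFERENCE="reference.fasta"
VCF="targeted_sample.vcf.gz"
BAM="targeted_sample.bam"
OUTPUT="targeted_sample.diplotypes.json"

# CYP2D6-specific flags recommended for targeted sequencing data.
TARGETED_FLAGS="--infer-connections --normalize-d6-only"

# Option names other than the two targeted flags are assumed.
CMD="pbstarphase diplotype --database $DATABASE --reference $REFERENCE --vcf $VCF --bam $BAM --output-calls $OUTPUT $TARGETED_FLAGS"

# Dry run: print the command rather than executing it.
echo "$CMD"
```

Keeping the targeted flags in their own variable makes it simple to drop them again when the same script is reused for whole genome datasets.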
