Sample qc bug fixes #259

Merged Sep 17, 2020 (9 commits)
Changes from 8 commits
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -12,6 +12,8 @@
 * Changed quality histograms to label histograms calculated on raw and not adj data [(#247)](https://github.com/broadinstitute/gnomad_methods/pull/247)
 * Updated some VCF export constants [(#249)](https://github.com/broadinstitute/gnomad_methods/pull/249)
 * Changed default DP threshold to 5 for hemi genotype calls in `annotate_adj` and `get_adj_expr` [(#252)](https://github.com/broadinstitute/gnomad_methods/pull/252)
+* Fix for error in `compute_stratified_sample_qc` where `gt_expr` caused error [(#259)](https://github.com/broadinstitute/gnomad_methods/pull/259)
+* Add reference genome to call of `has_liftover` in `get_liftover_genome` [(#259)](https://github.com/broadinstitute/gnomad_methods/pull/259)

 ## Version 0.4.0 - July 9th, 2020

12 changes: 7 additions & 5 deletions gnomad/sample_qc/filtering.py
@@ -185,22 +185,22 @@ def compute_stratified_sample_qc(
     mt: hl.MatrixTable,
     strata: Dict[str, hl.expr.BooleanExpression],
     tmp_ht_prefix: Optional[str],
-    gt_expr: Optional[hl.expr.CallExpression],
+    gt_col: Optional[str] = None,
 ) -> hl.Table:
     """
     Runs hl.sample_qc on different strata and then also merge the results into a single expression.
-    Note that strata should be non-overlapping, e.g. SNV vs indels or bi-allelic vs multi-allelic
+    Note that strata should be non-overlapping, e.g. SNV vs indels or bi-allelic vs multi-allelic

     :param mt: Input MT
     :param strata: Strata names and filtering expressions
     :param tmp_ht_prefix: Optional path prefix to write the intermediate strata results to (recommended for larger datasets)
-    :param gt_expr: Optional entry field storing the genotype (if not specified, then it is assumed that it is stored in mt.GT)
+    :param gt_col: Name of entry field storing the genotype. Default: 'GT'
     :return: Sample QC table, including strat-specific numbers
     """
     mt = mt.select_rows(**strata)

-    if gt_expr is not None:
-        mt = mt.select_entries(GT=gt_expr)
+    if gt_col is not None:
+        mt = mt.select_entries(GT=mt[gt_col])
     else:
         mt = mt.select_entries("GT")

@@ -258,6 +258,7 @@ def merge_sample_qc_expr(
     additive_metrics = [
         "n_called",
         "n_not_called",
+        "n_filtered",
         "n_hom_ref",
         "n_het",
         "n_hom_var",
@@ -314,6 +315,7 @@ def merge_sample_qc_expr(
                 ]
             ).drop("n")
             for metric in stats_metrics
+            if metric in sample_qc_fields
         }
     )
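
For readers skimming the two `merge_sample_qc_expr` hunks above, here is a minimal plain-Python sketch of the idea, not the Hail implementation: additive metrics are summed across strata, and a metric is only merged if it appears in the per-stratum results. The function name `merge_additive_metrics`, the dict-based inputs, and the example counts are all hypothetical. Treating a metric missing from one stratum as 0 is a simplification made here for illustration.

```python
from typing import Dict, List

# Additive metric names as they appear in the diff above (only the ones shown).
ADDITIVE_METRICS = [
    "n_called",
    "n_not_called",
    "n_filtered",
    "n_hom_ref",
    "n_het",
    "n_hom_var",
]


def merge_additive_metrics(strata_results: List[Dict[str, int]]) -> Dict[str, int]:
    """Sum additive metrics across strata, skipping metrics absent from every stratum.

    Hypothetical helper for illustration only; it operates on plain dicts,
    not on Hail struct expressions.
    """
    # Collect every field name that appears in at least one stratum.
    present_fields = set()
    for result in strata_results:
        present_fields.update(result)

    # Mirror the `if metric in sample_qc_fields` guard: only merge what exists.
    return {
        metric: sum(result.get(metric, 0) for result in strata_results)
        for metric in ADDITIVE_METRICS
        if metric in present_fields
    }


# Made-up per-stratum counts for one sample (SNVs vs indels).
snv = {"n_called": 90, "n_not_called": 8, "n_filtered": 2, "n_het": 30}
indel = {"n_called": 9, "n_not_called": 1, "n_filtered": 0, "n_het": 4}

merged = merge_additive_metrics([snv, indel])
print(merged)
```

Because neither stratum carries `n_hom_ref` or `n_hom_var` in this toy input, the guard leaves them out of the merged result instead of raising on a missing field.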

2 changes: 1 addition & 1 deletion gnomad/utils/liftover.py
@@ -52,7 +52,7 @@ def get_liftover_genome(
         chain = GRCH37_to_GRCH38_CHAIN

     logger.info("Adding liftover chain to input build...")
-    if source.has_liftover():
+    if source.has_liftover(target):
         logger.warning(
             f"Source reference build {source.name} already has a chain file: {source._liftovers}!\
             Using whichever chain has already been added."