Vcf export #241
Merged
62 commits
60b8640
Added constants, make_label_combos, generic_field_check, and make_fil…
ch-kr 3decb5f
Added AS_FIELDS, SITE_FIELDS constants and added function add_as_info…
ch-kr c4ad6b2
Updated make_as_info_dict and added more constants (RF_FIELDS, VQSR_F…
ch-kr 6c9b95f
Added constants for entries to select during export
ch-kr a9c0354
Added make_info_dict to vcf.py
ch-kr 20f5d52
Removed unnecessary pop constants (can be imported from ancestry.py)
ch-kr 2117d82
Added make hist bin edges expr function to vcf.py, also removed types…
ch-kr 78084a7
Added make hist dict
ch-kr 5c698b7
Updated changelog
ch-kr e30d67b
Added SORT_ORDER and sample_sum_check to vcf.py
ch-kr 706864d
Moved make combo header text to vcf.py
ch-kr d058743
Fixed imports and reformatted with black
ch-kr d1bbacd
Changed docstring for make_hist_bin_edges_expr
ch-kr 424fcc1
Added set female metrics to NA
ch-kr d397b87
Updated docstring in set female metrics to na
ch-kr ec0ddd6
Removed transmitted singleton and sibling singleton from SITE_FIELDS
ch-kr 647d8e6
Updated make_label_combos docstring and fixed values for some constan…
ch-kr 973c5ef
Updated docstring for generic_field_check
ch-kr ae3fcf4
Updated docstring for make_filters_sanity_check_expr
ch-kr d3967aa
Updated make_filters_sanity_check_expr
ch-kr 5313ea1
Updated docstring for make_combo_header_text
ch-kr b556441
Updated docstring for make_info_dict
ch-kr 13ac89e
Updated make_combo_header_text to take dict as input
ch-kr 1202e2e
Updated make_info_dict to pass in sort order
ch-kr 759cfac
Addressed rest of review comments
ch-kr fb14af8
Add BaseQRankSum to SITE_FIELDS const
ch-kr 8d1f38c
Addressed review comments
ch-kr 92dbedf
Update cache and setup-python Actions (#244)
nawatts d18e2bb
Added constants, make_label_combos, generic_field_check, and make_fil…
ch-kr 26956b9
Added AS_FIELDS, SITE_FIELDS constants and added function add_as_info…
ch-kr 02f4f56
Updated make_as_info_dict and added more constants (RF_FIELDS, VQSR_F…
ch-kr e7edd87
Added constants for entries to select during export
ch-kr 922a674
Added make_info_dict to vcf.py
ch-kr 9ff11ca
Removed unnecessary pop constants (can be imported from ancestry.py)
ch-kr 1bb7162
Added make hist bin edges expr function to vcf.py, also removed types…
ch-kr 48bf3bb
Added make hist dict
ch-kr e736da0
Updated changelog
ch-kr ffaa908
Added SORT_ORDER and sample_sum_check to vcf.py
ch-kr 00b92ec
Moved make combo header text to vcf.py
ch-kr 6ed92b0
Fixed imports and reformatted with black
ch-kr 191b073
Changed docstring for make_hist_bin_edges_expr
ch-kr d8a3e7a
Added set female metrics to NA
ch-kr 51a5a73
Updated docstring in set female metrics to na
ch-kr 6b1969d
Removed transmitted singleton and sibling singleton from SITE_FIELDS
ch-kr 2af3587
Updated make_label_combos docstring and fixed values for some constan…
ch-kr ca57708
Updated docstring for generic_field_check
ch-kr 4428b0c
Updated docstring for make_filters_sanity_check_expr
ch-kr 75e997d
Updated make_filters_sanity_check_expr
ch-kr fdfed1a
Updated docstring for make_combo_header_text
ch-kr 2c47c63
Updated docstring for make_info_dict
ch-kr 41fb66d
Updated make_combo_header_text to take dict as input
ch-kr 2ddd8f1
Updated make_info_dict to pass in sort order
ch-kr 3a80a41
Addressed rest of review comments
ch-kr 1e48d12
Add BaseQRankSum to SITE_FIELDS const
ch-kr ff61379
Addressed review comments
ch-kr ca49ac5
Updated docstring for sample_sum_check
ch-kr 5a27577
Rebasing branch
ch-kr 261c38d
Updated some docstrings addressing review comments
ch-kr d176f68
Created assessment folder and moved sample_sum_check, generic_field_c…
ch-kr 1640b3f
Forgot to commit changes in vcf.py when moving to assessment
ch-kr 6e67800
Updated generic field check docstring
ch-kr f021255
Added option to check for additional filters to make_filters_sanity_c…
ch-kr
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
from gnomad.assessment import sanity_checks |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,166 @@ | ||
import logging | ||
from typing import Dict, List, Optional | ||
|
||
import hail as hl | ||
|
||
from gnomad.utils.vcf import make_label_combos, SORT_ORDER | ||
|
||
|
||
logging.basicConfig(format="%(levelname)s (%(name)s %(lineno)s): %(message)s") | ||
logger = logging.getLogger(__name__) | ||
logger.setLevel(logging.INFO) | ||
|
||
|
||
def generic_field_check( | ||
    ht: hl.Table, | ||
    cond_expr: hl.expr.BooleanExpression, | ||
    check_description: str, | ||
    display_fields: List[str], | ||
    verbose: bool, | ||
) -> None: | ||
    """ | ||
    Check a generic logical condition involving annotations in a Hail Table and print the results to terminal. | ||
|
||
    Displays the number of rows in the Table that match `cond_expr` and therefore fail the desired condition (`check_description`). | ||
    If the number of rows that match `cond_expr` is 0, then the Table passes that check; otherwise, it fails. | ||
|
||
    .. note:: | ||
        `cond_expr` and `check_description` are opposites and should never be the same. | ||
        E.g., if `cond_expr` filters for instances where the raw AC is less than the adj AC, | ||
        then it selects sites that fail the desired condition (`check_description`) | ||
        of having a raw AC greater than or equal to the adj AC. | ||
|
||
    :param ht: Table containing annotations to be checked. | ||
    :param cond_expr: Logical expression referring to annotations in ht to be checked. | ||
    :param check_description: String describing the condition being checked; is displayed in terminal summary message. | ||
    :param display_fields: List of names of ht annotations to be displayed in case of failure (for troubleshooting purposes); | ||
        these fields are also displayed if verbose is True. | ||
    :param verbose: If True, show top values of annotations being checked, including checks that pass; if False, | ||
        show only top values of annotations that fail checks. | ||
    """ | ||
    ht_orig = ht | ||
    ht = ht.filter(cond_expr) | ||
    n_fail = ht.count() | ||
    if n_fail > 0: | ||
        logger.info(f"Found {n_fail} sites that fail {check_description} check:") | ||
        ht = ht.flatten() | ||
        ht.select("locus", "alleles", *display_fields).show() | ||
    else: | ||
        logger.info(f"PASSED {check_description} check") | ||
        if verbose: | ||
            ht_orig = ht_orig.flatten() | ||
            ht_orig.select(*display_fields).show() | ||
|
||
|
||
def make_filters_sanity_check_expr( | ||
    ht: hl.Table, extra_filter_checks: Optional[Dict[str, hl.expr.Expression]] = None | ||
) -> Dict[str, hl.expr.Expression]: | ||
    """ | ||
    Make Hail expressions to measure % variants filtered under varying conditions of interest. | ||
|
||
    Checks for: | ||
        - Total number of variants | ||
        - Fraction of variants removed due to: | ||
            - Any filter | ||
            - Inbreeding coefficient filter in combination with any other filter | ||
            - AC0 filter in combination with any other filter | ||
            - Random forest filtering in combination with any other filter | ||
            - Only inbreeding coefficient filter | ||
            - Only AC0 filter | ||
            - Only random forest filtering | ||
|
||
    :param ht: Table containing 'filters' annotation to be examined. | ||
    :param extra_filter_checks: Optional dictionary mapping a filter condition name (key) to an extra filter expression (value) to be examined. | ||
    :return: Dictionary containing Hail aggregation expressions to examine filter flags. | ||
    """ | ||
    filters_dict = { | ||
        "n": hl.agg.count(), | ||
        "frac_any_filter": hl.agg.fraction(hl.len(ht.filters) != 0), | ||
        "frac_inbreed_coeff": hl.agg.fraction(ht.filters.contains("InbreedingCoeff")), | ||
        "frac_ac0": hl.agg.fraction(ht.filters.contains("AC0")), | ||
        "frac_rf": hl.agg.fraction(ht.filters.contains("RF")), | ||
        "frac_inbreed_coeff_only": hl.agg.fraction( | ||
            ht.filters.contains("InbreedingCoeff") & (ht.filters.length() == 1) | ||
        ), | ||
        "frac_ac0_only": hl.agg.fraction( | ||
            ht.filters.contains("AC0") & (ht.filters.length() == 1) | ||
        ), | ||
        "frac_rf_only": hl.agg.fraction( | ||
            ht.filters.contains("RF") & (ht.filters.length() == 1) | ||
        ), | ||
    } | ||
    if extra_filter_checks: | ||
        filters_dict.update(extra_filter_checks) | ||
|
||
    return filters_dict | ||
|
||
|
||
def sample_sum_check( | ||
    ht: hl.Table, | ||
    prefix: str, | ||
    label_groups: Dict[str, List[str]], | ||
    verbose: bool, | ||
    subpop: Optional[str] = None, | ||
    sort_order: List[str] = SORT_ORDER, | ||
) -> None: | ||
    """ | ||
    Compute afresh the sum of annotations for a specified group of annotations, and compare to the annotated version; | ||
    display results from checking the sum of the specified annotations in the terminal. | ||
|
||
    :param ht: Table containing annotations to be summed. | ||
    :param prefix: String indicating sample subset. | ||
    :param label_groups: Dictionary containing an entry for each label group, where key is the name of the grouping, | ||
        e.g. "sex" or "pop", and value is a list of all possible values for that grouping (e.g. ["male", "female"] or ["afr", "nfe", "amr"]). | ||
    :param verbose: If True, show top values of annotations being checked, including checks that pass; if False, | ||
        show only top values of annotations that fail checks. | ||
    :param subpop: Subpop abbreviation, supplied only if subpopulations are included in the annotation groups being checked. | ||
    :param sort_order: List containing order to sort label group combinations. Default is SORT_ORDER. | ||
    :return: None | ||
    """ | ||
    label_combos = make_label_combos(label_groups) | ||
    combo_AC = [ht.info[f"{prefix}AC_{x}"] for x in label_combos] | ||
    combo_AN = [ht.info[f"{prefix}AN_{x}"] for x in label_combos] | ||
    combo_nhomalt = [ht.info[f"{prefix}nhomalt_{x}"] for x in label_combos] | ||
|
||
    # Note: pop() removes the "group" key from the dictionary passed in by the caller. | ||
    group = label_groups.pop("group")[0] | ||
    alt_groups = "_".join( | ||
        sorted(label_groups.keys(), key=lambda x: sort_order.index(x)) | ||
    ) | ||
|
||
    annot_dict = { | ||
        f"sum_AC_{group}_{alt_groups}": hl.sum(combo_AC), | ||
        f"sum_AN_{group}_{alt_groups}": hl.sum(combo_AN), | ||
        f"sum_nhomalt_{group}_{alt_groups}": hl.sum(combo_nhomalt), | ||
    } | ||
|
||
    ht = ht.annotate(**annot_dict) | ||
|
||
    for subfield in ["AC", "AN", "nhomalt"]: | ||
        if not subpop: | ||
            generic_field_check( | ||
                ht, | ||
                ( | ||
                    ht.info[f"{prefix}{subfield}_{group}"] | ||
                    != ht[f"sum_{subfield}_{group}_{alt_groups}"] | ||
                ), | ||
                f"{prefix}{subfield}_{group} = sum({subfield}_{group}_{alt_groups})", | ||
                [ | ||
                    f"info.{prefix}{subfield}_{group}", | ||
                    f"sum_{subfield}_{group}_{alt_groups}", | ||
                ], | ||
                verbose, | ||
            ) | ||
        else: | ||
            generic_field_check( | ||
                ht, | ||
                ( | ||
                    ht.info[f"{prefix}{subfield}_{group}_{subpop}"] | ||
                    != ht[f"sum_{subfield}_{group}_{alt_groups}"] | ||
                ), | ||
                f"{prefix}{subfield}_{group}_{subpop} = sum({subfield}_{group}_{alt_groups})", | ||
                [ | ||
                    f"info.{prefix}{subfield}_{group}_{subpop}", | ||
                    f"sum_{subfield}_{group}_{alt_groups}", | ||
                ], | ||
                verbose, | ||
            ) |
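A plain-Python sketch of the comparison that `sample_sum_check` performs may help when reviewing how the field names are built. It does not use Hail. `label_combos` is a hypothetical stand-in for `make_label_combos`, whose implementation is not shown in this diff; the assumed behavior is that one label from each group is joined with underscores. The `info` dictionary and its values are made up for illustration.

```python
from itertools import product
from typing import Dict, List


def label_combos(label_groups: Dict[str, List[str]]) -> List[str]:
    """Hypothetical helper: join one label from each group with underscores."""
    return ["_".join(combo) for combo in product(*label_groups.values())]


def sum_matches(
    info: Dict[str, int],
    prefix: str,
    subfield: str,
    label_groups: Dict[str, List[str]],
) -> bool:
    """Recompute the per-combination sum and compare it to the annotated total."""
    group = label_groups["group"][0]  # read the key without removing it
    recomputed = sum(
        info[f"{prefix}{subfield}_{combo}"] for combo in label_combos(label_groups)
    )
    return info[f"{prefix}{subfield}_{group}"] == recomputed


# Illustrative data only: the adj total should equal the sum over populations.
info = {"AC_adj": 7, "AC_adj_afr": 3, "AC_adj_nfe": 4}
label_groups = {"group": ["adj"], "pop": ["afr", "nfe"]}
ok = sum_matches(info, "", "AC", label_groups)
```

Unlike the function in the diff, which calls `label_groups.pop("group")` on the dictionary it receives, the sketch reads the key without removing it, so the same dictionary can be reused for the next check.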
Thank you for adding this note -- it's probably something easily missed when setting up the generic field check, so it's great to have this additional reminder
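To make the note concrete, here is a minimal plain-Python sketch (no Hail) of the same pattern: the condition passed in selects the rows that violate the described check, so zero matches means the check passes. `check_rows`, the row dictionaries, and the field names `AC_raw` and `AC_adj` are hypothetical stand-ins and are not part of this PR.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]


def check_rows(
    rows: List[Row],
    fail_cond: Callable[[Row], bool],
    check_description: str,
) -> List[Row]:
    """Return the rows matching fail_cond, i.e. rows that violate check_description."""
    failures = [row for row in rows if fail_cond(row)]
    if failures:
        print(f"Found {len(failures)} sites that fail {check_description} check")
    else:
        print(f"PASSED {check_description} check")
    return failures


rows = [
    {"AC_raw": 10, "AC_adj": 8},
    {"AC_raw": 3, "AC_adj": 5},  # violates the desired condition raw >= adj
]

# The condition is the opposite of the description: it looks for raw < adj
# in order to check that raw >= adj holds everywhere.
failures = check_rows(rows, lambda r: r["AC_raw"] < r["AC_adj"], "AC_raw >= AC_adj")
```

Passing the description's own condition (`raw >= adj`) as the filter would instead report every well-formed row as a failure, which is the mistake the note guards against.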