Vcf export #241
…ters_sanity_check_expr to vcf.py from gnomad_qc. Also added INFO_DICT, FORMAT_DICT, and make_vcf_filter_dict from ukb repo
…_dict to update INFO_DICT with AS fields and their values
…IELDS, REGION_TYPE_FIELDS, ALLELE_TYPE_FIELDS
Thank you for all your hard work on this KC! I have some suggested changes and comments.
gnomad/utils/vcf.py
Outdated
    return header_hist_dict


def sample_sum_check(
This is something we want to check in the internal HT sanity checks, right? I think it should also be moved somewhere else, like the other function I mentioned above, but again I'm not positive where to put it.
Agreed, this could go into an `assessment/sanity_checks.py`-type file
Yeah, I think this should move to keep this script VCF focused, and `assessment/sanity_checks.py` sounds good to me for this and the one above it
…ts/their descriptions
The examples in the docstrings are great -- having examples is clarifying in a way that describing things often is not!
gnomad/utils/vcf.py
Outdated
    ]

    if faf:
        female_metrics.extend([x for x in female_metrics if "faf" in x])
I'm confused by this line: it looks like you're extending the existing list `female_metrics` by faf elements that are already in the list
faf elements aren't in the list -- `female_metrics` first pulls all annotations that contain `_female` and then corrects that list to keep only elements with both `_female` and `AC`, `AN`, or `nhomalt`. The faf annotations (`faf95_female`) don't contain AC, AN, or nhomalt, so they aren't kept in that step.
That said, why do we set AC, AN, and nhomalt to 0 but leave AF? I'm going to change this to just be `female_metrics = [x for x in metrics if "_female" in x]` if that works with you and @jkgoodrich
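The filtering discussed in this thread can be sketched in plain Python. This is only an illustration: the annotation names are hypothetical, and the two-step filter follows the description in the comment above rather than the actual source in `gnomad/utils/vcf.py`. Under those assumptions, the `extend` call under review iterates over `female_metrics` itself, which holds no faf entries at that point, so it appears to add nothing.

```python
# Hypothetical annotation names, for illustration only.
metrics = [
    "AC_female", "AN_female", "nhomalt_female",
    "AF_female", "faf95_female", "AC_male",
]

# Step 1 (as described in the thread): everything containing "_female".
female_metrics = [x for x in metrics if "_female" in x]
# Step 2: keep only entries that also contain AC, AN, or nhomalt.
female_metrics = [
    x for x in female_metrics if "AC" in x or "AN" in x or "nhomalt" in x
]

# The line under review filters female_metrics (not metrics) for "faf",
# so under these assumptions nothing is appended.
before = list(female_metrics)
female_metrics.extend([x for x in female_metrics if "faf" in x])
extend_was_noop = female_metrics == before

# The simplification proposed in the thread keeps every "_female" annotation,
# including AF and faf.
proposed = [x for x in metrics if "_female" in x]
```

With the proposed one-liner, `AF_female` and `faf95_female` are handled together with the count annotations, which is the behavior change the thread asks the other reviewers to confirm.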
hmm, interesting that AF is not set to 0. I will defer to @gtiao for this one, there may have been motivation for this that I am not aware of.
…ts/their descriptions
A few more smaller changes requested, but it's looking really good. Thank you @ch-kr!
gnomad/utils/vcf.py
Outdated
    ht_orig.select(*display_fields).show()


def make_filters_sanity_check_expr(ht: hl.Table) -> Dict[str, hl.expr.Expression]:
agree, let's make an assessment/sanity_checks.py
…heck, make_filters_sanity_check_expr to assessment
Looks great! So excited to have all this beautiful, shiny, clean code merged into the common repo!
    combos = make_label_combos(label_groups)

    for combo in combos:
        combo_fields = combo.split("_")
        group_dict = dict(zip(group_types, combo_fields))

        for_combo = make_combo_header_text("for", group_dict, prefix)
Great idea for streamlining the code a bit!
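For readers without the repo at hand, the loop in the snippet above can be approximated as follows. `label_combos_sketch` is a hypothetical stand-in written for this illustration, not the real `make_label_combos`, and the label groups are made-up values; the sketch only shows how each combined label is split and zipped back to its group type.

```python
from itertools import product


def label_combos_sketch(label_groups):
    """Hypothetical stand-in: join one label from each group with "_"."""
    return ["_".join(labels) for labels in product(*label_groups.values())]


# Made-up label groups, for illustration only.
label_groups = {"group": ["adj"], "sex": ["female", "male"], "pop": ["afr", "nfe"]}
group_types = list(label_groups.keys())

combos = label_combos_sketch(label_groups)

# Same split-and-zip step as in the reviewed snippet.
group_dicts = [dict(zip(group_types, combo.split("_"))) for combo in combos]
```

One caveat of this pattern, at least in the sketch: a label that itself contains an underscore would be split into extra fields and misalign with `group_types`.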
        f"{prefix}hom": "|".join(
            map(lambda x: f"{x:.1f}", ht.head(1).age_hist_hom.collect()[0].bin_edges)
        ),
        f"{prefix}{call_type}": "|".join(
Another nice consolidation of code
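The string-building part of that snippet works independently of Hail. A minimal sketch, assuming a made-up list of bin edges in place of the value collected from the Table:

```python
# Made-up bin edges standing in for age_hist_hom.bin_edges.
bin_edges = [30.0, 35.0, 40.0, 45.0]

# Format each edge to one decimal place and join with "|",
# as in the reviewed snippet.
formatted = "|".join(map(lambda x: f"{x:.1f}", bin_edges))
```

Here `formatted` is the pipe-delimited string `30.0|35.0|40.0|45.0`.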
    Displays the number of rows in the Table that match the `cond_expr` and fail to be the desired condition (`check_description`).
    If the number of rows that match the `cond_expr` is 0, then the Table passes that check; otherwise, it fails.

    .. note::
Thank you for adding this note -- it's probably something easily missed when setting up the generic field check, so it's great to have this additional reminder
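The pass/fail rule in the docstring above can be sketched without Hail. The function name, row format, and messages below are assumptions made for this illustration and are not the repo's API; the point is that the condition describes the failing rows, so a count of 0 means the check passes.

```python
def generic_check_sketch(rows, cond, check_description):
    """Hypothetical helper: count rows matching `cond`; zero matches passes."""
    n_fail = sum(1 for row in rows if cond(row))
    passed = n_fail == 0
    status = "PASSED" if passed else f"FAILED ({n_fail} rows)"
    print(f"{status}: {check_description}")
    return passed


# Made-up rows, for illustration only.
rows = [{"AC": 1, "AN": 10}, {"AC": 12, "AN": 10}, {"AC": 0, "AN": 4}]

# The condition is the violation (AC > AN); the description is the desired state.
ok = generic_check_sketch(rows, lambda r: r["AC"] > r["AN"], "AC <= AN")
all_ok = generic_check_sketch(rows[:1], lambda r: r["AC"] > r["AN"], "AC <= AN")
```

Passing the violation rather than the desired condition is easy to get backwards, which is presumably why a reminder in the docstring is useful.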
Added constants and some functions that are used during MT/HT>VCF export