Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vcf export #241

Merged
merged 62 commits into from
Aug 4, 2020
Merged

Vcf export #241

merged 62 commits into from
Aug 4, 2020

Conversation

ch-kr
Copy link
Contributor

@ch-kr ch-kr commented Jul 22, 2020

Added constants and some functions that are used during MT/HT>VCF export

@ch-kr ch-kr requested a review from jkgoodrich July 23, 2020 21:08
Copy link
Contributor

@jkgoodrich jkgoodrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all your hard work on this KC! I have some suggested changes and comments.

gnomad/utils/vcf.py Show resolved Hide resolved
gnomad/utils/vcf.py Outdated Show resolved Hide resolved
gnomad/utils/vcf.py Show resolved Hide resolved
gnomad/utils/vcf.py Outdated Show resolved Hide resolved
gnomad/utils/vcf.py Outdated Show resolved Hide resolved
gnomad/utils/vcf.py Outdated Show resolved Hide resolved
gnomad/utils/vcf.py Outdated Show resolved Hide resolved
gnomad/utils/vcf.py Show resolved Hide resolved
return header_hist_dict


def sample_sum_check(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something we want to check on the internal HT sanity checks right? I think it should also be moved somewhere else like the other function I mentioned above, but again not positive where to put it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this could go into an assessment/sanity_checks.py type file

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this should move to keep this script VCF focused and assessment/sanity_checks.py sounds good to me for this and the one above it

gnomad/utils/vcf.py Outdated Show resolved Hide resolved
@ch-kr ch-kr requested a review from jkgoodrich July 29, 2020 13:57
Copy link
Contributor

@gtiao gtiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The examples in the docstrings are great -- having examples is clarifying in a way that describing things often is not!

gnomad/utils/vcf.py Outdated Show resolved Hide resolved
gnomad/utils/vcf.py Outdated Show resolved Hide resolved
gnomad/utils/vcf.py Outdated Show resolved Hide resolved
gnomad/utils/vcf.py Outdated Show resolved Hide resolved
gnomad/utils/vcf.py Outdated Show resolved Hide resolved
gnomad/utils/vcf.py Outdated Show resolved Hide resolved
gnomad/utils/vcf.py Show resolved Hide resolved
gnomad/utils/vcf.py Outdated Show resolved Hide resolved
gnomad/utils/vcf.py Outdated Show resolved Hide resolved
]

if faf:
female_metrics.extend([x for x in female_metrics if "faf" in x])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this line: it looks like you're extending the existing list female_metrics by faf elements that are already in the list

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

faf elements aren't in the list -- female_metrics first pulls all annotations that contain _female and then corrects that list to keep only elements with both _female and AC, AN, or nhomalt. the faf annotations (faf95_female) don't contain AC, AN, or nhomalt, so they aren't kept in that step.

that said, why do we set AC, AN, and nhomalt to 0 but leave AF? I'm going to change this to just be female_metrics = [x for x in metrics if "_female" in x] if that works with you and @jkgoodrich

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, interesting that AF is not set to 0. I will defer to @gtiao for this one, there may have been motivation for this that I am not aware of.

Copy link
Contributor

@jkgoodrich jkgoodrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more smaller changes requested, but it's looking really good. Thank you @ch-kr!

return header_hist_dict


def sample_sum_check(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this should move to keep this script VCF focused and assessment/sanity_checks.py sounds good to me for this and the one above it

gnomad/utils/vcf.py Outdated Show resolved Hide resolved
gnomad/utils/vcf.py Outdated Show resolved Hide resolved
ht_orig.select(*display_fields).show()


def make_filters_sanity_check_expr(ht: hl.Table) -> Dict[str, hl.expr.Expression]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, let's make an assessment/sanity_checks.py

gnomad/utils/vcf.py Outdated Show resolved Hide resolved
gnomad/utils/vcf.py Outdated Show resolved Hide resolved
gnomad/utils/vcf.py Outdated Show resolved Hide resolved
gnomad/utils/vcf.py Outdated Show resolved Hide resolved
gnomad/utils/vcf.py Show resolved Hide resolved
]

if faf:
female_metrics.extend([x for x in female_metrics if "faf" in x])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, interesting that AF is not set to 0. I will defer to @gtiao for this one, there may have been motivation for this that I am not aware of.

Copy link
Contributor

@jkgoodrich jkgoodrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all of this @ch-kr! I only had one small comment (that has more to do with UKBB than the common code actually). Then I think there might have been a comment/s where @gtiao was tagged for opinion, but otherwise this looks good to me.

Copy link
Contributor

@gtiao gtiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! So excited to have all this beautiful, shiny, clean code merged into the common repo!

combos = make_label_combos(label_groups)

for combo in combos:
combo_fields = combo.split("_")
group_dict = dict(zip(group_types, combo_fields))

for_combo = make_combo_header_text("for", group_dict, prefix)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea for streamlining the code a bit!

f"{prefix}hom": "|".join(
map(lambda x: f"{x:.1f}", ht.head(1).age_hist_hom.collect()[0].bin_edges)
),
f"{prefix}{call_type}": "|".join(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another nice consolidation of code

Displays the number of rows in the Table that match the `cond_expr` and fail to be the desired condition (`check_description`).
If the number of rows that match the `cond_expr` is 0, then the Table passes that check; otherwise, it fails.

.. note::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding this note -- it's probably something easily missed when setting up the generic field check, so it's great to have this additional reminder

@gtiao gtiao merged commit d15c4d1 into master Aug 4, 2020
@gtiao gtiao deleted the vcf_export branch August 4, 2020 15:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants