Minor GKS formatting changes and addition of gnomAD flags to annotation #617

Merged on Feb 28, 2024 (35 commits)

@theferrit32 theferrit32 commented Oct 5, 2023

Modifications:

  • change popFreqID to popFreqId
  • change subcohort ids to use a . delimiter instead of , to match the popmax syntax
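A minimal sketch of the delimiter change, assuming subcohort ids are built by appending group labels to a variant id. The helper `make_subcohort_id` and the example labels are hypothetical and are not names taken from this PR.

```python
def make_subcohort_id(variant_id: str, *groups: str) -> str:
    """Join a variant id and its group labels with '.', skipping empty labels."""
    # Hypothetical helper: the '.' delimiter mirrors the popmax-style syntax
    # described above, replacing the earlier ',' delimiter.
    parts = [variant_id] + [g for g in groups if g]
    return ".".join(parts)


if __name__ == "__main__":
    print(make_subcohort_id("1-55051215-G-GA", "afr", "XX"))
```

Keeping the join in one place means the delimiter only has to change once if the schema moves again.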

Additions:

  • add qcFilters
  • add lowComplexityRegion (lcr)
  • add allele balance flagged count (>0.9 in the table) as heterozygousSkewedAlleleCount

One caveat: the allele balance code will fail if the number of entries or the bin boundaries in the array ht.qual_hists.ab_hist_alt.bin_freq change. That field is used directly to drive the UI.
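The fragility around `bin_freq` can be made explicit with a guard. This is a plain-Python sketch, not the Hail expression used in the PR. It assumes the histogram has 20 equal-width bins covering [0, 1], so that the last two bins hold allele balance values of 0.9 and above. Both the bin count and the indices are assumptions, and `heterozygous_skewed_allele_count` is a hypothetical function name.

```python
from typing import Sequence

# Assumption: 20 equal-width bins over [0, 1], so bins 18 and 19
# cover allele balance values of 0.9 and above.
EXPECTED_BIN_COUNT = 20
SKEWED_BIN_INDICES = (18, 19)


def heterozygous_skewed_allele_count(bin_freq: Sequence[int]) -> int:
    """Sum the high allele balance bins, failing loudly on a layout change."""
    if len(bin_freq) != EXPECTED_BIN_COUNT:
        raise ValueError(
            f"expected {EXPECTED_BIN_COUNT} allele balance bins, got {len(bin_freq)}"
        )
    return sum(bin_freq[i] for i in SKEWED_BIN_INDICES)
```

A length check like this catches a change in the number of bins, but it would not catch a change in bin boundaries with the same bin count. That would need a comparison against the bin edges as well.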

EDIT:
Additional changes:

  • add hemizygote counts to variant and subcohorts
  • if no popmax FAF exists, set the popmaxFAF95 object to null instead of setting the freq to 0 and leaving null or another string in the other fields (@matren395)
  • rename popmaxFAF95 to grpMaxFAF95, popFreqId to grpFreqId (@matren395)
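The nullable behavior can be sketched as follows. The function `build_grp_max_faf95` and the inner keys `frequency` and `groupId` are hypothetical placeholders for illustration. Only the top-level idea (return null for the whole object rather than a partly filled one) comes from the description above.

```python
from typing import Optional


def build_grp_max_faf95(
    variant_id: str, faf95: Optional[float], group: Optional[str]
) -> Optional[dict]:
    """Return the grpMaxFAF95 object, or None when no group-max FAF exists."""
    if faf95 is None or group is None:
        # Whole object is null: no zero frequency paired with empty fields.
        return None
    # Hypothetical field names; the real schema keys may differ.
    return {"frequency": faf95, "groupId": f"{variant_id}.{group}"}
```

Serialized with `json.dumps`, a `None` result becomes `null`, which a consumer can test for directly instead of inspecting individual fields.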

@matren395

Hey Kyle, could we also roll in the change so that "popFreqId" returns a string instead of "None"? I had created my own PR for that and for the "popFreqID" fix, but this one already contains the latter and could easily host the former as well.

7 resolved review threads on gnomad/utils/annotations.py (outdated)
@theferrit32 theferrit32 force-pushed the kf/gks-cg-annotations branch from 36d3d24 to 011c298 Compare October 18, 2023 15:32
@theferrit32 theferrit32 marked this pull request as ready for review October 18, 2023 15:46
@theferrit32

One more pending change is to add a count of hemizygotes, which needs to be computed because it is not stored explicitly in a field on the table.
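One way the count could be derived is sketched below. It assumes hemizygotes are taken to be XY carriers at non-pseudoautosomal sites on chrX or chrY. That definition, the function name `hemizygote_count`, and its parameters are assumptions for illustration, not something confirmed in this thread.

```python
def hemizygote_count(contig: str, in_non_par: bool, xy_allele_count: int) -> int:
    """Derive a hemizygote count from the XY allele count.

    Assumption: outside the pseudoautosomal regions, XY samples carry a
    single copy of chrX and chrY, so each XY alternate allele is one
    hemizygous carrier. Elsewhere the count is reported as 0.
    """
    if contig in ("chrX", "chrY") and in_non_par:
        return xy_allele_count
    return 0
```

The same function would apply per subcohort by passing that subcohort's XY allele count.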

@larrybabb @ahwagner Was there an idea to move some of these fields out of ancillaryResults?

see: ga4gh/va-spec#119 (comment)

I think some of that might have been related to my confusion. I'm not sure the allele balance flagged counts are actually directly related to the popmax FAF 95. Maybe someone on the gnomAD team can clarify.

4 resolved review threads on gnomad/utils/annotations.py (outdated)
@klaricch klaricch self-assigned this Oct 19, 2023
@matren395 matren395 self-requested a review October 20, 2023 15:03
@matren395

Commit to roll in 1) nullable popMaxFAF95 and 2) a change of all returned references from 'pop' to 'grp' where relevant.

@theferrit32

Waiting for final modifications to upstream schema. There is one more discussion tomorrow. The changes will be relatively easy to implement after that.

See ga4gh/va-spec#121

@theferrit32

Adding commits and force pushing a branch rebased from main

@matren395 matren395 left a comment

might take another run at the actual code in annotations.py later today

4 resolved review threads on gnomad/resources/grch38/gnomad.py (outdated)
2 resolved review threads on gnomad/utils/annotations.py
@theferrit32

theferrit32 commented Nov 14, 2023

Latest changeset: https://github.com/theferrit32/gnomad_methods/compare/374112031d6b1178e40cfc23161a19b82b9d5693..74d869e022933612e566755f42d8dd2e0445e114

  • fixes fafmax for v4 exomes
  • uses v4 exomes coverage (for both v4 genomes and v4 exomes, is this correct? or should genomes tables not be given any coverage info?)
  • replaces AF = None (AC/AN = 0/0) with 0 (the JSON schema requires a number, so the value can't be null)
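The zero-division case in the last bullet can be sketched in plain Python. The helper name `allele_frequency` is hypothetical, and the PR itself operates on Hail expressions rather than Python ints.

```python
def allele_frequency(ac: int, an: int) -> float:
    """Return AC/AN, or 0.0 when AN is 0 so the result is always a number."""
    # AC/AN = 0/0 is undefined; the schema needs a number, so report 0.0.
    return ac / an if an > 0 else 0.0
```

One trade-off of this choice is that a consumer can no longer tell "no calls at this site" apart from "called, but the allele was never observed" from AF alone; AN still carries that distinction.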

@matren395 matren395 left a comment

notes! lmk if you wanna chat abt the PR next week

1 resolved review thread on gnomad/utils/annotations.py
1 resolved review thread on gnomad/resources/grch38/gnomad.py (outdated)
coverage("exomes").versions[coverage_version].path
)
coverage_ht = hl.read_table(
coverage(data_type).versions[coverage_version].path

@klaricch this will use genomes version 3.0.1 coverage table for v3 and v4 genomes tables, and exomes version 4.0 coverage table for v4 exomes tables. If the input is a v3 table with data_type=exomes, line 562 will throw an exception KeyError: '3.0.1', which is expected.
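The lookup behavior described in this comment can be mimicked with nested dictionaries. This is a sketch only: `AVAILABLE_COVERAGE`, `pick_coverage_version`, and `coverage_path` are hypothetical names, the paths are placeholders, and the real code reads versioned resources through `coverage(data_type).versions[...]`.

```python
# Placeholder registry standing in for the versioned coverage resources.
AVAILABLE_COVERAGE = {
    "genomes": {"3.0.1": "<genomes 3.0.1 coverage path>"},
    "exomes": {"4.0": "<exomes 4.0 coverage path>"},
}


def pick_coverage_version(table_major_version: int, data_type: str) -> str:
    """Choose the coverage version as described in the comment above."""
    if table_major_version == 4 and data_type == "exomes":
        return "4.0"
    # v3 tables and v4 genomes both fall back to the 3.0.1 coverage table.
    return "3.0.1"


def coverage_path(table_major_version: int, data_type: str) -> str:
    """Look up the coverage path; an unsupported combination raises KeyError."""
    version = pick_coverage_version(table_major_version, data_type)
    return AVAILABLE_COVERAGE[data_type][version]
```

With this layout, a v3 table with `data_type="exomes"` raises `KeyError: '3.0.1'` because the exomes registry has no such version, matching the expected failure noted above.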

@theferrit32

theferrit32 commented Jan 11, 2024

Rebasing from main to fix lint error from duplicate remaining key #617 (review)

matren395 and others added 26 commits February 27, 2024 15:17
…e_group_dicts, refactor to remove idx as param, construct ids and get freq idx locally
…axFAF95 if null to conform with latest schema. Move meanDepth to qualityMeasures
Co-authored-by: Daniel Marten <78616802+matren395@users.noreply.github.com>
Co-authored-by: klaricch <kristen@broadinstitute.org>
@theferrit32 theferrit32 force-pushed the kf/gks-cg-annotations branch from be25fdb to caabf24 Compare February 27, 2024 20:17
@klaricch klaricch self-requested a review February 28, 2024 15:39
@klaricch klaricch merged commit 060fcb8 into broadinstitute:main Feb 28, 2024
3 checks passed

6 participants