Support for multiple alternate alleles. #193
Hi Aaron, just pinging on this issue: we have a couple of users for whom being able to handle multiple alternate alleles would be really helpful. I saw in #161 that there has been discussion about how to handle it. Was there ever a consensus?
Hi Rory, Support for multiple alleles is still in the works. We now know that both SnpEff and VEP can provide distinct annotations for the impact of each allele on each transcript. This will be next on the list after standardizing SO terms and allowing auto_* tools to support families without defined parents (both are being worked on as we speak). The one thing I haven't thought through fully is how to represent genotypes for each sample in the case of multiple alleles. For example, consider the following multi-allele variant:
We would split this into two variant rows, yet I find it a bit confusing how to properly report the genotypes for samples that ARE variable, but only for the allele now stored in the other row. One convention would be to just store such genotypes as 0/0 (HOM_REF). I have marked these below with a *. It seems like that is the best way, yet the genotypes are technically incorrect. I guess another way would be to mark them as unknown instead. Thoughts?
I also realize that correctly calculating the AAF column for each allele will be tricky. It seems like setting the starred genotypes above to unknown is the way to go.
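To make the "mark as unknown" convention concrete, here is a minimal sketch of splitting per-sample genotypes into one row per alternate allele and computing an alternate allele frequency (AAF) over the known alleles only. It is not GEMINI code: the function names `split_genotypes` and `aaf`, the tuple-of-allele-indexes representation, and the choice to mask individual alleles (rather than the whole genotype) are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: one tuple of allele indexes per sample,
# where 0 = REF, 1..n = ALT alleles, None = unknown/missing.
Genotype = Tuple[Optional[int], ...]


def split_genotypes(genotypes: List[Genotype], n_alts: int) -> List[List[Genotype]]:
    """Return one list of bi-allelic genotypes per alternate allele.

    In the row for ALT k, allele k is recoded as 1, REF stays 0, and any
    allele belonging to a different ALT (or already missing) becomes None.
    """
    rows = []
    for alt in range(1, n_alts + 1):
        row = []
        for gt in genotypes:
            recoded = tuple(
                0 if a == 0 else 1 if a == alt else None
                for a in gt
            )
            row.append(recoded)
        rows.append(row)
    return rows


def aaf(row: List[Genotype]) -> Optional[float]:
    """Alternate allele frequency over known alleles; None if nothing is known."""
    known = [a for gt in row for a in gt if a is not None]
    if not known:
        return None
    return sum(known) / len(known)


if __name__ == "__main__":
    # Four samples at a site with two ALT alleles: 0/1, 0/2, 1/2, 0/0.
    samples = [(0, 1), (0, 2), (1, 2), (0, 0)]
    for k, row in enumerate(split_genotypes(samples, n_alts=2), start=1):
        print(k, row, aaf(row))
```

Because unknown alleles are left out of the denominator, each row's AAF reflects only the chromosomes whose state is actually known for that allele. Storing the masked genotypes as 0/0 instead would lower the AAF of each row.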
Refer to vcfbreakmulti in vcflib (https://github.com/ekg/vcflib#vcfbreakmulti). I'm working on this issue in my project. Thanks.
An update. @brentp and I discussed the changes that need to be made to support multiple alternate alleles and I think we have a decent preliminary plan to get this in place in the near future. If we run into any complications, we will revisit the discussion here.
Aaron and Brent; https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/variation/multiallelic.py It decomposes multi-allelic inputs into bi-allelic, redoes the effects prediction, then uploads into GEMINI. It's great that you all are working on this.
@chapmanb the plan is to split multi-allelic variants so that each alternate allele has its own row in the variants table. This will require changing the index on the variants table to be a joint index on (variant_id, allele). There are a number of other changes, including many that I'm likely not aware of, but I hope to figure those out as I get familiar with the codebase.
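As a rough illustration of the joint index idea, the sketch below uses Python's built-in `sqlite3` module. The table layout is a hypothetical toy schema, not GEMINI's actual variants table: only `variant_id` and `allele` come from the comment above, and the other columns, the index name, and the function name `build_demo_db` are assumptions.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory toy variants table keyed on (variant_id, allele)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE variants (
            variant_id INTEGER NOT NULL,
            allele     INTEGER NOT NULL,  -- 1-based index of the ALT allele
            chrom      TEXT,
            start      INTEGER,
            ref        TEXT,
            alt        TEXT
        )
        """
    )
    # Joint unique index: one row per (variant, alternate allele) pair.
    conn.execute(
        "CREATE UNIQUE INDEX variants_id_allele_idx "
        "ON variants (variant_id, allele)"
    )
    # A single multi-allelic site (A -> C,T) stored as two rows.
    conn.executemany(
        "INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 1, "chr1", 100, "A", "C"),
            (1, 2, "chr1", 100, "A", "T"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for row in conn.execute(
        "SELECT variant_id, allele, alt FROM variants ORDER BY allele"
    ):
        print(row)
```

With this layout the two rows share a `variant_id` but remain individually addressable, and the unique index rejects a second row for the same allele of the same variant.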
Link to the ga4gh issue talking about multiple alts, from @chapmanb's implementation: ga4gh/ga4gh-schemas#169
Brent;
Hi Brent and Aaron, I've been following this thread and others related to multi-allelic representation in GEMINI, as it is one thing I consider high priority as a user. So, I'm glad to hear this is being worked on!
Hi Andrew,
@oleraj I have this implemented as a simple function, along with the code to replicate what vt decompose does. Together, these should reduce false negatives.
The linked branch still needs to parse out the per-alt effects from SnpEff/VEP, but the initial utility code is there.
Great, looking forward to these updates. Thanks for your work on this.
This has been merged into master.