Trim per-allele FORMAT annotations and optionally retain raw AS annotations #5833

Merged: ldgauthier merged 8 commits into master from ldg_readCountMerge on Mar 28, 2019

ldgauthier

GenotypeGVCFs now uses the header info to determine if FORMAT lists need to be subset when alleles are dropped. Fixes #5704

Added an option to keep raw AS values like AS_SB_TABLE and AS_QUAL, as requested in #5698 (does not close that issue)
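The header-driven subsetting described in the first paragraph can be illustrated with a minimal sketch. This is not the GATK implementation: the `CountType` enum is a hypothetical stand-in for the Number type a FORMAT field declares in the VCF header (R = one value per allele including the reference, A = one value per alternate allele), and `subset` is an illustrative helper name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FormatSubsetSketch {
    /** Hypothetical stand-in for the Number type declared in the VCF header. */
    public enum CountType { R, A, OTHER }

    /**
     * Keep only the values that belong to surviving alleles.
     * keptAlleleIndexes are indexes into the ORIGINAL allele list (0 = reference).
     */
    public static List<Integer> subset(List<Integer> values, int[] keptAlleleIndexes, CountType type) {
        if (type == CountType.OTHER) {
            return values; // not a per-allele list, so there is nothing to trim
        }
        // Number=A lists have no entry for the reference allele, so shift by one.
        int offset = (type == CountType.A) ? 1 : 0;
        List<Integer> result = new ArrayList<>();
        for (int alleleIndex : keptAlleleIndexes) {
            if (type == CountType.A && alleleIndex == 0) {
                continue; // the reference allele has no Number=A value
            }
            result.add(values.get(alleleIndex - offset));
        }
        return result;
    }

    public static void main(String[] args) {
        int[] kept = {0, 2}; // keep the reference and the second alternate allele
        System.out.println(subset(Arrays.asList(10, 4, 7, 0), kept, CountType.R)); // [10, 7]
        System.out.println(subset(Arrays.asList(4, 7, 0), kept, CountType.A));     // [7]
    }
}
```

The point of consulting the header is that the same allele indexes map to different list positions depending on the declared Number type, so the field name alone is not enough to subset correctly.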

ldgauthier commented Mar 25, 2019

@nh3 care to take a look? In the GGVCFs integration test, the withOxoGReadCounts.g.vcf file has

20      10101674        .       TTGTGTG T,TTG,TTGTGTGTGTGTG,TTGTGTGTGTGTGTG,<NON_REF>   1464.10 .       DP=64;ExcessHet=3.0103;MLEAC=0,1,1,0,0;MLEAF=0.00,0.500,0.500,0.00,0.00;RAW_MQandDP=196622,64   GT:AD:DP:F1R2:F2R1:GQ:PL:SB         2/3:0,3,22,6,4,0:35:0,1,13,3,3,0:0,2,9,3,1,0:47:1481,1115,1205,557,577,581,932,574,0,925,973,621,47,860,972,1495,1189,615,968,1015,1570:0,0,20,15

with read counts (F1R2 on the first row, F2R1 on the second):

TTGTGTG*  T  TTG  TTGTGTGTGTGTG  TTGTGTGTGTGTGTG  <NON_REF>
0         1  13   3              3                0
0         2  9    3              1                0

After genotyping, alleles get dropped and trimmed, and we have:

TTGTG*  T   TTGTGTGTGTG
0       13  3
0       9   3

Do you agree that's as expected? (I couldn't reproduce your exact example without the exact bam.)
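The allele trimming in this example can be reproduced with a small sketch. It assumes trimming only removes the suffix shared by all remaining alleles while leaving each allele at least one base; `trimSharedSuffix` is a hypothetical helper, not a GATK method, and it ignores prefix trimming and symbolic alleles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AlleleTrimSketch {
    /** Strip the longest suffix shared by all alleles, keeping at least one base per allele. */
    public static List<String> trimSharedSuffix(List<String> alleles) {
        if (alleles.isEmpty()) {
            return alleles;
        }
        int minLength = Integer.MAX_VALUE;
        for (String allele : alleles) {
            minLength = Math.min(minLength, allele.length());
        }
        int trim = 0;
        while (trim < minLength - 1 && sameBaseFromEnd(alleles, trim)) {
            trim++;
        }
        List<String> trimmed = new ArrayList<>();
        for (String allele : alleles) {
            trimmed.add(allele.substring(0, allele.length() - trim));
        }
        return trimmed;
    }

    /** True if every allele has the same base at the given offset from its end. */
    private static boolean sameBaseFromEnd(List<String> alleles, int offset) {
        String first = alleles.get(0);
        char expected = first.charAt(first.length() - 1 - offset);
        for (String allele : alleles) {
            if (allele.charAt(allele.length() - 1 - offset) != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Reference plus the two alternate alleles called in the 2/3 genotype above.
        List<String> kept = Arrays.asList("TTGTGTG", "TTG", "TTGTGTGTGTGTG");
        System.out.println(trimSharedSuffix(kept)); // [TTGTG, T, TTGTGTGTGTG]
    }
}
```

Trimming changes only the allele strings. The per-allele counts (0, 13, 3 and 0, 9, 3) are determined by which alleles were kept, not by how they are spelled afterwards.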

@ldgauthier

Also @tfenne hopefully this meets expectations for AS_SB_TABLE (which I have now added in AS_Annotations.keepRawCombined.expected.vcf)

codecov-io commented Mar 25, 2019

Codecov Report

Merging #5833 into master will decrease coverage by 50.971%.
The diff coverage is 58.667%.

@@               Coverage Diff                @@
##              master     #5833        +/-   ##
================================================
- Coverage     87.041%   36.071%   -50.971%     
+ Complexity     32151     17630     -14521     
================================================
  Files           1974      1977         +3     
  Lines         147413    147117       -296     
  Branches       16225     16181        -44     
================================================
- Hits          128310     53066     -75244     
- Misses         13185     89228     +76043     
+ Partials        5918      4823      -1095
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...lbender/engine/AssemblyRegionIteratorUnitTest.java | 1.111% <ø> (-83.333%) | 1 <0> (-10) |
| ...tools/walkers/haplotypecaller/HaplotypeCaller.java | 84.211% <ø> (ø) | 23 <0> (ø) ⬇️ |
| ...kers/annotator/VariantAnnotatorEngineUnitTest.java | 0.413% <0%> (-97.934%) | 1 <0> (-35) |
| ...r/allelespecific/AS_RMSMappingQualityUnitTest.java | 4.348% <0%> (-95.652%) | 1 <0> (-4) |
| ...or/allelespecific/ReducibleAnnotationBaseTest.java | 2.439% <0%> (-90.244%) | 1 <0> (-8) |
| ...ferenceConfidenceVariantContextMergerUnitTest.java | 2.881% <0%> (-94.239%) | 1 <0> (-25) |
| ...stitute/hellbender/tools/HaplotypeCallerSpark.java | 70.115% <0%> (ø) | 18 <1> (ø) ⬇️ |
| ...er/tools/walkers/GenotypeGVCFsIntegrationTest.java | 3.704% <0%> (-80.408%) | 2 <0> (-37) |
| ...haplotypecaller/HaplotypeCallerEngineUnitTest.java | 3.704% <0%> (-92.593%) | 1 <0> (-5) |
| ...Plugin/GATKAnnotationPluginDescriptorUnitTest.java | 7.219% <0%> (-81.016%) | 4 <0> (-54) |
... and 1286 more

nh3 commented Mar 25, 2019

I believe this was meant for @nh13, not me.

@ldgauthier

Sorry. I don't think that's the only one. Bad case of the Mondays.

@nh13 nh13 left a comment

LGTM

/**
 * If specified, keep the combined raw annotations (e.g. AS_SB_TABLE) after genotyping. This is applicable to Allele-Specific annotations.
 */
@Argument(fullName = KEEP_COMBINED_LONG_NAME, shortName = KEEP_COMBINED_SHORT_NAME, doc = "If specified, keep the combined raw annotations")
protected boolean keepCombined = false;
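A rough sketch of what this flag controls, assuming raw annotations are plain key/value entries in a record's INFO map. `finalizeInfo` and the `rawKeys` set are hypothetical names for illustration; the real logic lives in GATK's annotation engine and may differ. The values are placeholders, not real annotation data.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class KeepCombinedSketch {
    /**
     * Return a copy of the INFO attributes, dropping the raw (combined) keys
     * unless keepCombined is set. Hypothetical helper, not GATK code.
     */
    public static Map<String, String> finalizeInfo(Map<String, String> info,
                                                   Set<String> rawKeys,
                                                   boolean keepCombined) {
        Map<String, String> result = new LinkedHashMap<>(info);
        if (!keepCombined) {
            result.keySet().removeAll(rawKeys);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("DP", "placeholder");
        info.put("AS_SB_TABLE", "placeholder");
        Set<String> rawKeys = new HashSet<>(Arrays.asList("AS_SB_TABLE", "AS_QUAL"));

        System.out.println(finalizeInfo(info, rawKeys, false).keySet()); // [DP]
        System.out.println(finalizeInfo(info, rawKeys, true).keySet());  // [DP, AS_SB_TABLE]
    }
}
```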
Thanks @ldgauthier! This looks great.

@ldgauthier ldgauthier merged commit 2c066f5 into master Mar 28, 2019

tfenne commented Mar 28, 2019

Thanks again for getting this done and merged @ldgauthier & @droazen!

@ldgauthier ldgauthier deleted the ldg_readCountMerge branch August 1, 2019 16:07
Successfully merging this pull request may close these issues.

F1R2 and F2R2 annotations not updated by GenotypeGvcfs
6 participants