Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix no data hom refs #8715

Merged
merged 8 commits into from
Mar 7, 2024
Merged

fix no data hom refs #8715

merged 8 commits into from
Mar 7, 2024

Conversation

ldgauthier
Copy link
Contributor

No description provided.

@gatk-bot
Copy link

gatk-bot commented Mar 5, 2024

Github actions tests reported job failures from actions build 8161782218
Failures in the following jobs:

Test Type JDK Job ID Logs
integration 17.0.6+10 8161782218.11 logs
integration 17.0.6+10 8161782218.0 logs

@gatk-bot
Copy link

gatk-bot commented Mar 5, 2024

Github actions tests reported job failures from actions build 8164035876
Failures in the following jobs:

Test Type JDK Job ID Logs
integration 17.0.6+10 8164035876.11 logs
integration 17.0.6+10 8164035876.0 logs

@@ -306,7 +306,7 @@ public static void makeGenotypeCall(final int ploidy,
final GenotypeAssignmentMethod assignmentMethod,
final double[] genotypeLikelihoods,
final List<Allele> allelesToUse,
final List<Allele> originalGT,
final Genotype originalGT,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to change this method signature so that we could use the GQ info to distinguish hom-ref genotypes that we want to no-call from hom-refs that stay hom-refs because with reblocking there are no PLs to use.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add to method doc

@@ -588,7 +588,7 @@ public void testCombineReblockedGVCFs() {
Assert.assertFalse(g0.hasPL());
Assert.assertFalse(g0.hasGQ());
Assert.assertFalse(g0.hasExtendedAttribute(VCFConstants.DEPTH_KEY));
Assert.assertTrue(g1.isHomRef());
Assert.assertTrue(g1.isNoCall());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a lot of these tests I was explicitly testing the behavior of the GQ0 hom-ref, but that's exactly what we're trying to change here

Assert.assertTrue(vc1.hasAttribute(GATKVCFConstants.INBREEDING_COEFFICIENT_KEY));
Assert.assertEquals(vc1.getAttributeAsInt(VCFConstants.ALLELE_NUMBER_KEY, 0), 362);
Assert.assertEquals(vc1.getAttributeAsInt(VCFConstants.ALLELE_NUMBER_KEY, 0), 300);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes because of the new no-calls

final VariantContext vc1 = outputVCs.get(1);
Assert.assertTrue(vc1.getAttributeAsDouble(GATKVCFConstants.EXCESS_HET_KEY, 1000.0) < 10.0); //will be ~72 if homRefs aren't counted
Assert.assertTrue(vc1.getAttributeAsDouble(GATKVCFConstants.EXCESS_HET_KEY, 0.0) > 50.0); //will be ~72 if homRefs aren't counted
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some hom-refs in previous test result go to no call, so InbreedingCoeff and ExcessHet also change

@@ -976,4 +982,39 @@ public void dbSNPError() {
//make sure it doesn't throw an error
runCommandLine(args);
}

@Test
public void testNoReadsOutputAsNoCall() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input data for this test comes from a GVCF produced from an empty region of one of the NA12878 test bams and a GVCF that was edited to have a variant so we can force that position to be output. (I should probably put this test description in the code.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some very minor documentation/comment requests. Without getting very deep into the weeds of those integration test annotation changes (which look fine as per your comments) i think this looks good on the surface.

There are two more abstract questions I will pose:
Do we want to preserve the ability for anybody ever to output the intermidate outputs? We could create a new mode in GenotypeAssignmentMethod for that and change this default one to use the new behavior to satisfy twitter...

The second one is just to ask if these are the only two places (GnarlyGenotyper and CombineGVCFs) that need to change? Its only at the combine step we are throwing these out so the genomicsDB path shouldn't be a problem no? Do we want HC outputting ./. genotypes?

@@ -306,7 +306,7 @@ public static void makeGenotypeCall(final int ploidy,
final GenotypeAssignmentMethod assignmentMethod,
final double[] genotypeLikelihoods,
final List<Allele> allelesToUse,
final List<Allele> originalGT,
final Genotype originalGT,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add to method doc

if (assignmentMethod == GenotypeAssignmentMethod.PREFER_PLS) {
if (originalGT == null) {
throw new IllegalArgumentException("original GT cannot be null if assignmentMethod is PREFER_PLS");
} else if (originalGT.hasGQ() && originalGT.getGQ() > 0){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see. this makes sense and was certainly better than my hack attempt to fix this... so its now "prefer_PLs EXCEPT for nocalls" Can you add a note about this in GenotypeAssignmentMethod?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to preserve the intermediate behavior at all? We could split PREFER_PLS to also have a mode that treats the 0/0s like they used to be.

@@ -158,6 +158,7 @@ public Object[][] gvcfsToGenotype() {

//23 highly multi-allelic sites across 54 1000G exomes to test allele subsetting and QUAL calculation
//plus one 10-allele WGS variant that's all hom-ref with one GT that has unnormalized PLs from some sort of GenomicsDB corner case
//this VCF still has the haploid-looking GDB no-calls as in sample NA21137 at position chr1:3836468 -- allegedly GATK 4.2.5.0 from February 7, 2022, possibly due to --call-genotypes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍

@@ -804,6 +805,9 @@ public void testWithReblockedGVCF() {
final List<VariantContext> actualVC = VariantContextTestUtils.getVariantContexts(output);
Assert.assertFalse(actualVC.stream().anyMatch(vc -> vc.getGenotype(1).isHomRef() && vc.getGenotype(1).hasPL())); //second sample has a bunch of 0/0s -- shouldn't have PLs

//this comes from a callset of NYGC 1000G samples plus syndip
//it seems likely that there's a variant that wasn't discovered in the graph because a bunch of samples are hom-ref with PL=[0,0,X]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a minor comment... but its possible to manufacture a sample with lots of assembly failures if you need to test this. You can set the kmer size to 1 and disallow stepped increases to get a sample that can assemble nothing.... Doesn't look necessary for you here though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, that is good to know.

@@ -976,4 +982,39 @@ public void dbSNPError() {
//make sure it doesn't throw an error
runCommandLine(args);
}

@Test
public void testNoReadsOutputAsNoCall() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

ldgauthier added a commit that referenced this pull request Mar 6, 2024
@droazen
Copy link
Collaborator

droazen commented Mar 7, 2024

M2 WDL tests failed due to a transient 429 "too many requests" error -- re-running them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants