Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unified HaplotypeCaller's force-calling mode with Mutect2's #6090

Merged
merged 4 commits into from
Sep 9, 2019

Conversation

davidbenjamin
Copy link
Contributor

@ldgauthier as we discussed.

Copy link
Contributor

@ldgauthier ldgauthier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced the test coverage is great, but if you can explain what's going on in the expected test VCF, everything else looks pretty good.

public FeatureInput<VariantContext> alleles;

@Advanced
@Argument(fullName = "genotype-filtered-alleles", doc = "Whether to force genotype even filtered alleles", optional = true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say "Force genotyping of filtered alleles included in the resource specified by --alleles"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if ( assemblyArgs.ggaExtension < 0) {
throw new CommandLineException.BadArgumentValue("maxGGAAREExtension", "" + assemblyArgs.ggaExtension + "< 0");
if ( assemblyArgs.extension < 0) {
throw new CommandLineException.BadArgumentValue("maxDiscARExtension", "" + assemblyArgs.extension + "< 0");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff for this file is so beautiful. <3 I'm glad you took out all the GGA special casing.

@@ -1820,5 +1822,6 @@ public static boolean isUnmixedMnpIgnoringNonRef(final VariantContext vc) {

return true;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious why there are two new imports, but no new code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was diplomatic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

.collect(Collectors.toMap(vc -> vc.getStart(), vc -> vc.getAlternateAlleles()));
for (final VariantContext vc : new FeatureDataSource<VariantContext>(forceCallingVcf)) {
final List<Allele> altAllelesAtThisLocus = altAllelesByPosition.get(vc.getStart());
vc.getAlternateAlleles().forEach(a -> Assert.assertTrue(altAllelesAtThisLocus.contains(a)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this improved test.

@@ -32,7 +32,6 @@
20 10004094 . A T . . . GT 0|1
20 10004769 . TAAAACTATGC T . . . GT 0|1
20 10004771 . A T . . . GT 0|1
20 10006819 . AAAAC T . . . GT 0|1
20 10006819 . AAAAC A . . . GT 0|1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like these are fixes with respect to the proper reference base?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. The old GGA mode didn't care if the -alleles vcf was invalid.

// then we may need to trim the alleles (because the original VariantContext may have had to pad at the end).
if ( outputAlleles.size() != vc.getAlleles().size() && !limitedContext ) // limitedContext callers need to handle allele trimming on their own to keep their alleles in sync
{
vcCall = GATKVariantContextUtils.reverseTrimAlleles(vcCall);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did the trimming go?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had above:

final boolean limitedContext = features == null || refContext == null || rawContext == null || stratifiedContexts == null;

In every invocation of this method (which I think I cleaned up in this PR) all of these things were null and so limtiedContext was inevitable true. Hence the && !limitedContext prevented this reverse trimming code from ever being reached.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And don't worry, M2 and HC still properly trim alleles. In fact, M2 already has an integration test to make sure we output a parsimonious allele representation even after indels that miss the tumor lod threshold are invisibly dropped.

@@ -389,7 +338,9 @@ private OutputAlleleSubset calculateOutputAlleleSubset(final AFCalculationResult

//it's possible that the upstream deletion that spanned this site was not emitted, mooting the symbolic spanning deletion allele
final boolean isSpuriousSpanningDeletion = GATKVCFConstants.isSpanningDeletion(allele) && !isVcCoveredByDeletion(vc);
final boolean toOutput = (isPlausible || forceKeepAllele(allele) || isNonRefWhichIsLoneAltAllele) && !isSpuriousSpanningDeletion;

//TODO: force-clling logic goes here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, that had already been addressed. Fixed.

}
// if the site was down-sampled, record that fact
if ( !limitedContext && rawContext.hasPileupBeenDownsampled() ) {
attributes.put(GATKVCFConstants.DOWNSAMPLED_KEY, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have never ever seen data that has this key. I'm glad you're removing any mention of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More unreachable code due to limitedContext always being true.

@davidbenjamin davidbenjamin force-pushed the db_unify_gga branch 2 times, most recently from 24e7f36 to b48b007 Compare August 20, 2019 18:55
@davidbenjamin
Copy link
Contributor Author

Addressed comments and beefed-up test coverage. Back to @ldgauthier.

@davidbenjamin davidbenjamin mentioned this pull request Aug 23, 2019
@davidbenjamin
Copy link
Contributor Author

@ldgauthier Fresh rebase post-deletion of old qual.

Copy link
Contributor

@ldgauthier ldgauthier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Infinitesimally minor comment about variable names, otherwise good to go!

@@ -125,4 +130,11 @@ public ReadThreadingAssembler createReadThreadingAssembler() {
@Argument(fullName = MAX_MNP_DISTANCE_LONG_NAME, shortName = MAX_MNP_DISTANCE_SHORT_NAME,
doc = "Two or more phased substitutions separated by this distance or less are merged into MNPs.", optional = true)
public int maxMnpDistance = getDefaultMaxMnpDistance();

@Argument(fullName= FORCE_CALL_ALLELES_LONG_NAME, doc="The set of alleles for which to force genotyping regardless of evidence", optional=true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a bunch of places where the variables or test names or whatever say force call, but the arg doc here says force genotyping. Very minor nit to pick, but it would be good to be consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants