Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added --sites-only argument to GATK engine #4764

Merged
merged 4 commits into from
May 17, 2018
Merged

Conversation

jamesemery
Copy link
Collaborator

Fixes #4763

@jamesemery
Copy link
Collaborator Author

@ldgauthier This is what you requested correct?

@ldgauthier
Copy link
Contributor

@jamesemery Those of us that work with thousands of samples thank you.
If you could change "GT" to "genotype" in the arg description then this can be good to go. (Technically the GT field is part of the genotype field, along with AD, DP and all our other friends. Obviously it would be silly to output those without GT, but I aim for semantic correctness.)

Copy link
Collaborator

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review complete, back to @jamesemery. Looks good, minor comments only.

@@ -113,6 +113,10 @@
optional = true)
public boolean disableBamIndexCaching = false;

@Argument(fullName = StandardArgumentDefinitions.SITES_ONLY_LONG_NAME,
doc = "If true, don't emit GT fields when writing vcf file output.", optional = true)
public boolean sitesOnlyMode = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name of field should make it clear that this applies to VCFs specifically -- eg., outputSitesOnlyVCFs instead of sitesOnlyMode

@@ -728,6 +732,10 @@ public VariantContextWriter createVCFWriter(final File outFile) {
}
}

if (sitesOnlyMode) {
options.add(Options.DO_NOT_WRITE_GENOTYPES);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check to see whether there are any other GATKTools that bypass GATKTool.createVCFWriter() when writing a VCF and call GATKVariantContextUtils.createVCFWriter() directly instead? (besides the ones you dealt with below)

Don't worry about non-GATKTools.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already checked. It is bypassed in a number of tests, the HaplotypeCaller, and CreateSomaticPanelOfNormals which doesn't extend GATKTool so I elected not to include it here.

Utils.nonNull(outputVCF);
Utils.nonNull(readsDictionary);

List<Options> options = new ArrayList<>(2);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final

@@ -83,6 +92,25 @@ public void testFeaturesAsIntervalsUnrecognizedFormatFile() throws IOException {
testSpec.executeTest("testFeaturesAsIntervals", this);
}

@Test
public void testSitesOnlyMode() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this to a new GATKToolIntegrationTest class instead? I regret my original suggestion of FeatureSupportIntegrationTest

File out = createTempFile("GTStrippedOutput", "vcf");
String[] args = new String[] {
"-V", FEATURE_INTEGRATION_TEST_DIRECTORY + "vcf_with_genotypes.vcf",
"--" + StandardArgumentDefinitions.SITES_ONLY_LONG_NAME,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you confirm that this test fails if you remove the --sites-only arg temporarily?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just checked, it fails without --sites-only

Assert.assertFalse(v.hasGenotypes());
}
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a similar test to HaplotypeCallerIntegrationTest to prove that it honors --sites-only, since that is a special case?

@droazen droazen assigned jamesemery and unassigned droazen May 11, 2018
@codecov-io
Copy link

codecov-io commented May 11, 2018

Codecov Report

Merging #4764 into master will increase coverage by 0.01%.
The diff coverage is 88.889%.

@@             Coverage Diff              @@
##             master    #4764      +/-   ##
============================================
+ Coverage     80.07%   80.08%   +0.01%     
- Complexity    17420    17571     +151     
============================================
  Files          1080     1081       +1     
  Lines         63131    63815     +684     
  Branches      10200    10400     +200     
============================================
+ Hits          50549    51103     +554     
- Misses         8590     8702     +112     
- Partials       3992     4010      +18
Impacted Files Coverage Δ Complexity Δ
...ellbender/cmdline/StandardArgumentDefinitions.java 0% <ø> (ø) 0 <0> (ø) ⬇️
...ellbender/utils/test/CommandLineProgramTester.java 90.909% <0%> (-4.329%) 9 <0> (ø)
...walkers/haplotypecaller/HaplotypeCallerEngine.java 78.967% <100%> (+0.315%) 67 <0> (+1) ⬆️
...org/broadinstitute/hellbender/engine/GATKTool.java 91.429% <100%> (+0.124%) 93 <0> (ø) ⬇️
...tools/walkers/haplotypecaller/HaplotypeCaller.java 93.548% <100%> (ø) 16 <0> (ø) ⬇️
...r/utils/read/markduplicates/sparkrecords/Pair.java 91.603% <0%> (-0.532%) 38% <0%> (+16%)
...number/gcnv/GermlineCNVSegmentVariantComposer.java 100% <0%> (ø) 10% <0%> (+4%) ⬆️
...ments/GermlineCNVHybridADVIArgumentCollection.java 100% <0%> (ø) 4% <0%> (+1%) ⬆️
.../markduplicates/EstimateLibraryComplexityGATK.java 0% <0%> (ø) 0% <0%> (ø) ⬇️
...mlineContigPloidyHybridADVIArgumentCollection.java 100% <0%> (ø) 4% <0%> (+1%) ⬆️
... and 14 more

@jamesemery
Copy link
Collaborator Author

@droazen Responded to your comments

@jamesemery jamesemery assigned droazen and unassigned jamesemery May 14, 2018
Copy link
Collaborator

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second-pass review complete, back to @jamesemery

@@ -0,0 +1,37 @@
package org.broadinstitute.hellbender;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class should live in the same package as GATKTool (org.broadinstitute.hellbender.engine)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -322,17 +322,20 @@ private void initializeActiveRegionEvaluationGenotyperEngine() {
* @return a VCF or GVCF writer as appropriate, ready to use
*/
public VariantContextWriter makeVCFWriter( final String outputVCF, final SAMSequenceDictionary readsDictionary,
final boolean createOutputVariantIndex, final boolean createOutputVariantMD5 ) {
final boolean createOutputVariantIndex, final boolean sitesOnlyMode,
final boolean createOutputVariantMD5 ) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be slightly more logical if the ordering of the boolean args were: createOutputVariantIndex, createOutputVariantMD5, sitesOnlyMode instead

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"--" + StandardArgumentDefinitions.SITES_ONLY_LONG_NAME,
"-O",
out.getAbsolutePath()};
new Main().instanceMain(makeCommandLineArgs(Arrays.asList(args), SelectVariants.class.getSimpleName()));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a new overload of runCommandLine() to CommandLineProgramTest that takes 2 arguments: a List of args, and an explicit tool name, and call that new overload here? Should also add it to the CommandLineProgramTester interface.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

import org.broadinstitute.hellbender.CommandLineProgramTest;
import org.broadinstitute.hellbender.Main;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove stray imports from this class

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doen

@@ -71,6 +73,30 @@ public void testVCFModeIsConsistentWithPastResults(final String inputFileName, f
IntegrationTestSpec.assertEqualTextFiles(output, expected);
}

@Test
public void testSitesOnlyMode() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this test after the standard testVCFMode* and testGVCFMode* tests, rather than sticking it in the middle of them.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also call Utils.resetRandomGenerator() at the beginning of the test.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

final String[] args = {
"-I", NA12878_20_21_WGS_bam,
"-R", b37_reference_20_21,
"-L", "20:10000000-10100000",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an unnecessarily large interval for this test. How about 10k bases instead of 100k?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -37,6 +37,7 @@ private StandardArgumentDefinitions(){}
public static final String ANNOTATIONS_TO_EXCLUDE_LONG_NAME = "annotations-to-exclude";
public static final String SAMPLE_NAME_LONG_NAME = "sample-name";
public static final String PEDIGREE_FILE_LONG_NAME = "pedigree";
public static final String SITES_ONLY_LONG_NAME = "sites-only";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rename to the more explicit sites-only-vcf-output?

@jamesemery
Copy link
Collaborator Author

@droazen Responded to comments

Copy link
Collaborator

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Will merge once tests pass

@droazen droazen merged commit 4e3affb into master May 17, 2018
@droazen droazen deleted the je_sitesOnlyOption branch May 17, 2018 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants