Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes dropped AS_QD bug #6168

Merged
merged 11 commits into from
Oct 3, 2019
Merged

Fixes dropped AS_QD bug #6168

merged 11 commits into from
Oct 3, 2019

Conversation

ldgauthier
Copy link
Contributor

Addresses #6157

(Very minor tests are failing, which I will fix ASAP.)

doesn't have raw annotations; anyway it should get AS_QD
@droazen droazen self-assigned this Sep 20, 2019
// //Tests for Allele-Specific Annotations
// {new File(ALLELE_SPECIFIC_DIRECTORY, "NA12878.AS.chr20snippet.g.vcf"), getTestFile( "AS_Annotations.gatk3.7_30_ga4f720357.expected.vcf"), Arrays.asList( "-A", "ClippingRankSumTest", "-G", "AS_StandardAnnotation", "-G", "StandardAnnotation"), b37_reference_20_21},
// {new File(ALLELE_SPECIFIC_DIRECTORY, "NA12878.AS.chr20snippet.g.vcf"), getTestFile( "AS_Annotations.gatk3.7_30_ga4f720357.expected.vcf"), Arrays.asList( "-A", "ClippingRankSumTest", "-G", "AS_StandardAnnotation", "-G", "StandardAnnotation"), b37_reference_20_21},
// {new File(ALLELE_SPECIFIC_DIRECTORY, "NA12878.AS.chr20snippet.g.vcf"), getTestFile( "AS_Annotations.keepRawCombined.expected.vcf"), Arrays.asList( "-A", "ClippingRankSumTest", "-G", "AS_StandardAnnotation", "-G", "StandardAnnotation", "-keep-combined"), b37_reference_20_21},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I review this branch @ldgauthier, can you explain why you commented out the majority of the GenotypeGVCFs test cases?

Copy link
Collaborator

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added my comments @ldgauthier -- back to you

@@ -94,7 +101,7 @@
final ReducibleAnnotationData<List<Long>> myData = new ReducibleAnnotationData<>(null);
calculateRawData(vc, likelihoods, myData);
final String annotationString = makeRawAnnotationString(vc.getAlleles(), myData.getAttributeMap());
annotations.put(getRawKeyName(), annotationString);
annotations.put(getRawKeyNames().get(0), annotationString);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you should write a method that searches the raw key names for the index of the key you want instead of assuming it's the first item. At the very least, declare a named constant for 0 and add some comments explaining your assumptions.

Or, here's another thought: if you can only ever have either 1 or 2 raw keys in this brave new world, maybe you should implement named methods for each of the two keys -- something like getPrimaryRawKeyName() and getSecondaryRawKeyName() (or some more meaningful method names), instead of relying on indices. When you want them both together you can still call getRawKeyNames().

@droazen droazen assigned ldgauthier and unassigned droazen Sep 23, 2019
@ldgauthier
Copy link
Contributor Author

@droazen tests pass -- look good enough? (My other PR merged this branch, so this should go in first.)

@ldgauthier ldgauthier assigned droazen and unassigned ldgauthier Sep 25, 2019
Copy link
Collaborator

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back to @ldgauthier with a few final comments/questions

annotationMap.remove(currentASannotation.getRawKeyName());
for (final String rawKey : currentASannotation.getRawKeyNames()) {
//here we're assuming that each annotation combines data corresponding to its primary raw key
//AS_QD relies on two keys but doesn't have a combine operation because either GenomicsDB combines the keys
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a combine operation to AS_QD in the Evoquer branch (https://github.com/broadinstitute/gatk/pull/6011/files#diff-026071e8dae8a8d22d918a4bf65423da), which will likely be merged fairly soon. Will we run into problems in that branch due to any assumptions being made here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the primary raw key for AS_QUAL is the one you have -- AS_QUALapprox

Having two keys for that one is odd, but I wanted to use a different key when we're approximating rather than when we're using David B.'s full EM. What do you think about using the same key with an additional flag annotation if we're approximating?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if we're allowing multiple raw keys in general, the AS_QUAL/AS_QUALapprox thing seems reasonable to me. Having to consult a separate annotation to determine how to interpret the AS_QUAL seems worse/more cumbersome.

@@ -43,7 +42,7 @@
}

@Override
public String getRawKeyName() { return GATKVCFConstants.AS_RAW_BASE_QUAL_RANK_SUM_KEY;}
public List<String> getRawKeyNames() { return Arrays.asList(GATKVCFConstants.AS_RAW_BASE_QUAL_RANK_SUM_KEY);}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indexing scheme you added in the most recent revision is an improvement, but it's a bit brittle, since getRawKeyNames() doesn't directly call into getPrimaryRawKeyIndex() when constructing its list. In theory, someone could modify one method but not the other...

I think you should either:

  1. Come up with a scheme where getRawKeyNames() can call directly into the method(s) defining key order
    (best solution, but a little work)

or:

  1. Write a unit test for every single getRawKeyNames() method that asserts that you get the expected keys in the expected order, and also for the getPrimaryRawKeyIndex() methods asserting that you always get 0.

@@ -18,7 +18,8 @@
*
*/
public interface ReducibleAnnotation extends Annotation {
public abstract String getRawKeyName();
public abstract List<String> getRawKeyNames();
public abstract int getPrimaryRawKeyIndex();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add javadoc here explaining these new interface methods.

* @param actual Variant context to test for equality
* @param expected Expected result
* @param attributesToIgnore Attributes we want to exclude from comparision
* @param attributesToIgnore Attributes we want to exclude from numerical comparision, but ensure they exist
* @param attributesWithJitter
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the comment here is right -- your code above ensures that attributesWithJitter exist, but doesn't check the existence of the attributesToIgnore

@droazen
Copy link
Collaborator

droazen commented Oct 3, 2019

@ldgauthier Travis finally ran, but unfortunately there were multiple test failures :(

@ldgauthier
Copy link
Contributor Author

@droazen Travis is green! Ready to go?

Copy link
Collaborator

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Latest version looks good, and tests pass. Merging!

You should probably eventually push up a default implementation of getRawKeyNames() into ReducibleAnnotation that calls getPrimaryRawKey() followed by getSecondaryRawKeys() (so that each annotation doesn't have to worry about implementing this method correctly), but this can wait for another day.

@droazen droazen merged commit 406c69c into master Oct 3, 2019
@droazen droazen deleted the ldg_fixAS_QDfromGenotypes branch October 3, 2019 20:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants