-
Notifications
You must be signed in to change notification settings - Fork 589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tools, filters and annotations for mitochondria pipeline #5193
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't go over this with a fine-toothed comb, because I think there's some refactoring to be done before we worry about too many details. I'm leaning towards having separate tools for MT stuff, even if it's pretty much the same under the hood. We'll have the option for them to diverge and they can have more reasonable args.
String OAValue; | ||
if(!read.isUnmapped()){ | ||
OAValue = String.format("%s,%s,%s,%s,%s,%s;", | ||
read.getContig().replace(",", "_"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hg38 has lots of crazy names -- is comma the only parsing problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the point of this replacement is just because ,
is the delimiter within the OA
tag. I replaced the ,
with OA_SEPARATOR
to be a little clearer. The ,
is also the only explicitly forbidden character from the RNAME
section of OA
tag in the PR in hts-specs.
|
||
final double[] tumorLods = GATKProtectedVariantContextUtils.getAttributeAsDoubleArray(vc, GATKVCFConstants.TUMOR_LOD_KEY, () -> null, -1); | ||
if (tumorLods==null) { | ||
warning.warn("One or more variant contexts is missing the 'TLOD' annotation, StrandArtifact will not be computed for these VariantContexts"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing you copied this from StrandArtifact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha, oops
@DocumentedFeature(groupName= HelpConstants.DOC_CAT_ANNOTATORS, groupSummary=HelpConstants.DOC_CAT_ANNOTATORS_SUMMARY, summary="Number of alt reads with an OA tag that doesn't match the current alignment contig.") | ||
public class OriginalAlignment extends GenotypeAnnotation implements Annotation { | ||
protected final OneShotLogger warning = new OneShotLogger(this.getClass()); | ||
public static final String OA_NOT_CURRENT_CONTIG = "OA_NOT_CURRENT_CONTIG"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love this key, but I don't have any better suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ORIGINAL_CONTIG_MISMATCH
any better?
src/main/java/org/broadinstitute/hellbender/tools/walkers/annotator/OriginalAlignment.java
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/walkers/annotator/OriginalAlignment.java
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/walkers/mutect/Mutect2.java
Show resolved
Hide resolved
return GATKProtectedVariantContextUtils.getAttributeAsIntArray(vc.getGenotype(tumorSample), key, () -> null, 0); | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(extra whitespace)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/main/java/org/broadinstitute/hellbender/tools/walkers/mutect/Mutect2.java
Show resolved
Hide resolved
programGroup = VariantFilteringProgramGroup.class | ||
) | ||
@DocumentedFeature | ||
public final class FilterMitochondrialMutectCalls extends VariantWalker { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we spin off a new walker for MT calls then this can have a more straightforward name.
*/ | ||
@Argument(fullName = TLOD_BY_DEPTH, | ||
doc="TLOD by depth threshold for filtering variant", optional = true) | ||
public double tlod_by_depth = .005; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to be able to say what this number means. For example, there are a few hand wavy explanations running around for the magic 6.3 for Mutect2, like two reads with Q30 alts. That might be less intuitive for something that's odds-ratio-based rather than probability-based though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm, yeah I'm not sure. We pretty much found this number empirically.
@ldgauthier This should be ready for more review. @jamesemery This now has all of the argument constructors/copying, if you want to take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss subclassing various classes in person.
private static final long serialVersionUID = 1L; | ||
@Override public boolean test(final GATKRead read) { | ||
boolean accept = true; | ||
if (read.hasAttribute(AddOriginalAlignmentTags.OA_TAG_NAME) & read.hasAttribute(AddOriginalAlignmentTags.MATE_CONTIG_TAG_NAME)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
& -> &&.
} | ||
} | ||
|
||
private static void addMateContigTag(GATKRead read) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method can be a one-liner without compromising clarity:
read.setAttribute(MATE_CONTIG_TAG_NAME, !read.mateIsUnmapped() ? read.getMateContig() : "*");
} | ||
} | ||
|
||
private static void addMateContigTag(GATKRead read) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make final: final GATKRead read
} | ||
|
||
//TODO: Once the OA tag is merged into the spec (https://github.com/samtools/hts-specs/pull/193) this should be moved to htsjdk | ||
private static void addOATag(GATKRead read) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final GATKRead read
src/main/java/org/broadinstitute/hellbender/tools/walkers/annotator/OriginalAlignment.java
Show resolved
Hide resolved
import org.broadinstitute.barclay.argparser.Argument; | ||
import org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerArgumentCollection; | ||
|
||
public class MitochondrialCallerArgumentCollection extends AssemblyBasedCallerArgumentCollection { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class should be a subclass of M2ArgumentCollection
, instead of adding a new constructor to M2ArgumentCollection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But is it possible to override an argument? I don't see that anywhere in our code base and we don't want the MitochondrialCaller docs to talk about tumors all over the place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ldgauthier Longer term I could change the Mutect docs to be more general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no way to override @Argument
in a subclass, but @meganshand with some refactoring you could define an interface or base class with abstract methods that return values of interest, and then dynamically select the concrete implementation on a per-tool basis. GATK and Picard both use this to override doc strings or required attributes for command line args. See this and this for example.
tumorSample
would require special handling because for M2 its provided in the CLI and for Mito its derived from the input header.
import org.broadinstitute.barclay.argparser.Argument; | ||
|
||
|
||
public class MitochondrialFiltersArgumentCollection extends M2FiltersArgumentCollection { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, the way this class looks confirms my earlier suggestions to extend corresponding classes in Mutect - it's clear and clean because it's a subclass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually maybe not. Now that I look again, this is going to give the MitochondrialFiltersArgumentCollection a tumor-segmentation
argument, which I don't want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh you're right, and now that I look at it, M2FiltersArgumentCollection extends AssemblyBasedCallerArgumentCollection, which doesn't make sense to me because those shouldn't be used in the filtering step (eg bamout
).
But assuming that does make sense, does that mean I want the MitochondrialFiltersArgumentCollection to extend AssemblyBasedCallerArgumentCollection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look to me like FilterMutectCalls needs any of the AssemblyBasedCaller args either. There could be a FiltersArgumentCollection that's the parent of an M2 version and a mito version. I think that arg collection wouldn't inherit from anyone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like there's going to be a lot of duplication just because tumors and mitochondria need different defaults. Is there really no elegant way to do this?
@@ -224,4 +222,19 @@ public void closeTool() { | |||
m2Engine.shutdown(); | |||
} | |||
} | |||
|
|||
private Set<VCFHeaderLine> getMutect2VCFHeaderLines() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on board with this refactor. I think it's worth revisiting after the subclassing I suggested above.
|
||
if (MathUtils.arrayMax(tumorLods) < MTFAC.TUMOR_LOD_THRESHOLD) { | ||
filterResult.addFilter(GATKVCFConstants.TUMOR_LOD_FILTER_NAME); | ||
if(lodKey.equals(GATKVCFConstants.TUMOR_LOD_KEY)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a code smell. I think Mitochondria pipeline should have its own filtering engine that inherits M2FilteringEngine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored, but still have this problem. It's only to get the correct filter name in the output. Is there a way to link the filter name with an annotation key?
@@ -352,7 +386,21 @@ public FilterResult calculateFilters(final M2FiltersArgumentCollection MTFAC, fi | |||
return filterResult; | |||
} | |||
|
|||
private int[] getIntArrayTumorField(final VariantContext vc, final String key) { | |||
public FilterResult calculateMitochondrialFilters(MitochondrialFiltersArgumentCollection MTFAC, VariantContext vc) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the subclass this method would be just be an override of calcualteFilters()
I pulled out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments, finally.
* If original alignment and mate original alignment tags exist, filter reads that were originally chimeric (mates were on different contigs). | ||
*/ | ||
@DocumentedFeature(groupName=HelpConstants.DOC_CAT_READFILTERS, groupSummary = HelpConstants.DOC_CAT_READFILTERS_SUMMARY, summary = "Filters reads whose original alignment was chimeric.") | ||
public static class ChimericOAFilter extends ReadFilter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency with other ReadFilter
s this should be named ChimericOAReadFilter
.
public static class ChimericOAFilter extends ReadFilter { | ||
private static final long serialVersionUID = 1L; | ||
@Override public boolean test(final GATKRead read) { | ||
boolean accept = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could simplify via:
if ( attributes are present) {
return AddOriginalAlignmentTags. . .
}
return true;
@@ -291,5 +307,6 @@ private ReadFilterLibrary(){ /*no instance*/ } | |||
public static final SeqIsStoredReadFilter SEQ_IS_STORED = new SeqIsStoredReadFilter(); | |||
public static final ValidAlignmentStartReadFilter VALID_ALIGNMENT_START = new ValidAlignmentStartReadFilter(); | |||
public static final ValidAlignmentEndReadFilter VALID_ALIGNMENT_END = new ValidAlignmentEndReadFilter(); | |||
public static final ChimericOAFilter CHIMERIC_OA_FILTER = new ChimericOAFilter(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not abbreviate here.
* <p>If reads were realigned to multiple references (for example the full human reference followed by just the | ||
* mitochondria) and the original alignment tag was recorded before realigning, then we can count the number of alt | ||
* reads that have been realigned from other contigs to this one. This can be useful for downstream filtering if an alt | ||
* allele has all of most of its support originally from a different contig. In the mitochondria case this can be useful |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"all or most"
Utils.nonNull(vc); | ||
Utils.nonNull(likelihoods); | ||
|
||
final double[] lods = GATKProtectedVariantContextUtils.getAttributeAsDoubleArray(vc, GATKVCFConstants.LOD_KEY, () -> null, -1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code duplication from Mutect2FilteringEngine::getDoubleArrayAttribute
-- you can extract a shared method in GATKPRotectedVariantContextUtils
.
public boolean dontClipITRArtifacts = true; | ||
|
||
@Advanced | ||
@Argument(fullName = M2ArgumentCollection.GET_AF_FROM_AD_LONG_NAME, doc="Use allelic depth to calculate allele fraction", optional = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the changes in @MartonKN's PR (which will get merged imminently) can you delete this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes!
import org.broadinstitute.barclay.argparser.Argument; | ||
|
||
|
||
public class MitochondrialFiltersArgumentCollection extends M2FiltersArgumentCollection { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like there's going to be a lot of duplication just because tumors and mitochondria need different defaults. Is there really no elegant way to do this?
src/main/java/org/broadinstitute/hellbender/tools/walkers/mutect/Mutect2.java
Show resolved
Hide resolved
@@ -372,4 +365,20 @@ private void checkSampleInBamHeader(final String sample) { | |||
private String decodeSampleNameIfNecessary(final String name) { | |||
return samplesList.asListOfSamples().contains(name) ? name : IOUtils.urlDecode(name); | |||
} | |||
|
|||
public String getVersion() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you add these intentionally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did, would it be better to make the fields not private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking move getMutect2VCFHeaderLines
from Mutect2
to Mutect2Engine
and then these fields could remain private without needing to add the getters.
//Mitochondrial M2-related filters | ||
addFilterLine(new VCFFilterHeaderLine(CHIMERIC_ORIGINAL_ALIGNMENT_FILTER_NAME, "NuMT variant with too many ALT reads originally from autosome")); | ||
addFilterLine(new VCFFilterHeaderLine(LOW_AVG_ALT_QUALITY_FILTER_NAME, "Low average alt quality")); | ||
addFilterLine(new VCFFilterHeaderLine(LOW_LOD_FILTER_NAME, "Variant does not meet likelihood threshold")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could unify some of the mito and tumor filter and info fields by making the M2 names less tumor specific. If Lee and Geraldine are okay with it I have no problem with eg TLOD -> LOD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be helpful. Is that difficult for legacy reasons? (Will users complain about the change breaking their downstream scripts?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comments @davidbenjamin! I fixed the easy things and I'm talking to comms now about the larger questions of if this should be a wrapper or not.
public boolean dontClipITRArtifacts = true; | ||
|
||
@Advanced | ||
@Argument(fullName = M2ArgumentCollection.GET_AF_FROM_AD_LONG_NAME, doc="Use allelic depth to calculate allele fraction", optional = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes!
@@ -123,4 +123,23 @@ | |||
@Argument(fullName = SMITH_WATERMAN_LONG_NAME, doc = "Which Smith-Waterman implementation to use, generally FASTEST_AVAILABLE is the right choice", optional = true) | |||
public SmithWatermanAligner.Implementation smithWatermanImplementation = SmithWatermanAligner.Implementation.JAVA; | |||
|
|||
public AssemblyBasedCallerArgumentCollection(){} | |||
|
|||
//TODO: copy the arg collections themselves |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't. I believe @droazen and @jamesemery are aware that this todo is going in though. (If we end up going this wrapper command line tool route)
@@ -372,4 +365,20 @@ private void checkSampleInBamHeader(final String sample) { | |||
private String decodeSampleNameIfNecessary(final String name) { | |||
return samplesList.asListOfSamples().contains(name) ? name : IOUtils.urlDecode(name); | |||
} | |||
|
|||
public String getVersion() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did, would it be better to make the fields not private?
//Mitochondrial M2-related filters | ||
addFilterLine(new VCFFilterHeaderLine(CHIMERIC_ORIGINAL_ALIGNMENT_FILTER_NAME, "NuMT variant with too many ALT reads originally from autosome")); | ||
addFilterLine(new VCFFilterHeaderLine(LOW_AVG_ALT_QUALITY_FILTER_NAME, "Low average alt quality")); | ||
addFilterLine(new VCFFilterHeaderLine(LOW_LOD_FILTER_NAME, "Variant does not meet likelihood threshold")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be helpful. Is that difficult for legacy reasons? (Will users complain about the change breaking their downstream scripts?)
Codecov Report
@@ Coverage Diff @@
## master #5193 +/- ##
==============================================
- Coverage 86.766% 85.625% -1.14%
+ Complexity 29944 29682 -262
==============================================
Files 1838 1839 +1
Lines 138820 138742 -78
Branches 15285 15272 -13
==============================================
- Hits 120448 118798 -1650
- Misses 12804 14441 +1637
+ Partials 5568 5503 -65
|
Hi, |
I defer to @meganshand regarding @SusieX's questions. |
I'm not sure when this will be merged, but I'm hoping within the next couple of weeks. In the meantime, the parameters we've been using with Mutect2 are: --initial-tumor-lod 0 The min-pruning argument is the most important. For FilterMutectCalls we turn off a bunch of filters, the only ones we use are: low-tlod I'd recommend manually inspecting your filters if possible. |
I talked to comms and we agreed that a "mitochondria-mode" argument to Mutect2 was the right balance of clarity (you're really running Mutect2 not a wrapper) and simplicity (you don't need a laundry list of arguments to change which mode you're in if you just want to run with optimized defaults). @ldgauthier @davidbenjamin @takutosato @rcmajovski Could you please take another look? Removing the wrapper tools has cleaned up the code so there are fewer changes now. I also changed TLOD to LOD in this version, but I'm happy to take that out and have that be future work if anyone is worried about it being a breaking change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in good shape. Consider my comments optional suggestions.
* If original alignment and mate original alignment tags exist, filter reads that were originally chimeric (mates were on different contigs). | ||
*/ | ||
@DocumentedFeature(groupName=HelpConstants.DOC_CAT_READFILTERS, groupSummary = HelpConstants.DOC_CAT_READFILTERS_SUMMARY, summary = "Filters reads whose original alignment was chimeric.") | ||
public static class NonChimericOaFilter extends ReadFilter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this class won't be instantiated in very many places you can safely give it a verbose name like NonChimericOriginalAlignmentReadFilter
@@ -291,5 +306,6 @@ private ReadFilterLibrary(){ /*no instance*/ } | |||
public static final SeqIsStoredReadFilter SEQ_IS_STORED = new SeqIsStoredReadFilter(); | |||
public static final ValidAlignmentStartReadFilter VALID_ALIGNMENT_START = new ValidAlignmentStartReadFilter(); | |||
public static final ValidAlignmentEndReadFilter VALID_ALIGNMENT_END = new ValidAlignmentEndReadFilter(); | |||
public static final NonChimericOaFilter NON_CHIMERIC_OA_FILTER = new NonChimericOaFilter(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, too, I would not abbreviate.
protected final OneShotLogger warning = new OneShotLogger(this.getClass()); | ||
public static final String KEY = GATKVCFConstants.POTENTIAL_POLYMORPHIC_NUMT_KEY; | ||
private static final double LOWER_BOUND_PROB = .1; | ||
private long minAutosomalHet; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these can all be final
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And it looks like some of them don't need to be class members at all because they are local to the constructor.
@@ -81,7 +81,7 @@ public void annotate(final ReferenceContext ref, | |||
pi.put(ART_REV, PRIOR_PROBABILITY_OF_ARTIFACT); | |||
|
|||
// We use the allele with highest LOD score | |||
final double[] tumorLods = GATKProtectedVariantContextUtils.getAttributeAsDoubleArray(vc, GATKVCFConstants.TUMOR_LOD_KEY, () -> null, -1); | |||
final double[] tumorLods = GATKProtectedVariantContextUtils.getAttributeAsDoubleArray(vc, GATKVCFConstants.LOD_KEY, () -> null, -1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you're at it, you can change "highest LOD score" to "highest log odds."
private String getTumorSampleName(){ | ||
return getHeaderForVariants().getMetaDataLine(Mutect2Engine.TUMOR_SAMPLE_KEY_IN_VCF_HEADER).getValue(); | ||
private String getTumorSampleName() { | ||
if (!MTFAC.mitochondria) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be a one-liner: return MTFAC.mitochondria ? getMitoSampleName() : getHeaderForVariants().getMetaDataLine(Mutect2Engine.TUMOR_SAMPLE_KEY_IN_VCF_HEADER).getValue();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments, you have my 👍 to merge
@@ -168,7 +169,9 @@ public void writeHeader(final VariantContextWriter vcfWriter, final Set<VCFHeade | |||
headerInfo.add(GATKVCFHeaderLines.getFormatLine(GATKVCFConstants.HAPLOTYPE_CALLER_PHASING_ID_KEY)); | |||
headerInfo.add(GATKVCFHeaderLines.getFormatLine(GATKVCFConstants.HAPLOTYPE_CALLER_PHASING_GT_KEY)); | |||
|
|||
headerInfo.add(new VCFHeaderLine(TUMOR_SAMPLE_KEY_IN_VCF_HEADER, tumorSample)); | |||
if (!MTAC.mitochondria) { | |||
headerInfo.add(new VCFHeaderLine(TUMOR_SAMPLE_KEY_IN_VCF_HEADER, tumorSample)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you OK not adding the sample name to the header in mt mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because the sample name is included in the last "header line" (the one with one #
), the same way it looks in a germline vcf.
* Only variants with LOD divided by depth exceeding this threshold can pass filtering. | ||
*/ | ||
@Argument(fullName = LOD_BY_DEPTH, | ||
doc="LOD by depth threshold for filtering variant", optional = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove newline
* reads exceeding this threshold can pass filtering. | ||
*/ | ||
@Argument(fullName = NON_MT_ALT_READS_BY_ALT_READS, | ||
doc="Known NuMT alts by total alts threshold for filtering variant", optional = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
* If original alignment and mate original alignment tags exist, filter reads that were originally chimeric (mates were on different contigs). | ||
*/ | ||
@DocumentedFeature(groupName=HelpConstants.DOC_CAT_READFILTERS, groupSummary = HelpConstants.DOC_CAT_READFILTERS_SUMMARY, summary = "Filters reads whose original alignment was chimeric.") | ||
public static class NonChimericOriginalAlignmentFilter extends ReadFilter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with David that this should be renamed NonChimericOriginalAlignmentReadFilter
to be consistent with others
@@ -136,6 +137,8 @@ | |||
public static final String ROF_POSTERIOR_KEY = "P_RO"; // For read orientation filter | |||
public static final String ROF_PRIOR_KEY = "P_PRIOR_RO"; | |||
public static final String ROF_TYPE_KEY = "ROF_TYPE"; | |||
public static final String ORIGINAL_CONTIG_MISMATCH_KEY = "ORIGINAL_CONTIG_MISMATCH"; | |||
public static final String POTENTIAL_POLYMORPHIC_NUMT_KEY = "POTENTIAL_POLYMORPHIC_NUMT"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spaces
private Range<Long> autosomalHomAltRange; | ||
|
||
public PolymorphicNuMT(final double medianAutosomalCoverage){ | ||
PoissonDistribution autosomalCoverage = new PoissonDistribution(medianAutosomalCoverage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
final Collection<ReadLikelihoods<Allele>.BestAllele> bestAlleles = likelihoods.bestAllelesBreakingTies(g.getSampleName()); | ||
final String currentContig = ref.getInterval().getContig(); | ||
|
||
long nonChrMAlt = bestAlleles.stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
} | ||
private void applyLODFilter(final M2FiltersArgumentCollection MTFAC, final VariantContext vc, final FilterResult filterResult) { | ||
if(vc.isBiallelic()) { | ||
Double LOD = vc.getAttributeAsDouble(GATKVCFConstants.LOD_KEY, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOD -> lod. Also make them final
final int maxFractionIndex = MathUtils.maxElementIndex(alleleFractions); | ||
final int[] ADs = tumorGenotype.getAD(); | ||
final int altCount = ADs[maxFractionIndex + 1]; | ||
if (tumorGenotype.hasAnyAttribute(GATKVCFConstants.ORIGINAL_CONTIG_MISMATCH_KEY) & vc.isBiallelic()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
& -> &&
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
R has ruined me :/
final int[] ADs = tumorGenotype.getAD(); | ||
final int altCount = ADs[maxFractionIndex + 1]; | ||
if (tumorGenotype.hasAnyAttribute(GATKVCFConstants.ORIGINAL_CONTIG_MISMATCH_KEY) & vc.isBiallelic()) { | ||
int nonMtOa = Integer.parseInt(tumorGenotype.getAnyAttribute(GATKVCFConstants.ORIGINAL_CONTIG_MISMATCH_KEY).toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well use the util method since we have it
final int nonMtOa = GATKProtectedVariantContextUtils.getAttributeAsInt(tumorGenotype, GATKVCFConstants.ORIGINAL_CONTIG_MISMATCH_KEY, -1);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I had these comments pending for days. Sorry. Minor changes and questions.
* Original Alignment annotation counts the number of alt reads where the original alignment contig doesn't match the current alignment contig | ||
* | ||
* <p>If reads were realigned to multiple references (for example the full human reference followed by just the | ||
* mitochondria) and the original alignment tag was recorded before realigning, then we can count the number of alt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mitochondrial -> mitochondrial contig
* <p>If reads were realigned to multiple references (for example the full human reference followed by just the | ||
* mitochondria) and the original alignment tag was recorded before realigning, then we can count the number of alt | ||
* reads that have been realigned from other contigs to this one. This can be useful for downstream filtering if an alt | ||
* allele has all or most of its support originally from a different contig. In the mitochondria case this can be useful |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mitochondrial -> mitochondrial
src/main/java/org/broadinstitute/hellbender/tools/walkers/annotator/OriginalAlignment.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
38f384f
to
b158b82
Compare
Could anyone please give an announcement when the mitochondria pipeline is officially available? Thanks |
@SusieX Sorry for the delay, we're just trying to finalize some potentially disruptive changes before we merge. I'll open an issue so I can ping you there once this PR has been both merged and released. Thanks! |
Thank you!
…On Tue, Oct 16, 2018 at 9:29 AM meganshand ***@***.***> wrote:
@SusieX <https://github.com/SusieX> Sorry for the delay, we're just
trying to finalize some potentially disruptive changes before we merge.
I'll open an issue so I can ping you there once this PR has been both
merged and released. Thanks!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#5193 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AQ4DHqWjVP9RQyGuRbbM-Nds_8l0RDtSks5uld84gaJpZM4WqV76>
.
|
618c23a
to
a4517ac
Compare
a4517ac
to
9c8986d
Compare
Hi Megan,
Thanks a lot for your help!
|
Hi @SusieX For your questions:
|
Thank you!
…On Thu, Oct 18, 2018 at 4:45 PM meganshand ***@***.***> wrote:
Hi @SusieX <https://github.com/SusieX>
For your questions:
1.
Yes you want --min-pruning to be different for each sample, even if
you're eventually going to compare samples. It will increase your
sensitivity.
2.
No need for -ploidy 1 or --do-not-run-physical-phasing
3.
FilterMutectCalls should be used on the output vcf from Mutect2
4.
There is no way to joint call these VCFs the way you do with GVCFs as
of today. This is future work that is on our to do list, but I don't know
when it will be done.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#5193 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AQ4DHlYex4IQEBecwgRxwyNIFXymsGIYks5umOhxgaJpZM4WqV76>
.
|
Hi I'm very new to GATK and was looking into using HaplotypeCaller to generate a (g)vcf file with both variant and invariant sites for chloroplast and mitochondrial DNA (with the --emitRefConfidence BP_resolution flag). I found this thread and have looked up the parameters sent to SusieX. But I'm confused what should be used as the reference/normal genome? It would be great if there was an option to "run in mitochondria mode". |
Hi Megan, Are remove-duplicate and base-quality-recalibration recommended before mutect2 call? For 1st step of base quality recalibration, what reference file to use for --known-sites? My reads are mapped to hg38. gatk BaseRecalibrator So the whole pipeline would be: Am I missing anything? Thanks a lot! |
Marking duplicates: I do recommend removing duplicates (we run MarkDuplicates from Picard). BQSR: The pipeline we're developing is for Whole Genome data, so our bams have gone through BQSR in the whole genome pipeline. We're using those recalibrated base qualities. I haven't tested running BQSR only on the mitochondria so I don't know how well that would work. If you do need to run BQSR only on the mitochondria I'd start by using the phylotree sites as If you end up using BQSR I think you're pipeline (BAM -> remove dup -> BQ recalibrate -> Mutect2 call -> FilterMutectCalls) is correct. Good luck! |
Thank you!
…On Mon, Oct 22, 2018 at 10:33 AM meganshand ***@***.***> wrote:
@SusieX <https://github.com/SusieX>
Marking duplicates: I do recommend removing duplicates (we run
MarkDuplicates from Picard).
BQSR: The pipeline we're developing is for Whole Genome data, so our bams
have gone through BQSR in the whole genome pipeline. We're using those
recalibrated base qualities. I haven't tested running BQSR only on the
mitochondria so I don't know how well that would work.
If you do need to run BQSR only on the mitochondria I'd start by using the
phylotree sites as --known-sites, but you'd need to have those sites in
vcf format. Again, I haven't tested this so I don't know how well it will
perform.
If you end up using BQSR I think you're pipeline (BAM -> remove dup -> BQ
recalibrate -> Mutect2 call -> FilterMutectCalls) is correct. Good luck!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#5193 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AQ4DHh9X94wZ-4488FohFKnkv1SCe6c2ks5undccgaJpZM4WqV76>
.
|
Hi I have a few questions about using Mutect2 versus HaplotypeCaller to call mitochondrial (or chloroplast) variants (I'm working on plants).
In my case, there isn't a "normal" sample to compare the "tumour" (i.e. mitochondria) to, just a reference. Are my results likely to be prone to false positives then? Or is it that the case for mitochondria is different because we are not truly looking for somatic variants but rather variants in general that may not be 100% "pure" (because of heteroplasty?). Is the latter point the justification for Mutect vs HaplotypeCaller in the first place?
In HaplotypeCaller, I read that I could use --emit-ref-confidence BP_RESOLUTION to get the confidence of a site being homozygous reference. Does --out-mode EMIT_ALL_CONFIDENT_SITES give similar information?
Many thanks! |
|
Thanks for this clarification! |
Hello, |
Hey, in the light of recent papers describing mtDNA as a great natural barcode to do lineage tracing in single human cells from scATAC-seq data, I think this is of relevance and I'd be very interested if there is an update on the pipeline as well. Thanks!! |
There is a copy of the 4.1.0.0 pipeline here: https://app.terra.bio/#workspaces/help-gatk/Mitochondria-SNPs-Indels-hg38 We are working on some adjustments to this pipeline to make the low allele fraction calls more reliable with more aggressive filtering and we will put those updates in that same location once they're complete. |
Hi, I'm trying to generate a VCF with Mitochondrial mode of Mutect2 based on MT amplicon (PCR) sequencing. But I encountered two problems:
version: GATK4.1.4.1 MT 16182 . A AC,ACC . . DP=262;ECNT=7;MBQ=31,26,29;MFRL=0,0,0;MMQ=60,60,60;MPOS=44,44;OCM=0;POPAF=2.40,2.40;TLOD=84.81,46.04 GT:AD:AF:DP:F1R2:F2R1:SB 0/1/2:157,72,31:0.261,0.118:260:122,48,21:0,0,0:0,157,0,103 |
The workflow for mitochondria will include running
AddOriginalAlignmentTags
on a bam, realigning it to the mitochondria contig only, then runningMutect2
with--annotation OriginalAlignment
and--median-autosomal-coverage
to get the appropriate annotations. Then runningFilterMitochondrialMutectCalls
.I don't think any of these changes should effect the somatic pipeline.
@ldgauthier I didn't change the name of
TLOD
ortumor sample
in the mitochondria vcf. Maybe we can talk next week about how to best do that?